Transcript Document
Single Cycle Datapath Design
CMSC411/Computer Architecture

These slides and all associated material are © 2003 by J. Six and are available only for students enrolled in CMSC411.

Life... it's all about balls into boxes.

CMSC411 – Computer Architecture / © 2003 J. Six

Use and Distribution Notice

Possession of any of these files implies understanding and agreement to this policy.

The slides are provided for the use of students enrolled in Jeff Six's Computer Architecture class (CMSC 411) at the University of Maryland Baltimore County. They are the creation of Mr. Six and he reserves all rights as to the slides. These slides are not to be modified or redistributed in any way. All of these slides may only be used by students for the purpose of reviewing the material covered in lecture. Any other use, including but not limited to the modification of any slides or the sale of any slides or material, in whole or in part, is expressly prohibited.

Most of the material in these slides, including the examples, is derived from Computer Organization and Design, Second Edition. Credit is hereby given to the authors of this textbook for much of the content. This content is used here for the purpose of presenting this material in CMSC 411, which uses this textbook.

Moving to the Datapath

We have now seen the basics of computer performance, instruction set architectures, and computer arithmetic.

To work through the design of a microprocessor datapath and control unit (the heart and soul of a microprocessor), we will design an implementation of a subset of the MIPS instruction set…

- Memory reference: load word (lw) and store word (sw).
- Arithmetic/logic: add, sub, and, or, and slt.
- Branching: branch equal (beq) and jump (j).
Clocking

For simplicity, our implementation assumes an edge-triggered clocking methodology: any values stored in the machine are updated only on a clock edge.

State elements (registers and memory) update their internal values on the clock edge. The combinational logic therefore takes its inputs from a set of state elements and stores its outputs into a set of state elements. The inputs are values that were written in a previous clock cycle, and the outputs are values that can be used in a later clock cycle.

Combinational Logic, State Elements, and the Clock

The reliance on state elements shows that the combinational logic, the state elements, and the clock are closely related.

In this example, all signals must propagate from state element 1 through the combinational logic into state element 2 within one clock cycle. The time the signals need to reach state element 2 defines the minimum length of the clock cycle.

[Figure: state element 1 feeds combinational logic, which feeds state element 2, all within one clock cycle]

Edge-Triggered Clocking and Feedback

The edge-triggered clocking methodology we have chosen allows us to read the contents of a register, send the value through some combinational logic, and write the result back to that same register, all in the same clock cycle.

[Figure: a state element whose output passes through combinational logic and back into the same state element]

Our Datapath, Version 1: A Single Cycle Implementation

We will begin by building a simple implementation that uses a single, long clock cycle for each instruction. Each instruction begins execution on one clock edge and completes execution on the next clock edge.
This implementation is slower than a multicycle implementation, which allows different instruction classes to take different numbers of clock cycles, each of which can be much shorter. A multicycle implementation is more realistic but requires more complex control; we will build one later.

Starting on Our Way

We will first look at all of the major components that are required to execute each class of MIPS instruction.

The first element we need is a place to store instructions. Here, we use an instruction memory (a state element) that takes an address and returns the instruction at that location.

The address of the current instruction also needs a state element, the program counter (PC).

Finally, we need an adder to increment the PC to the address of the next instruction (this can just be an ALU with its control wired to always add).

The First Three Components

So, here’s what we need…

[Figure: (a) instruction memory, with an instruction address input and an instruction output; (b) program counter (PC); (c) adder, with a sum output]

The First Datapath Section

To execute any instruction, we first fetch the instruction from memory. We also need to increment the PC to point to the next instruction, 4 bytes later. This section of the datapath is straightforward…

[Figure: the PC, an adder with the constant 4 as one input, and the instruction memory with its read address input and instruction output]

R-Type Instructions

Now let’s look at R-type instructions. Note that these read two registers, perform an ALU operation on the contents, and write the result to a register.

The 32 registers are stored in a structure called a register file. This is a collection of registers in which any register can be read or written by specifying the number of the register in the file. This structure is said to contain the register state of the machine.
In addition to the register file, we will need an ALU to actually perform the operation.

The Register File

The R-type instructions have three register operands: we need to read two registers and write to one.

For each word we want to read, the register file needs an input that specifies the register number to read. For each word we want to write, we need two inputs: the register to write to and the data to be written.

The semantics are different for reads and writes…

- The register file always outputs the contents of whatever register numbers are on the Read register inputs.
- Writes are controlled by the write control signal (RegWrite), which must be asserted for a write to occur at the clock edge.

R-Type Datapath Components

So, here’s what we need. Note that the read data outputs and the write data input of the register file, and the inputs to the ALU, are 32 bits wide…

[Figure: (a) register file with 5-bit register number inputs (Read register 1, Read register 2, Write register), a Write data input, Read data 1 and Read data 2 outputs, and a RegWrite control; (b) ALU with a 3-bit ALU control input and Zero and ALU result outputs]

The R-Type Datapath Design

Here is the datapath for R-type instructions…

[Figure: the instruction supplies the register numbers to the register file; Read data 1 and Read data 2 feed the ALU, which is steered by the 3-bit ALU operation signal; the ALU result goes to the register file's Write data input, with RegWrite controlling the write]

Memory Access Instructions

Now let’s look at the memory access instructions, load and store. Recall these instructions have the form…

lw $t1, offset($t2)
sw $t1, offset($t2)

They compute a memory address by adding the base register ($t2) to the 16-bit signed offset field in the instruction.

If the instruction is a store, the value to be stored must be read from the register file (from $t1). If it is a load, the value read from memory must be written into the register file (into $t1).
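The address computation just described can be sketched in a few lines of Python. This is only an illustration, not part of the slides: the function names `sign_extend16` and `effective_address` are made up here, and registers are modeled as plain 32-bit unsigned integers.

```python
MASK32 = 0xFFFFFFFF  # keep every value inside a 32-bit word


def sign_extend16(value):
    """Sign-extend a 16-bit field to a 32-bit two's-complement word."""
    value &= 0xFFFF
    if value & 0x8000:          # bit 15 set means the offset is negative
        value |= 0xFFFF0000     # copy the sign bit into the upper 16 bits
    return value


def effective_address(base, offset16):
    """Address used by lw/sw: base register plus sign-extended offset."""
    return (base + sign_extend16(offset16)) & MASK32
```

For example, a base of 0x1000 with the offset field 0xFFFC (that is, -4) gives the address 0x0FFC.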
We need a sign extension unit to take the 16-bit field from the instruction and use it as a 32-bit input into the ALU.

Memory Access Instructions

We also need a data memory. It needs read and write control signals, an address input, and an input for the data to be written. So, we need…

[Figure: (a) data memory unit with Address and Write data inputs, a Read data output, and MemRead and MemWrite controls; (b) sign-extension unit with a 16-bit input and a 32-bit output]

The Memory Access Datapath

So the datapath for memory access instructions looks like…

[Figure: the register file, sign-extension unit, ALU, and data memory; the ALU adds Read data 1 to the sign-extended 16-bit field and its result is the data memory Address; Read data 2 goes to the memory's Write data input, and the memory's Read data goes to the register file's Write data input; controls shown are RegWrite, the 3-bit ALU operation, MemRead, and MemWrite]

The Branch Instruction

The beq instruction has three operands: two registers that are compared for equality and a 16-bit offset used to compute the branch target relative to the address of the branch instruction. Here, we compute the target by adding that offset to PC+4.

There are two details we need to keep in mind…

- The base for the branch target is the address of the instruction following the branch (PC+4). Since we already compute this, we’re OK.
- The offset field counts words, so it is shifted left by two bits to turn it into a byte offset. We need a shift-left-two unit.

Branch Determination

In addition to this computation, the hardware must compare the two registers and determine the next PC.

- If they are equal, the branch is taken and PC = branch target address.
- If not, the branch is not taken and PC = PC+4.

We can use our register file and ALU to make this happen (we will subtract the registers and look at the Zero output bit to see if they are equal).

We will ignore the branch control logic for now (don’t worry, we’ll design it later).
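As a rough sketch of the branch behavior described above, the following Python function computes the next PC for a beq. It is an illustration under stated assumptions rather than the slides' own design: `next_pc_for_beq` and its parameter names are hypothetical, and all values are treated as 32-bit unsigned words.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Sign-extend a 16-bit field to a 32-bit two's-complement word."""
    value &= 0xFFFF
    if value & 0x8000:
        value |= 0xFFFF0000
    return value


def next_pc_for_beq(pc, offset16, rs_value, rt_value):
    """Return the PC that follows a beq located at address pc."""
    pc_plus_4 = (pc + 4) & MASK32
    # Shift-left-two unit: turn the word offset into a byte offset.
    byte_offset = (sign_extend16(offset16) << 2) & MASK32
    branch_target = (pc_plus_4 + byte_offset) & MASK32
    # The ALU subtracts the registers; Zero is set when they are equal.
    zero = ((rs_value - rt_value) & MASK32) == 0
    return branch_target if zero else pc_plus_4
```

With the branch at 0x00400000 and an offset field of 3, equal registers give a next PC of 0x00400010 (PC+4 plus 12 bytes), while unequal registers give 0x00400004.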
The Branch Datapath

Here is the branch datapath…

[Figure: PC+4 from the instruction datapath and the sign-extended 16-bit offset, shifted left by 2, feed an adder whose sum is the branch target; the register file's two read outputs feed the ALU, whose Zero output goes to the branch control logic]

Combining These Datapaths

Let’s build a common datapath from the portions we have constructed.

Our datapath is single cycle, so no resource can be used more than once per instruction. Any resource that is needed more than once must be duplicated. This is why we need separate data and instruction memories.

Many elements can be shared between two (or more) instruction classes. To share an element, however, we need to allow multiple connections to its input and have a control signal select which input is connected. We need multiplexors!

Combining Two Datapaths

Let’s combine the R-type and memory access instruction datapaths. There are two differences…

- The second input to the ALU is a register (R-type) or the sign-extended lower half of the instruction (memory).
- The value stored into a destination register comes from either the ALU (R-type) or from memory (for a load).

So we need to put multiplexors at these two locations (we will once again ignore the control signals and generate them later).

Combination of Two Datapaths

Here is a datapath formed by combining the memory and R-type datapaths…
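The two multiplexors introduced above can be modeled as a small selection function. The sketch below is illustrative only: `mux2`, `alu_second_operand`, and `register_write_data` are hypothetical names, and the select inputs stand in for the control signals that later slides call ALUSrc and MemtoReg.

```python
def mux2(select, input0, input1):
    """Two-input multiplexor: pass input1 when select is asserted, else input0."""
    return input1 if select else input0


def alu_second_operand(use_immediate, read_data_2, sign_extended_imm):
    """Second ALU input: a register (R-type) or the sign-extended immediate (memory)."""
    return mux2(use_immediate, read_data_2, sign_extended_imm)


def register_write_data(from_memory, alu_result, memory_read_data):
    """Value written to the destination register: ALU result or loaded data."""
    return mux2(from_memory, alu_result, memory_read_data)
```

An R-type instruction would leave both selects deasserted, while a load would assert both.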
Adding the Instruction Fetch

With almost no effort, we can add the instruction fetch datapath we started with…

[Figure: the combined R-type/memory datapath with the instruction fetch section (PC, adder with the constant 4, instruction memory) attached; the two multiplexors are controlled by ALUSrc and MemtoReg, and the other controls shown are RegWrite, MemRead, MemWrite, and the 3-bit ALU operation]

Our Complete Datapath

Finally, we add in the branch datapath. This requires one additional multiplexor that is used to select the next PC.

[Figure: the complete datapath; a multiplexor controlled by PCSrc selects either PC+4 or the branch target (PC+4 plus the sign-extended offset shifted left by 2) as the next PC]

ALU Control Signals

Now that we have designed a complete (yet simple) datapath, we can design the control unit. It will take inputs and generate the write signals for each state element and the control signals for each multiplexor.

The ALU control is somewhat different from the main control, so we will design it first. Recall that our ALU has three control inputs…

ALU Control   Function
000           AND
001           OR
010           add
110           subtract
111           set on less than

ALU Operations

Depending on what class of instruction we are executing, we need the ALU to perform one of these five functions…

- Memory instructions require the ALU to compute the address by addition.
- R-type instructions require any one of the five operations, depending on the value of the 6-bit funct field in the instruction.
- Branch equal instructions require subtraction.
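The five ALU functions and their 3-bit control codes can be captured in a short behavioral model. This is a sketch, not the hardware design from the slides: the function name `alu` and the choice to return a `(result, zero)` pair are assumptions made for illustration, and operands are 32-bit unsigned words.

```python
MASK32 = 0xFFFFFFFF


def to_signed(word):
    """Interpret a 32-bit word as a two's-complement signed integer."""
    return word - (1 << 32) if word & 0x80000000 else word


def alu(control, a, b):
    """Return (result, zero) for the 3-bit ALU control codes listed above."""
    if control == 0b000:
        result = a & b                                      # AND
    elif control == 0b001:
        result = a | b                                      # OR
    elif control == 0b010:
        result = (a + b) & MASK32                           # add
    elif control == 0b110:
        result = (a - b) & MASK32                           # subtract
    elif control == 0b111:
        result = 1 if to_signed(a) < to_signed(b) else 0    # set on less than (signed)
    else:
        raise ValueError("unused ALU control code")
    return result, result == 0
```

Subtracting two equal values, as beq does, yields a result of 0 with the Zero output asserted.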
The ALU Control Unit

We can generate the 3-bit ALU control signal with a small control unit whose inputs are the function (funct) field of the instruction and a new 2-bit control field called ALUOp.

ALUOp indicates whether the operation to be performed should be an addition (ALUOp = 00) for memory instructions, a subtraction (ALUOp = 01) for beq, or determined by the operation encoded in the function field (ALUOp = 10).

We will see how the main control unit generates ALUOp in a little while. The ALU control unit will generate the 3-bit signal that directly controls the ALU.

Multiple Levels of Decoding

Notice that we are using multiple levels of instruction decoding. The main control unit generates ALUOp, which is then used by the ALU control unit to generate the control signals for the ALU.

This is a common practice and can reduce the size of the main control unit. Several smaller control units may also be faster (sometimes much faster) than one big control unit that generates all of the control signals itself.

ALU Control Signals

Here is a mapping that shows each instruction, its funct field, the ALUOp signal, the desired ALU operation, and the ALU control signals.

Instruction   funct    ALUOp   Desired Op         ALU Control
LW            XXXXXX   00      add                010
SW            XXXXXX   00      add                010
BEQ           XXXXXX   01      subtract           110
R-type/add    100000   10      add                010
R-type/sub    100010   10      subtract           110
R-type/and    100100   10      and                000
R-type/or     100101   10      or                 001
R-type/slt    101010   10      set on less than   111

Designing our ALU Control Unit

So our ALU control unit needs to take in ALUOp and funct and produce the ALU control signals. Let’s look at a truth table for this…

ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  ALU Control
0       0       X   X   X   X   X   X   010
X       1       X   X   X   X   X   X   110
1       X       X   X   0   0   0   0   010
1       X       X   X   0   0   1   0   110
1       X       X   X   0   1   0   0   000
1       X       X   X   0   1   0   1   001
1       X       X   X   1   0   1   0   111
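The truth table above translates almost directly into code. The following Python sketch is one way to write it, assuming a hypothetical function name `alu_control`; it honors the don't-care entries by ignoring F5 and F4 and by testing only the ALUOp bits that each row specifies.

```python
# Rows of the truth table where ALUOp1 = 1: F3..F0 select the operation.
FUNCT_LOW4_TO_CONTROL = {
    0b0000: 0b010,  # add
    0b0010: 0b110,  # subtract
    0b0100: 0b000,  # AND
    0b0101: 0b001,  # OR
    0b1010: 0b111,  # set on less than
}


def alu_control(alu_op, funct):
    """Map the 2-bit ALUOp and 6-bit funct field to the 3-bit ALU control."""
    alu_op1 = (alu_op >> 1) & 1
    alu_op0 = alu_op & 1
    if alu_op1 == 0 and alu_op0 == 0:
        return 0b010            # lw/sw: add, funct is a don't care
    if alu_op0 == 1:
        return 0b110            # beq: subtract, ALUOp1 and funct are don't cares
    low4 = funct & 0b1111       # F5 and F4 are don't cares
    if low4 not in FUNCT_LOW4_TO_CONTROL:
        raise ValueError("funct value not covered by the truth table")
    return FUNCT_LOW4_TO_CONTROL[low4]
```

For instance, ALUOp = 10 with funct = 101010 (slt) produces the control value 111.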
ALU Control Unit Design

Notice that we use don’t-care cases as liberally as possible. This leads to a more optimized design that uses fewer gates and runs faster.

Going from the truth table to actual logic is very mechanical, so we’ll skip that.

Main Control Unit Design

Now that we have designed the ALU control unit, we can return to the main control unit.

To understand how this unit needs to function and how to connect the various fields of the instruction to the datapath, let’s review the instruction classes…

Class        31-26      25-21   20-16   15-11   10-6    5-0
R-type       0          rs      rt      rd      shamt   funct
Load/Store   35 or 43   rs      rt      address (bits 15-0)
Branch       4          rs      rt      address (bits 15-0)

Instruction Formats

We can make some observations…

- The opcode is always in bits 31-26. We will refer to this field as Op[5-0].
- The two registers to be read are rs and rt, at positions 25-21 and 20-16. This is true for R-type, beq, and store instructions.
- The base register for load and store instructions is rs, at positions 25-21.
- The 16-bit offset for beq, load, and store instructions is always in positions 15-0.
- The destination register is in one of two places…
    - For a load, it is in positions 20-16 (rt).
    - For R-type, it is in positions 15-11 (rd).
  We need a multiplexor for the destination register.
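To make the field positions concrete, here is a small Python sketch that slices a 32-bit instruction word into the fields listed above. The names `decode_fields` and `write_register` are hypothetical, and the `use_rd` flag simply stands in for the select input of the destination-register multiplexor.

```python
def decode_fields(instruction):
    """Split a 32-bit MIPS instruction word into its bit fields."""
    return {
        "op":    (instruction >> 26) & 0x3F,   # bits 31-26
        "rs":    (instruction >> 21) & 0x1F,   # bits 25-21
        "rt":    (instruction >> 16) & 0x1F,   # bits 20-16
        "rd":    (instruction >> 11) & 0x1F,   # bits 15-11
        "shamt": (instruction >> 6) & 0x1F,    # bits 10-6
        "funct": instruction & 0x3F,           # bits 5-0
        "imm16": instruction & 0xFFFF,         # bits 15-0
    }


def write_register(fields, use_rd):
    """Destination register: rd for R-type, rt for a load."""
    return fields["rd"] if use_rd else fields["rt"]


# add $t0, $t1, $t2 encodes as 0x012A4020 ($t0 = 8, $t1 = 9, $t2 = 10).
add_fields = decode_fields(0x012A4020)
# lw $t1, 8($t2) encodes as 0x8D490008.
lw_fields = decode_fields(0x8D490008)
```

Here `write_register(add_fields, True)` selects register 8 ($t0), and `write_register(lw_fields, False)` selects register 9 ($t1).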
Integrating into the Datapath

We can integrate these observations and our ALU control unit into the datapath…

[Figure: the complete datapath annotated with instruction fields: Instruction [25–21] and Instruction [20–16] select the registers to read; a multiplexor controlled by RegDst chooses Instruction [20–16] (input 0) or Instruction [15–11] (input 1) as the write register; Instruction [15–0] feeds the sign-extension unit; Instruction [5–0] and ALUOp feed the ALU control unit; the other controls shown are PCSrc, RegWrite, ALUSrc, MemWrite, MemRead, and MemtoReg]

Control Signal Review

Looking at this design, we have eight control signals. We already know what ALUOp does, so let’s review what the other control signals do…

RegDst
  Deasserted: Destination register number for the write comes from the rt field.
  Asserted:   Destination register number for the write comes from the rd field.

RegWrite
  Deasserted: None.
  Asserted:   The register specified is written with the value on the Write data input.

ALUSrc
  Deasserted: Second ALU operand comes from the second register file output.
  Asserted:   Second ALU operand is the sign-extended lower 16 bits of the instruction.

PCSrc
  Deasserted: PC = output of the adder that computes PC+4.
  Asserted:   PC = output of the adder that computes the branch target.

MemRead
  Deasserted: None.
  Asserted:   Data memory contents at the specified address are put on the Read data output.

MemWrite
  Deasserted: None.
  Asserted:   Data memory contents at the specified address are replaced with the value on the Write data input.

MemtoReg
  Deasserted: Value fed to the register file's Write data input comes from the ALU.
  Asserted:   Value fed to the register file's Write data input comes from the data memory.

Generating the Control Signals

So how are these signals set? The control unit can set all but one of them based solely on the opcode field of the instruction.

The PCSrc control signal is different: it should be set only if the instruction is beq and the Zero output of the ALU is true.
We can solve this by having the control unit generate a Branch signal and then ANDing it with the Zero output from the ALU to produce the PCSrc control signal.

Datapath With Control Unit

We can add the control unit to our datapath…

[Figure: the datapath with the main control unit added; the control unit takes Instruction [31–26] as its input and generates RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; PCSrc is formed from Branch and the ALU Zero output]

Tracing Instructions

To illustrate how our datapath works, let’s trace some instructions through it.

Beginning with an R-type instruction, we will highlight the components of the datapath that are active at each of the four major steps involved in processing an R-type instruction.

Remember that although we are doing this in steps, this is a single clock cycle datapath: all of this happens in a single clock cycle.

R-Type: Step 1

Step 1 – Fetch instruction from memory and increment the PC.

[Figure: the datapath with control unit, with the components active in Step 1 highlighted]

R-Type: Step 2

Step 2 – Read source registers from the register file.
[Figure: the datapath with control unit, with the components active in Step 2 highlighted]

R-Type: Step 3

Step 3 – ALU operating on the register data operands.

[Figure: the datapath with control unit, with the components active in Step 3 highlighted]

R-Type: Step 4

Step 4 – Writing the result back to a register.

[Figure: the datapath with control unit, with the components active in Step 4 highlighted]

Tracing Memory Instructions

Looking at a load instruction such as lw $t1, offset($t2), we can see five steps…

- Instruction is fetched from memory and the PC is incremented.
- A register ($t2) is read from the register file.
- The ALU computes the sum of that value and the sign-extended lower 16 bits of the instruction (offset).
- The sum from the ALU is used as the address for the data memory.
- The data from the memory is written into the register file; the destination is given by bits 20-16 of the instruction ($t1).

Load Instruction: All Steps in One

Showing the active components in the datapath for lw…

[Figure: the datapath with control unit, with the components active for lw highlighted]

Tracing Branch Instructions

Looking at a branch instruction such as beq $t1, $t2, offset, we can see five steps…

- Instruction is fetched from memory and the PC is incremented.
- Two registers ($t1 and $t2) are read from the register file.
- The ALU subtracts the data values.
- PC+4 is added to the sign-extended lower 16 bits of the instruction (offset) shifted left by two; the result is the target of the branch instruction.
- The Zero result from the ALU is then used to decide which result to store into the PC.
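The five steps of the load trace given earlier can be followed in a short behavioral sketch. This is an illustration under simplifying assumptions, not the slides' implementation: `trace_lw` is a hypothetical name, the register file is a 32-entry Python list, and the data memory is a dictionary keyed by byte address that holds whole words.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Sign-extend a 16-bit field to a 32-bit two's-complement word."""
    value &= 0xFFFF
    if value & 0x8000:
        value |= 0xFFFF0000
    return value


def trace_lw(instruction, pc, registers, data_memory):
    """Carry out one lw and return the next PC; registers is updated in place."""
    # Step 1: the instruction has been fetched and the PC is incremented.
    next_pc = (pc + 4) & MASK32
    # Step 2: the base register (rs, bits 25-21) is read from the register file.
    base = registers[(instruction >> 21) & 0x1F]
    # Step 3: the ALU adds the base and the sign-extended offset (bits 15-0).
    address = (base + sign_extend16(instruction & 0xFFFF)) & MASK32
    # Step 4: the ALU sum is used as the data memory address.
    loaded = data_memory[address]
    # Step 5: the loaded word is written to the register named by bits 20-16 (rt).
    registers[(instruction >> 16) & 0x1F] = loaded
    return next_pc


# Example: lw $t1, 8($t2), encoded as 0x8D490008, with $t2 holding 0x1000.
registers = [0] * 32
registers[10] = 0x1000
data_memory = {0x1008: 0xDEADBEEF}
new_pc = trace_lw(0x8D490008, 0x00400000, registers, data_memory)
```

After the call, register 9 ($t1) holds the word stored at address 0x1008 and the PC has advanced by 4. In hardware all five steps complete within the one clock cycle; the sequential statements only mirror the order in which signals propagate.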
Branch Instruction: All Steps in One

Showing the active components in the datapath for beq…

[Figure: the datapath with control unit, with the components active for beq highlighted]

Designing the Control Logic

Now that we understand how the control unit interacts with the datapath, we can design that logic.

This is simple combinational design, using the opcode (all six bits) as the input and producing each of the control signals as outputs.

We can build the control function in a nice table format that shows the inputs and outputs for each type of instruction (R-type, lw, sw, and beq).

Once this table has been derived, the implementation is trivial. So, let’s do it…

The Control Function

Input/Output   Signal Name   R-type   lw   sw   beq
Inputs         Op5           0        1    1    0
               Op4           0        0    0    0
               Op3           0        0    1    0
               Op2           0        0    0    1
               Op1           0        1    1    0
               Op0           0        1    1    0
Outputs        RegDst        1        0    X    X
               ALUSrc        0        1    1    0
               MemtoReg      0        1    X    X
               RegWrite      1        1    0    0
               MemRead       0        1    0    0
               MemWrite      0        0    1    0
               Branch        0        0    0    1
               ALUOp1        1        0    0    0
               ALUOp0        0        0    0    1

One More Instruction Type

We have not yet added support for jump instructions to our datapath.

Recall that a jump instruction looks a lot like a branch, but it is not conditional and computes the target PC differently…

- The low-order 2 bits are always zero (like beq).
- The next 26 bits come from the 26-bit immediate field in the instruction.
- The upper 4 bits come from the upper four bits of PC+4.
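The three pieces of the jump target listed above can be assembled with a couple of shifts and masks. The sketch below is illustrative; `jump_target` is a hypothetical name and the PC is treated as a 32-bit unsigned value.

```python
MASK32 = 0xFFFFFFFF


def jump_target(pc, instruction):
    """Build the 32-bit target of a j instruction located at address pc."""
    pc_plus_4 = (pc + 4) & MASK32
    address26 = instruction & 0x03FFFFFF    # 26-bit immediate, bits 25-0
    low28 = address26 << 2                  # low-order 2 bits become zero
    upper4 = pc_plus_4 & 0xF0000000         # upper 4 bits of PC+4
    return upper4 | low28
```

For a jump at 0x00400000 whose 26-bit field is 0x0100008, the target is 0x00400020.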
Recall the jump instruction looks like…

Bits 31-26   25-0
2            address

The New Datapath

Here is the datapath with support for the jump instruction…

[Figure: the datapath extended for jump: Instruction [25–0] is shifted left by 2 to form 28 bits and combined with PC+4 [31–28] to form the 32-bit jump address; an additional multiplexor, controlled by a new Jump signal from the control unit, selects between the jump address and the output of the branch multiplexor as the next PC]

Performance Problems with Single Cycle Datapaths

We now have a complete single cycle datapath that correctly implements the MIPS instruction set (well, the subset we have chosen to implement). Why is this approach not used?

- The clock cycle is determined by the longest path through the datapath. This should be a load instruction, as it uses five units (instruction memory, register file, ALU, data memory, and the register file again).
- Several other instruction classes could run in a lot less time than this!

So yes, everything runs in one clock cycle, but that cycle must be long enough for the longest instruction to complete!

Moving to a Multicycle Datapath

In addition to the problem with inefficient performance, the single cycle datapath suffers from the fact that each functional unit can only be used once in each clock cycle. This is why we needed two separate memory units (among other less-than-optimal design characteristics).

So, while it is nice and simple, the single cycle datapath is not used.

We can solve this problem by having a much shorter clock cycle and having each instruction use multiple cycles. That is a multicycle datapath, and that’s next.
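To put rough numbers on the clock-cycle argument, the sketch below adds up unit delays along each instruction's path. The latencies are assumed round figures chosen purely for illustration, not values from these slides, and the names `LATENCY_NS`, `UNITS_USED`, `instruction_time`, and `single_cycle_clock` are hypothetical.

```python
# Assumed unit delays in nanoseconds (illustrative only).
LATENCY_NS = {
    "instruction_memory": 2,
    "register_read": 1,
    "alu": 2,
    "data_memory": 2,
    "register_write": 1,
}

# Functional units on the path of each instruction class.
UNITS_USED = {
    "R-type": ["instruction_memory", "register_read", "alu", "register_write"],
    "lw": ["instruction_memory", "register_read", "alu", "data_memory", "register_write"],
    "sw": ["instruction_memory", "register_read", "alu", "data_memory"],
    "beq": ["instruction_memory", "register_read", "alu"],
    "j": ["instruction_memory"],
}


def instruction_time(name):
    """Time one instruction class actually needs, in nanoseconds."""
    return sum(LATENCY_NS[unit] for unit in UNITS_USED[name])


def single_cycle_clock():
    """The single clock period must cover the slowest instruction class."""
    return max(instruction_time(name) for name in UNITS_USED)
```

Under these assumed delays a load needs 8 ns, so every instruction is given an 8 ns cycle, even though a beq would need only 5 ns and a jump only 2 ns.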