Description
1 CPU Datapath
The following figure shows the overall datapath of the simple 5-stage CPU we have learned.
MUX1
MUX2
MUX3
MUX4
There are four multiplexers (MUX) in the figure, which are labeled and numbered. Please answer the
following questions regarding these multiplexers. (30 points)
1. Please give the two inputs of each multiplexer.
(a) MUX1:
i. input 1:
ii. input 2:
(b) MUX2:
1
i. input 1:
ii. input 2:
(c) MUX3:
i. input 1:
ii. input 2:
(d) MUX4:
i. input 1:
ii. input 2:
2. Given a memory load instruction, “mov R0, [R1+ 1000],” please give the input that should be selected
at each multiplexer. You can write “none” for the multiplexers that are not used for this instruction.
(a) MUX1:
(b) MUX2:
(c) MUX3:
(d) MUX4:
3. Given a memory store instruction, “mov [R1+1000], R0,” please give the input that should be selected
at each multiplexer. You can write “none” for the multiplexers that are not used for this instruction.
(a) MUX1:
(b) MUX2:
(c) MUX3:
(d) MUX4:
4. Given a bitwise XOR instruction, “xor R0, R1, R2,” please give the input that should be selected at
each multiplexer.
(a) MUX1:
(b) MUX2:
(c) MUX3:
(d) MUX4:
5. Given a conditional branch instruction, “jnz 100,” please give the input that should be selected at
each multiplexer. Assume the branch is not taken. You can write “none” for the multiplexers that are
not used for this instruction.
(a) MUX1:
(b) MUX2:
(c) MUX3:
(d) MUX4:
6. Given an unconditional branch instruction, “jmp 100,” please give the input that should be selected at
each multiplexer. Unconditional branches are always taken. You can write “none” for the multiplexers
that are not used for this instruction.
(a) MUX1:
(b) MUX2:
(c) MUX3:
(d) MUX4:
2
2 Pipelining
2.1 Problem 1 (adapted from Comp Org & Design Exercise 6.2):
A computer architect needs to design the pipeline of a new microprocessor. She has an example workload
program core with 107
instructions. Each instruction takes 100 ps (picosecond) to finish. (20 points)
1. How long does it take to execute this program core on a non-pipelined processor? (5 points)
2. A simple RISC microprocessor has 5 pipeline stages. Assume it is perfectly pipelined. How much
speedup will it achieve compared to the nonpipelined processor? For this question, you need to compute
the execution time for a stage (cycle), when the execution of one instruction is partitioned into 5 stages.
Also given a perfect pipeline, every cycle there is a new instruction finishes execution. (15 points)
3. The current state-of-the-art microprocessor has 17 pipeline stages. Assume it is perfectly pipelined.
How much speedup will it achieve compared to the nonpipelined processor? For this question, you
need to compute the execution time for a stage, when the execution of one instruction is partitioned
into 17 stages. (10 points)
4. Real pipelining isn’t perfect, since implementing pipelining introduces some overhead (extra time) per
pipeline stage. Will this overhead affect instruction latency (i.e., exec time per instruction) , instruction
throughput (i.e., instructions per second), or both? (10 points)
3
2.2 Problem 2:
Consider the code segment below. Assume that full bypassing/forwarding has been implemented. Assume
that the initial value of register R23 is much bigger than the initial value of register R20. Assume that all
memory references take a single cycle. Assume X, Y and Z are constants. You may not reorder instructions
to fill the slots, but if a subsequent instruction is independent and is properly positioned, you may assume
that it fills the slot. Otherwise, fill slots with additional stalls/bubbles as needed. (30 points)
LOOP: mov R10 , [ R20+X]
mov R11 , [ R20+Y]
sub R11 , R10 , R11
mov [ R20+Z ] , R11
add R20 , R20 , 4
sub R5 , R23 , R20
j n z R5 , LOOP
1. Draw a pipeline diagram in the following table of its execution on a standard 5-stage RISC pipeline.
In the box below, write the total number of cycles required to complete 2 iterations of the loop.
Cycles:
Pipeline executions:
Instruction
Cycle/Time →
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
mov R10, [R20+X] IF ID E M W
4
3 Branch Prediction
Please fill the following table with branch predictor states and branch outcome predictions using the 2-bit
saturated counter (20 points).
State Prediction Actual Branch Outcome
00 Taken
Taken
Taken
Taken
Taken
Taken
Not Taken
Not Taken
Taken
Not Taken
Taken
Taken
Not Taken
Taken
5