Description
1. Consider an unpipelined, or single-stage, processor design like the one discussed on
slide 4 of lecture 16. At the start of each cycle, a new instruction enters the processor and
is processed completely within that cycle. It takes 2,000 ps to traverse all of the
circuits in a cycle (including latch overheads). Therefore, for this design to work, the
cycle time must be at least 2,000 picoseconds.
1. What is the clock speed of this processor? (5 points)
2. What is the CPI of this processor, assuming that every instruction fetch hits in the
instruction cache and every load/store finds its data in the data cache? (5 points)
3. What is the throughput of this processor (in billion instructions per second)? (10 points)
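The arithmetic behind Q1 can be sketched in a few lines of Python. This is a minimal sketch, assuming (as the question implies) that every instruction, including each cache-hitting load/store, finishes in exactly one 2,000 ps cycle, so the CPI is 1. The helper names `clock_ghz` and `throughput_bips` are hypothetical.

```python
# Sketch of the Q1 arithmetic for the single-cycle (unpipelined) design.
# Assumption: every instruction completes in one 2,000 ps cycle, so CPI = 1.

CYCLE_TIME_PS = 2000  # from the problem statement


def clock_ghz(cycle_time_ps: float) -> float:
    """Clock frequency in GHz: 1 / cycle time (1,000 ps = 1 ns -> 1 GHz)."""
    return 1000.0 / cycle_time_ps


def throughput_bips(cycle_time_ps: float, cpi: float) -> float:
    """Billions of instructions per second = clock frequency (GHz) / CPI."""
    return clock_ghz(cycle_time_ps) / cpi


cpi = 1.0
print(f"clock      = {clock_ghz(CYCLE_TIME_PS):.2f} GHz")
print(f"CPI        = {cpi:.0f}")
print(f"throughput = {throughput_bips(CYCLE_TIME_PS, cpi):.2f} BIPS")
```

Running it prints a 0.50 GHz clock, a CPI of 1, and a throughput of 0.50 billion instructions per second.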
2. The processor in Q1 above is converted into a 10-stage pipeline. The slowest of these
10 stages takes 250 ps (including latch overheads).
1. What is the clock speed of this processor? (5 points)
2. What is the CPI of this processor, assuming that every instruction fetch hits in the
instruction cache, every load/store finds its data in the data cache, and there are no
stalls from data, control, or structural hazards? (5 points)
3. What is the throughput of this processor (in billion instructions per second)? (10 points)
4. What is the speedup relative to the unpipelined processor in Q1? Why is the
speedup less than 10X? (10 points)
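The Q2 comparison can be sketched the same way. This is a sketch under the question's stated assumptions (one instruction enters per cycle, no stalls, so the ideal CPI is 1) plus the standard rule that the clock period is set by the slowest stage. The constant and helper names are hypothetical.

```python
# Sketch of the Q2 arithmetic: 10-stage pipeline versus the single-cycle design.
# Assumptions: ideal CPI of 1 (no stalls) and a clock period equal to the
# slowest stage (250 ps, latch overhead included).

UNPIPELINED_CYCLE_PS = 2000
SLOWEST_STAGE_PS = 250
NUM_STAGES = 10


def clock_ghz(cycle_time_ps: float) -> float:
    return 1000.0 / cycle_time_ps


def throughput_bips(cycle_time_ps: float, cpi: float = 1.0) -> float:
    return clock_ghz(cycle_time_ps) / cpi


pipelined = throughput_bips(SLOWEST_STAGE_PS)
unpipelined = throughput_bips(UNPIPELINED_CYCLE_PS)
speedup = pipelined / unpipelined

# With a perfectly even split and no latch cost, each stage would take
# 2,000 / 10 = 200 ps and the speedup would be exactly 10X.
ideal_stage_ps = UNPIPELINED_CYCLE_PS / NUM_STAGES

print(f"clock = {clock_ghz(SLOWEST_STAGE_PS):.1f} GHz, throughput = {pipelined:.1f} BIPS")
print(f"speedup = {speedup:.1f}X (an even split would give {ideal_stage_ps:.0f} ps stages)")
print(f"latency per instruction = {NUM_STAGES * SLOWEST_STAGE_PS} ps")
```

With these numbers the sketch reports a 4.0 GHz clock, 4.0 BIPS, and an 8.0X speedup. The gap from 10X is visible in the 250 ps versus 200 ps comparison: the stages are not perfectly balanced, each stage adds its own latch overhead, and the clock must accommodate the slowest stage (one instruction now takes 2,500 ps end to end instead of 2,000 ps).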
3. Show how the following four consecutive instructions move through each stage of
the five-stage pipeline, similar to the example on slide 11 of lecture 17. This pipeline
does not support any bypassing. Make sure the decode stage does not advance an
instruction through the pipeline unless all data dependences are correctly resolved.
(25 points)
I1: add $s1, $s2, $s3
I2: lw $s4, 4($s4)
I3: add $s5, $s4, $s1
I4: sw $s5, 8($s2)
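One way to check a hand-drawn pipeline diagram for Q3 is a tiny in-order timing model. The sketch below is not the lecture's method; it uses hypothetical stage labels (IF, ID, EX, MEM, WB) and hypothetical names (`Instr`, `schedule_no_bypass`, `show`), and it assumes that the register file is written in the first half of a cycle and read in the second half (so a consumer may finish decode in its producer's WB cycle), that a stalled instruction waits in ID, and that the instruction behind it waits in IF. A lecture that places register reads differently would shift the cycle numbers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    name: str
    dest: Optional[str]     # register written (None for a store)
    srcs: Tuple[str, ...]   # registers read


PROGRAM = [
    Instr("I1", "$s1", ("$s2", "$s3")),  # add $s1, $s2, $s3
    Instr("I2", "$s4", ("$s4",)),        # lw  $s4, 4($s4)
    Instr("I3", "$s5", ("$s4", "$s1")),  # add $s5, $s4, $s1
    Instr("I4", None, ("$s5", "$s2")),   # sw  $s5, 8($s2)
]


def schedule_no_bypass(program):
    """Return [(name, {cycle: stage})] for an in-order pipeline with no bypassing."""
    rows = []
    id_end = {}       # instruction name -> last cycle it spends in ID
    last_writer = {}  # register -> most recent earlier instruction that writes it
    prev_id_start, prev_id_end = 1, 0
    for k, ins in enumerate(program):
        # IF is free once the previous instruction has moved into ID.
        if_start = 1 if k == 0 else prev_id_start
        # ID is free once the previous instruction has moved into EX.
        id_start = max(if_start + 1, prev_id_end + 1)
        # No bypassing: a source becomes readable in its producer's WB cycle,
        # which is three cycles after the producer's last ID cycle.
        ready = [id_end[last_writer[r]] + 3 for r in ins.srcs if r in last_writer]
        id_stop = max([id_start] + ready)
        timeline = {c: "IF" for c in range(if_start, id_start)}
        timeline.update({c: "ID" for c in range(id_start, id_stop + 1)})
        timeline.update({id_stop + 1: "EX", id_stop + 2: "MEM", id_stop + 3: "WB"})
        rows.append((ins.name, timeline))
        id_end[ins.name] = id_stop
        if ins.dest is not None:
            last_writer[ins.dest] = ins.name
        prev_id_start, prev_id_end = id_start, id_stop
    return rows


def show(rows):
    """Print one row per instruction and one column per cycle."""
    last = max(max(t) for _, t in rows)
    print("    " + "".join(f"{c:>5}" for c in range(1, last + 1)))
    for name, t in rows:
        print(f"{name:<4}" + "".join(f"{t.get(c, ''):>5}" for c in range(1, last + 1)))


show(schedule_no_bypass(PROGRAM))
```

Under these assumptions I3 waits in decode until I2's write-back cycle (cycle 6), I4 waits in fetch and then in decode until I3 writes back in cycle 9, and the sequence finishes in cycle 12.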
4. Show how the same four instructions move through each stage of the five-stage
pipeline, similar to the example on slide 13 of lecture 17. This pipeline does support
bypassing. Make sure the decode stage does not advance an instruction through the
pipeline unless all data dependences are correctly resolved. You don’t need to show
the latch involved in every bypass (but feel free to work out which latch supplies each
bypassed value, for your own understanding). (25 points)
I1: add $s1, $s2, $s3
I2: lw $s4, 4($s4)
I3: add $s5, $s4, $s1
I4: sw $s5, 8($s2)
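A similar timing model can sanity-check a Q4 diagram once bypassing is allowed. This sketch is not the lecture's method; it assumes full bypassing (any result held in a pipeline latch can be forwarded to whichever later stage needs it), that an ALU result exists at the end of EX and a loaded value at the end of MEM, and that a store needs its data register only in MEM. Stage labels and the names `Instr`, `result_at`, and `schedule_with_bypass` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Offsets of each stage from an instruction's last ID cycle d:
# EX happens in cycle d + 1, MEM in d + 2, WB in d + 3.
EX, MEM = 1, 2


@dataclass
class Instr:
    name: str
    dest: Optional[str]                # register written (None for a store)
    srcs: Tuple[Tuple[str, int], ...]  # (register, offset of the stage that consumes it)
    result_at: int = EX                # offset of the stage whose END produces the result


PROGRAM = [
    Instr("I1", "$s1", (("$s2", EX), ("$s3", EX))),     # add $s1, $s2, $s3
    Instr("I2", "$s4", (("$s4", EX),), result_at=MEM),  # lw  $s4, 4($s4)
    Instr("I3", "$s5", (("$s4", EX), ("$s1", EX))),     # add $s5, $s4, $s1
    Instr("I4", None, (("$s5", MEM), ("$s2", EX))),     # sw  $s5, 8($s2): data used in MEM
]


def schedule_with_bypass(program):
    """Return [(name, {cycle: stage})] assuming full bypassing between latches."""
    rows = []
    produced = {}     # instruction name -> cycle at whose end its result exists
    last_writer = {}  # register -> most recent earlier instruction that writes it
    prev_id_start, prev_id_end = 1, 0
    for k, ins in enumerate(program):
        if_start = 1 if k == 0 else prev_id_start
        id_start = max(if_start + 1, prev_id_end + 1)
        # A value produced at the end of cycle p can be forwarded into a
        # consuming stage in cycle p + 1 or later: id_stop + off >= p + 1.
        need = [produced[last_writer[r]] + 1 - off
                for r, off in ins.srcs if r in last_writer]
        id_stop = max([id_start] + need)
        timeline = {c: "IF" for c in range(if_start, id_start)}
        timeline.update({c: "ID" for c in range(id_start, id_stop + 1)})
        timeline.update({id_stop + 1: "EX", id_stop + 2: "MEM", id_stop + 3: "WB"})
        rows.append((ins.name, timeline))
        produced[ins.name] = id_stop + ins.result_at
        if ins.dest is not None:
            last_writer[ins.dest] = ins.name
        prev_id_start, prev_id_end = id_start, id_stop
    return rows


for name, t in schedule_with_bypass(PROGRAM):
    print(name, " ".join(f"{c}:{s}" for c, s in sorted(t.items())))
```

Under these assumptions I3 still spends one extra cycle in decode, because the loaded $s4 only exists at the end of I2's MEM stage (cycle 5), while I4's $s5 can be forwarded without a further stall, so the sequence finishes in cycle 9.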