Description
In this project, you will enhance your simulator from Project 3 to model a pipeline with full
forwarding paths. There is no additional code provided for this assignment. You should modify the
CPU and Stats classes from your Project 3 submission.
Exactly as in Project 3, the pipeline has the following stages: IF1, IF2, ID, EXE1, EXE2, MEM1,
MEM2, WB. Branches are resolved in the ID stage. There is no branch delay (in other words, the
instruction words immediately following a taken branch in memory should not be executed). There
is also no load delay slot (an instruction that reads a register written by an immediately-preceding
load should receive the loaded value). There are no structural hazards, and data is written to the
register file in WB in the first half of the clock cycle and can be read in ID in the second half of that
same clock cycle.
This time, assume full data forwarding is possible and therefore bubbles are only required for data
hazards that cannot be resolved by the addition of a forwarding path. In the case of such a data
hazard, the processor stalls the instruction with the read-after-write (RAW) hazard in the ID stage
by inserting bubbles for the minimum number of cycles until a forwarding path can get source data
to the instruction.
To do this, ID must track the destination registers and cycles-until-available information for
instructions later in the pipeline, so that it can detect hazards and insert the correct number of
bubbles. (ID also uses that information to create forwarding logic control signals that are then
flopped down the pipeline with the consuming instruction, though you do not have to model that
part for this project.) This is called static scoreboarding, and it’s the technique used by one of the
processors we’ve studied in class, the ARM Cortex-A8.
All instruction inputs are needed at the beginning of the respective stage. Most instructions need
their inputs in the EXE1 stage, except for jr, beq, and bne, which need their inputs in ID, and the
sw instruction’s store data, which is needed in MEM1 (the base register is still needed in EXE1).
Instruction results become available for forwarding at the beginning of the stage after they are
produced (e.g., the ALU produces data at the end of EXE2, but that data is not forwardable until the
beginning of MEM1, with the data forwarded from the EXE2/MEM1 flop). All instruction results are
produced at the end of the EXE2 stage, except for lw, mult, and div, which produce results at the
end of MEM2, and jal, whose result becomes available at the end of ID.
For simplicity, assume that trap instructions follow the same timing as add instructions. As
before, trap 0x01 reads register Rs and trap 0x05 writes register Rt. Note that mfhi and
mflo read the hi/lo registers, and mult and div write them.
The $zero register cannot be written and is therefore always immediately available.
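As a rough illustration of the timing rules above, the bubble count for one producer/consumer pair can be computed from three stages: where the producer currently is, where its result first becomes forwardable, and where the consumer needs it. The sketch below is one possible reading of those rules, not the required implementation. The stage numbering, the AVAIL_* constants, and the bubblesNeeded helper are all hypothetical names introduced for this example.

```cpp
#include <algorithm>
#include <cassert>

// Stage numbering is an assumption of this sketch; reuse your own enum if you have one.
enum PipeStage { IF1 = 0, IF2, ID, EXE1, EXE2, MEM1, MEM2, WB };

// First stage at whose beginning a result can be forwarded (the stage after
// the one in which it is produced). These names are illustrative only.
const PipeStage AVAIL_ALU  = MEM1;  // most ops: produced at the end of EXE2
const PipeStage AVAIL_LOAD = WB;    // lw, mult, div: produced at the end of MEM2
const PipeStage AVAIL_JAL  = EXE1;  // jal: produced at the end of ID

// Hypothetical helper: bubbles required for a consumer currently sitting in ID.
//   producerStage: stage the producing instruction occupies right now
//   availStage:    stage at whose beginning the result becomes forwardable
//   needStage:     stage at whose beginning the consumer needs the operand
int bubblesNeeded(PipeStage producerStage, PipeStage availStage, PipeStage needStage) {
    int cyclesUntilAvailable = availStage - producerStage;
    int cyclesUntilNeeded = needStage - ID;
    return std::max(0, cyclesUntilAvailable - cyclesUntilNeeded);
}

// A few cases worked out under this model.
void checkBubbleExamples() {
    assert(bubblesNeeded(EXE1, AVAIL_ALU, EXE1) == 1);   // add, then dependent add
    assert(bubblesNeeded(EXE1, AVAIL_LOAD, EXE1) == 3);  // lw, then dependent add
    assert(bubblesNeeded(EXE1, AVAIL_LOAD, ID) == 4);    // lw, then dependent beq
    assert(bubblesNeeded(EXE1, AVAIL_LOAD, MEM1) == 1);  // lw, then sw of the loaded value
    assert(bubblesNeeded(MEM2, AVAIL_ALU, EXE1) == 0);   // result already forwardable
}
```

Clamping at zero covers the case where the result is forwardable before the consumer needs it, which still counts as a RAW hazard but costs no bubbles.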
Your simulator will report the following statistics at the end of the program:
• The exact number of clock cycles it would take to execute the program on a CPU with the
hardware parameters described above. (Remember that there’s a 7-cycle startup penalty
before the first instruction is complete.)
• The CPI (cycle count / instruction count)
• The number of bubble cycles injected due to data dependencies
• The number of flush cycles in the shadows of jumps and taken branches
• The total number of RAW (read-after-write) hazards detected, including those that
result in bubbles and those that can be immediately addressed by forwarding. Note that an
op in ID cannot have a RAW hazard with an op in WB, because WB writes the register file in
the first half of the clock cycle and ID reads it in the second half.
• The ratio of instructions to RAW hazards
• The number and percentage of RAW hazards identified on the producing instruction in each
of the stages between ID and WB (EXE1, EXE2, MEM1, and MEM2)
ASSIGNMENT SPECIFICS
Begin by copying all of your Project 3 files into a new Project 4 directory, e.g.:
$ cp -r cs3339_project3/ cs3339_project4/
You can modify any of the project files in any way that you’d like, though only the CPU and Stats
classes should have to be changed from Project 3. You will probably want to change the parameter
lists of some Stats member functions, and you will probably want to add some member variables to
Stats.
I recommend the following approach:
Modify your registerSrc and registerDest functions to take a second argument,
specifying the earliest pipeline stage in which the source will be needed or the result will become
available, respectively. In addition to tracking the destination register of each op in the pipeline,
also track the stage in which that instruction’s result will become available for forwarding.
Whenever a register is used as a source, the Stats class should look for instructions in later
pipeline stages (older instructions) that will write result data to that register in a future cycle, i.e.
RAW hazards. The Stats class should then determine, based on what stage the source is needed and
when the matching destination will be produced, whether the data dependency can be handled by a
forwarding path without a bubble, or whether one or more bubbles must be injected.
Note that it’s possible for multiple instructions later in the pipeline to all be writing the same
destination register, and that if the instruction in ID reads that register, the hazard exists only on
the most recent (youngest) producing instruction.
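The approach above can be sketched as a small scoreboard that remembers, for every pipeline stage, the destination register of the op in that stage and the stage at which its result becomes forwardable. This is a minimal sketch under assumptions, not the expected solution: the Scoreboard struct, its member names, and the use of -1 for "no destination" are hypothetical, and your Stats class from Project 3 will look different.

```cpp
#include <cassert>

// Stage numbering is an assumption of this sketch.
enum PipeStage { IF1 = 0, IF2, ID, EXE1, EXE2, MEM1, MEM2, WB, NUM_STAGES };

// Hypothetical miniature of the bookkeeping a Stats-like class might do.
struct Scoreboard {
    int destReg[NUM_STAGES];     // register written by the op in each stage, -1 if none
    int availStage[NUM_STAGES];  // stage at which that op's result becomes forwardable
    int rawHazards = 0;
    int bubbles = 0;

    Scoreboard() {
        for (int s = 0; s < NUM_STAGES; ++s) { destReg[s] = -1; availStage[s] = ID; }
    }

    // Record the destination of the instruction currently in ID.
    void registerDest(int r, PipeStage avail) {
        destReg[ID] = r;
        availStage[ID] = avail;
    }

    // Normal cycle: every op moves one stage down the pipeline.
    void clock() {
        for (int s = WB; s > IF1; --s) {
            destReg[s] = destReg[s - 1];
            availStage[s] = availStage[s - 1];
        }
        destReg[IF1] = -1;
    }

    // Stall cycle: ops in EXE1 and later advance, ID and earlier hold,
    // and an empty slot (the bubble) enters EXE1.
    void bubble() {
        ++bubbles;
        for (int s = WB; s > EXE1; --s) {
            destReg[s] = destReg[s - 1];
            availStage[s] = availStage[s - 1];
        }
        destReg[EXE1] = -1;
    }

    // Source read by the instruction in ID, needed at the start of `needed`.
    void registerSrc(int r, PipeStage needed) {
        if (r == 0) return;                      // $zero is always available
        for (int s = EXE1; s < WB; ++s) {        // scan youngest producer first
            if (destReg[s] != r) continue;
            ++rawHazards;
            int stall = (availStage[s] - s) - (needed - ID);
            for (int i = 0; i < stall; ++i) bubble();
            break;                               // older writers of r do not matter
        }
    }
};
```

Scanning from EXE1 toward MEM2 and stopping at the first match is what restricts the hazard to the youngest producer; an op in WB is deliberately excluded from the scan.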
You can again check your timing results using the equation instrs = cycles - 7 - bubbles - flushes.
Note that your flush count should not change from Project 3, but you should see a significant
reduction in bubble count (and, consequently, a significant improvement in CPI) due to the addition
of forwarding paths.
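The timing check is easy to automate while you debug. The sketch below applies the equation to the sssp.mips numbers from the expected output in this handout; the function names are hypothetical and are not part of the provided code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: instruction count implied by the cycle breakdown,
// using instrs = cycles - 7 - bubbles - flushes.
long long impliedInstrs(long long cycles, long long bubbles, long long flushes) {
    return cycles - 7 - bubbles - flushes;
}

// Hypothetical helper: cycles per instruction.
double cyclesPerInstr(long long cycles, long long instrs) {
    return static_cast<double>(cycles) / static_cast<double>(instrs);
}

// Check the identity against the sssp.mips figures listed in this handout.
void checkSsspTiming() {
    const long long cycles = 983220, bubbles = 481710, flushes = 51990;
    const long long instrs = 449513;
    assert(impliedInstrs(cycles, bubbles, flushes) == instrs);
    // About 2.19, which is reported as 2.2 in the expected output.
    assert(std::fabs(cyclesPerInstr(cycles, instrs) - 2.187) < 0.001);
}
```

If the identity fails for your own counts, the mismatch tells you how many cycles are unaccounted for, which is usually a quicker clue than comparing the final CPI.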
The following is the expected result for sssp.mips. Your output must match this format verbatim:
CS 3339 MIPS Simulator
Running: sssp.mips
7 1
Program finished at pc = 0x400440 (449513 instructions executed)
Cycles: 983220
CPI: 2.2
Bubbles: 481710
Flushes: 51990
RAW hazards: 349295 (1 per every 1.29 instructions)
On EXE1 op: 225878 (65%)
On EXE2 op: 79203 (23%)
On MEM1 op: 37697 (11%)
On MEM2 op: 6517 (2%)
The file project4_solution.txt on TRACS has expected bubble/cycle counts for all inputs.
Additional Requirements:
• Your code must compile with the given Makefile and run on zeus.cs.txstate.edu
• Your code must be well-commented, sufficient to prove you understand its operation
• Make sure your code doesn’t produce unwanted output such as debugging messages. (You
can accomplish this by using the D(x) macro defined in Debug.h)
• Make sure your code’s runtime is not excessive
• Make sure your code is correctly indented and uses a consistent coding style
• Clean up your code before submitting: i.e., make sure there are no unused variables,
unreachable code, etc.
SUBMISSION INSTRUCTIONS
Submit all of the code necessary to compile your simulator (all of the .cpp and .h files as well as
the Makefile) as a compressed tarball. You can do this using the following Linux command:
$ tar czvf project4.tgz *.cpp *.h Makefile
Do not submit the executables (*.mips files). Any special instructions or comments to the grader,
including notes about features you know do not work, should be included in a separate text file (not
inside the tarball) named README.txt.
All project files are to be submitted using TRACS. Please follow the submission instructions here:
http://tracsfacts.its.txstate.edu/trainingvideos/submitassignment/submitassignment.htm
Note that files are only submitted if TRACS indicates a successful submission.
You may submit your file(s) as many times as you’d like before the deadline. Only the last
submission will be graded. TRACS will not allow submission after the deadline, so I strongly
recommend that you don’t come down to the final seconds of the assignment window. Late
assignments will not be accepted.