Description
This assignment includes another excellent application for inheritance, where one parent class has several derived classes.
Problem Description
The previous two assignments provided a very good picture how how expressions are structured, and how they can be evaluated, but they do not at all represent how a real C program would tend to evaluate them. Although a compiler may build a tree structure to understand and optimize code, it would not try to store that data structure in a file to be interpreted at run time.
Instead, it would produce a series of instructions understood by the machine — what is often referred to ‘object code’. This assignment will mimic that translation into a machine language, and then simulate execution of that code. No knowledge of any particular machine language or assembly language will be required for this assignment.
The Emulated Machine
Instructions occupy memory; data occupies memory; so naturally our simulated machine will consist of memory. This will be easily represented with Python lists. The instruction list will hold instructions (a new abstract data type); the data list will hold all of the simulated variables.
In addition, most modern machines have a distinct set of storage locations directly within the Central Processing Unit, called “registers”. All instructions involving calculations will use the registers. There are separate instructions that would transfer values back and forth between these registers and the data memory.
For simplicity, and since many real compilers do this, we will first assume an arbitrarily large number of registers are available, and just pick a new one any time one is needed. Of course, a real machine really has a finite set of registers, but that issue is handled as a later step, and therefore not important to this assignment.
The instructions in many modern machines are very simple — just doing one operation. It might be easiest to jump to an example.
Sample Code Translation and Execution
Here is a small ‘program’ to use an example input to the assignment:
a = 3
b = a+2
(b+1)*(b+1)+a
And jumping to the end, here are the results of ‘running’ this code.
3
5
39
In fact, Homework 5 should have been able to do this already.
But different from Homework 5 is what happens in between: producing and executing some sort of machine code. Here is what my solution produces:
0 : T1 = 3
1 : stack[0] = T1
2 : print T1
3 : T2 = stack[0]
4 : T3 = 2
5 : T4 = (T2+T3)
6 : stack[1] = T4
7 : print T4
8 : T5 = stack[1]
9 : T6 = 1
10 : T7 = (T5+T6)
11 : T8 = stack[1]
12 : T9 = 1
13 : T10 = (T8+T9)
14 : T11 = (T7*T10)
15 : T12 = stack[0]
16 : T13 = T11+T12
17: print T13
The exact appearance of your intermediate code is not important, and this is not precisely how the instructions are represented in my data. The code to display my compiled instructions just displays them in a format that is easier to read.
This example uses 13 registers (T1 through T13), two variables (stored in positions 0 and 1 on the stack), and 18 instructions. Of these instructions, 4 place a constant into a register, 4 load a variable into it, 2 store into a variable, and 5 compute. Those are representative of what real machines do.
Real machines, however, do not actually have a ‘print’ instruction. That is a convenience for this assignment. It may be noted that there is no equivalent in the input file either. It will be understood implicitly that the result of each expression is to be displayed.
Typical machines also have two very important special registers — the program counter and the stack pointer. The program counter is nothing more than an indicator of what instruction to do next. For the simple example above, all it did was count from 0 to 17. The stack pointer assists in finding variables in the data memory, and will be discussed more later in this assignment file.
Representing Instructions
The program is represented as an list of instructions, but there several different kinds of instructions available. Therefore, inheritance would be highly relevant in this problem.
There are very few methods required for any of these classes. All that is needed is a way to construct an instruction (__init__, a way to display it (much like __str__ for ExprTree), and a way to execute it. Furthermore, execution should be very simple, since all the necessary information is encoded in the instruction itself. (I expect many one-line methods).
In addition, it should be noted that all of the computational instructions look very similar. These can also be accomplished as one-line methods with the help of Python’s eval function.
Here is a portion from the instructor’s machine.py:
from abc import ABCMeta, abstractmethod
class Instruction(metaclass=ABCMeta):
“””Simple instructions representative of a RISC machine
These instructions are mostly immutable — once constructed,
they will not be changed — only displayed and executed
“””
def __init__(self, t): # default constructor
self._temp = t # every instruction has a register
def get_temp(self): # which holds its answer
return self._temp
# All derived classes must implement these two methods:
@abstractmethod
def __str__(self):
“””Describe instruction as a readable statement”””
return “”
@abstractmethod
def execute(self,temps,stack):
“””Execute instruction, given temporary registers and stack space”””
pass
class Print(Instruction):
“””A simple non-RISC output function to display a value”””
def __str__(self):
return “print T” + str(self._temp)
def execute(self,temps,stack,pc,sp):
print( temps[self._temp] )
The parameters to execute are the two lists, representing temporary registers and stack memory for variables, and two special registers that are only relevant to Extra Credit
You will need to define some additional operations:
Required Operations Example
Initialize a temporary register to a value T2 = 3
Load a temporary register from a variable T3 = stack[0]
Store a temporary register into a variable stack[1] = T4
Perform a computation using temporary registers T3 = T1*T2
Extra Credit Operations
Unconditional Branch Go to Instruction 15
Conditional Branch Go to Instruction 15 if T3 is False
Function Call Call function at Instruction 5
Function Return Return
A provided front-end
The given compile.py has the following features:
— It defines a program object to represent a compiled program
— It calls your existing functions to build expression trees, and calls new ones to compile those trees into machine code.
— it inserts Print statements after each complete expression
— It displays code, sets aside some simulated memory for it, and executes
— it defines a function that takes a file name, then compiles and runs the program in that file
Here is a typical sequence of events:
for each line of input (hard-coded or from file)
compile that line into machine code
if it is a function definition,
record that information for later
otherwise
insert a print statement displaying the result
display all of the compiled machine instructions
allocate memory for registers and stack variables (using Python lists)
initialize program counter and stack pointer
execute the machine instructions until the bottom is reached
The compile_tree function is very much the same as the evaluate function in the previous assignments, renamed to emphasize the no actual evaluation occurs. Instead, the the appropriate machine instructions are generated and stored into the program memory.
Since it is expected that each derived class already has a straightforward way of performing each operation, it should be straightforward to adapt them to generate the required machine instructions, and in the proper sequence.
The given execution loop automatically advances the program counter for every instruction (like any machine would). This avoids the need to do it separately for each instruction type.
Operations to Support
For Full Credit:
All of expressions covered by the Required Operations above.
The rest of the following will be worth varying amounts of Extra Credit:
The conditional expression
This would need to be able to jump over the code that is not to be executed, which would require instructions to change the program counter. Two new instructions would suffice — one to change the program counter if and only if the boolean expression is false; and one to change it unconditionally (to get to the next expression). In both cases, what to change it to will not be known until a bit later.
Of course, this would also require support for the relational operators, but that should already be supported above.
Non-Recursive Function Definitions and Function Calls
This involves assigning values to the parameters, changing the program counter to refer to the function, and then, after the function ends, changing the program counter back to where it was. Also, it must make sure two active functions do not try to use the same memory.
Everyone should recall that when a ‘real’ program runs, memory is set aside for each function’s variables, and then returned on function exit. That can be managed by the given stack pointer variable. This value (which is just an array subscript here) is subtracted by the memory needs of the function at the start, and then restored by addition at the end.
The convention of starting at the back of the memory array and subtracting instead of starting and the front and adding is simply a matter of practicality. If we started at 0, then the value to add or subtract would be the memory needs of the calling function, which are completely unknown to the called function, which could be called from anywhere at any time. This subtraction approach allows us to work with known values.
To be able to return to the calling function at the end, one must remember where to go back to. And if there are further function calls, several such values would need to be remembered. Fortunately, we already have this simulated stack memory, so it would be wise to make use of it.
A lot of stuff clearly happens on function call and return. It is permissible to make some very complicated Call and Return instructions (unlike what a real RISC machine would do) These would affect both the stack pointer and the program counter It might be simple in some cases to actually use the existing register to/from memory instructions to assist handle the function arguments and return value.
Full credit for this extra credit option must support multiple arguments, and not necessarily exactly one.
Recursive Functions
To avoid infinite recursion, these functions would necessarily use the conditional expression within the function body. Ideally that would mean this would follow naturally from both of the preceding Extra Credit options.
But it is very possible to create an implementation that does not support recursion (which was true for many 20th-century compilers), so this is just making sure the two features work together properly.
Assignment Specifications
The list of files is also essentially the same as for the previous assignment, with some additions to several of them.
compile.py
Front end to compile and run program lines or program files
No changes required, unless pursuing the extra credit, which will want to update the program counter or stack pointer. This is most easily done by defining a return value for execute
machine.py
Definitions of machine instructions and how to execute them
exprtree.py
Defines an expression tree, with added support for generating machine code
vartree.py
Holds definitions for variables and functions, with added support for their memory addresses
infixtotree.py
newsplit.py
peekable.py
code to convert input into tree form (no new code required)
Additional Extra Credit: Common Subexpressions
A quick glance at the sample code above would indicate that it is doing a bit more work than it needs to do. It is computing the value of “b+1” twice. It seems it should only need to compute that value once, and then reuse its value.
A good optimizing compiler would do its best to eliminate such redundant work. Here is a sketch of how it could do that.
First: Every time a variable is assigned to, remember where it got its value from. After all two “b+1″‘s would not be equal if the value of b could change between them.
In this case, the only assignment to variable b is the value of T4.
Once that is recorded, that fact can be used for all subsequent uses of that variable, before it is changed again. For this sample program, we would not be calling for new registers in instructions 8 and 11, but would simply use T4 again.
Similarly, before defining a new register to hold a constant value we can check to see if that value has already been encountered. That would mean we would not have to store 1 into two of them.
That much would start generating code that looks like this:
0: T1 = 3
1: stack[0] = T1
2: print T1
3: T2 = 2
4: T3 = (T1+T2)
5: stack[1] = T3
6: print T3
7: T4 = 1
8: T5 = T3+T4
We have now gotten as far as evaluating the first “b+1”, and now things get interesting. When translating the second “b+1”, we see a need for b (already in T3) and a 1 (already in T4). So computing “b+1” means adding T3 + T4, but we already did that too! The second operand to the multiplication can just reuse T5, and we will have eliminated three instructions all at once.
Furthermore, if one was willing to commit all variables to registers, instructions #1 and #5 could also be eliminated.
That is the optimization technique in a nutshell, but to be truly good, it must be reasonably efficient. It would not be wise to scan through all the preceding instructions to look for a match. Doing so would make compilation time proportional to the square of the size of the program!
Instead a good implementation must include a fast (e.g. O(1) time) way of identifying that the sum T3+T4 was already computed (even if it was not the very last operation). The same approach would also be helpful in spotting duplicated constant values.