Description
This assignment focuses on the abstraction of iterators and uses them to parse the assignment’s input. One part of the assignment implements an iterator that examines the input data; the other uses that iterator to parse the given expression and produce a result.
Problem Description
The previous assignment assumes its input includes spaces between every token in an expression, but many programmers tend not to use so many spaces, if the expression is unambiguous. For example, both of the following expressions would be considered to be equivalent:
1 + 2 * 3          1 + 2*3
If the expression on the right was provided as input to the first homework assignment, the lack of spaces would cause the “2*3” to be interpreted as a single token, so it would not perform any multiplication, and might not make any sense of the token at all.
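The effect is easy to reproduce with Python’s built-in str.split. This is only an illustration: it assumes the first assignment separated its tokens on whitespace, and that code is not shown in this handout.

```python
# Whitespace splitting only separates tokens where spaces actually appear.
spaced = "1 + 2 * 3".split()
unspaced = "1 + 2*3".split()

print(spaced)    # five separate tokens
print(unspaced)  # "2*3" stays glued together as a single token
```

The second list has only three elements, so a parser that walks it never sees a multiplication operator.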
Of course, it is simple to imagine adding functionality that takes the full character string and creates a new version with all the spaces included. However, that requires both the time to create the new version and the memory to hold it. Any programmer aiming for maximal efficiency and minimal time should like to eliminate that extra step.
In fact, the previous homework itself had an intermediate step creating a new variable — it took a single character string, and created a list of shorter character strings, and then used a subscript variable to visit each element of that list.
The goal of this assignment is to eliminate all of those middle steps and to interpret the numeric expression directly from the original character string, without taking any time allocating new variables.
Identifying Tokens
The first phase of this project is to take the input string and divide it up into tokens (individual numbers and operators). The goal here is to let a caller ask for each token in turn and to supply it on demand.
The simplest approach is one presented in a recitation assignment that only requires implementing a single function — a generator that yields one token at a time as it finds it in the input string.
Here is an intentionally incomplete illustration from the instructor’s solution, as it stands at the time of writing:
(in newsplit.py)
def new_split_iter( expr ):
    """divide a character string into individual tokens, which need not be separated by spaces (but can be!)
    also, the results are returned in a manner similar to an iterator instead of a new data structure
    """
    expr = expr + ";"                # append new symbol to mark end of data, for simplicity
    pos = 0                          # begin at first character position in the list
    while expr[pos] != ";":          # repeat until the end of the input is found
        # ---- to be filled in by the student, using yield as necessary
    yield ";"                        # inform client of end of data
You may identify whether an individual character is a numeric digit via expr[pos].isdigit(). There are similar methods: isalpha for letters and isalnum for alphanumerics (letters or digits).
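To see these character tests working together with yield, here is a small sketch that is unrelated to the tokenizer itself. The generator name classify_chars is hypothetical and is not part of the instructor’s solution; it simply reports what kind of character it finds at each position, one result at a time.

```python
def classify_chars(text):
    """Yield (character, kind) pairs one at a time (hypothetical helper)."""
    for ch in text:
        if ch.isdigit():
            kind = "digit"
        elif ch.isalpha():
            kind = "letter"
        elif ch.isspace():
            kind = "space"
        else:
            kind = "other"
        yield ch, kind   # hand one result to the caller, then pause here


pairs = list(classify_chars("x1 +"))
print(pairs)
```

Nothing is computed until the caller asks for the next pair, which is the same on-demand behavior the tokenizer needs.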
Where Python iterators fall short
To qualify as an iterator in Python, it is sufficient to support two operations:
— get started (usually through a call to iter())
— obtain the next element, (through an explicit or implicit call for next())
Unfortunately, this is insufficient for the parser in this assignment. Consider the example “1 + 2 * 3”. The function that parses products will use the ‘next’ operation to obtain the plus sign, and determine that it is not a multiplication operator, returning control to the function that parses sums. But if that function were to ask for the ‘next’ symbol, it would move forward to the 2 and not see the plus sign. One could consider having every function pass its last scanned symbol to its caller, but that would likely make the interface look very clumsy.
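The problem can be reproduced with an ordinary Python iterator over a list of tokens. This is only a sketch of the situation described above; the variable names are chosen for illustration.

```python
tokens = iter(["1", "+", "2", "*", "3"])

first = next(tokens)        # "1": the product parser reads its operand
lookahead = next(tokens)    # "+": read only to check for "*", but now consumed
# The product parser returns here, because "+" is not a multiplication operator.

seen_by_sum = next(tokens)  # the sum parser asks for the next symbol...
print(seen_by_sum)          # ...and receives "2"; the "+" is gone
```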
The better solution, in the instructor’s opinion, is to extend the definition of the iterator to allow it to examine data without moving forward. Other programmers believe the same thing: there is a third-party package online called ‘more-itertools’ (an extension of the standard ‘itertools’ module) that adds a great deal more functionality to the language’s default iteration, including a peekable iterator. The instructor’s solution was based partly on seeing the interface to that more powerful iterator (but without seeing any of the implementation).
The file peekable.py is provided free for this course and for student use within the course. Students are expected never to modify it or submit it with their solutions. This example follows the second model from the recitation for an iterator, but can add its functionality to any other variety of iterator.
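As a rough idea of what such an extension can look like, here is a minimal sketch of a wrapper that buffers one element. It is not the contents of peekable.py, whose implementation and exact interface are not shown in this handout; the class name PeekableSketch and its peek method are assumptions made only for illustration.

```python
class PeekableSketch:
    """Wrap any iterator and allow a look at the next item without consuming it."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buffer = []          # holds at most one item that was peeked at

    def __iter__(self):
        return self

    def __next__(self):
        if self._buffer:
            return self._buffer.pop()
        return next(self._it)      # raises StopIteration when exhausted

    def peek(self):
        """Return the next item but leave it in place for the following next()."""
        if not self._buffer:
            self._buffer.append(next(self._it))
        return self._buffer[0]


tokens = PeekableSketch(["1", "+", "2"])
first = next(tokens)
peeked = tokens.peek()       # look at "+" without moving forward
after_peek = next(tokens)    # the same "+" is still delivered
print(first, peeked, after_peek)
```

With a wrapper like this, the function that parses products can peek at the plus sign and leave it for the function that parses sums.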
Assignment Specifications
The student submission is to consist of two files:
— A completed newsplit.py based on what was given above, defining a function new_split_iter that accepts a character string, and continually uses the yield statement to return the individual tokens as character strings.
— A file named infix2.py that very much resembles the infix1.py from Homework 1. The primary difference is that the function parameters will no longer consist of a list and an integer subscript, but will instead receive all data through an iterator.
Here are a few select lines from the instructor’s solution, as it appears at the time of this writing:
# import peek functionality for iterators
# and maybe the splitter, if you need it
from peekable import Peekable, peek
from newsplit import new_split_iter

def eval_infix_sum(iterator):
    """evaluate sum expression (zero or more additions or subtractions), using an iterator"""
    # ..... code that no longer uses array subscripting

def eval_infix_iter(iterator):
    """evaluate an expression, given an iterator to its tokens"""
    return eval_infix_sum(Peekable(iterator))

def eval_infix(expr):
    """accept a character string, split it into tokens, then evaluate"""
    return eval_infix_iter(new_split_iter(expr))
Specification Details
You may assume:
Correctly formed expressions consisting of integers and operators
All digits within an integer are adjacent (“12” is valid, but “1 2” is not)
All operators are single characters (accepting “//” is not required)
There will be no division by zero operation in given expressions
You may not assume:
anything about how many spaces appear between tokens (may be 0, 1, or more)
anything about how many space characters appear at beginning and end of input
anything about how many tokens may appear within a given input string
anything about expressions that the previous assignment did not allow you to assume
Unit Testing
Another feature of the Python language that has no equivalent in C++ is the ability to embed test code within each file, which may be used to test the functionality of that file. It includes a conditional test that determines whether that file is being run by itself (for testing) or is being used in a larger project.
Here is a portion of the instructor’s infix2.py, following the code shown above:
if __name__ == "__main__":
    print ( eval_infix("5 ") )
    print ( eval_infix("15 ") )
    print ( eval_infix( " 2 * 3 + 1 " ) )
    print ( eval_infix( " 2 + 3 * 1" ) )
If the Python environment is told to run infix2.py directly, it will execute these print statements. On the other hand, if the Python environment is told to run some other file that imports infix2.py, this code is skipped.
Along the same lines, here is a test statement in the instructor’s newsplit.py testing the iterator’s results:
print (list( new_split_iter( "3+4 * 5" )))
Grading and Evaluation
The TAs will be testing your submission using a separate file that consists of these lines:
from infix2 import eval_infix_iter
from newsplit import new_split_iter
def test(expr):
    print (expr, '=', eval_infix_iter(new_split_iter(expr)))
test("15")
There will of course be several test cases not shown here, but this shows how everything interacts. A character string expr is passed to the function new_split_iter, which returns an iterator to be used by the parsing function from Homework 1, as adapted for an iterator. The print function simply displays the input string and the calculated result.
Planning Ahead
Later in this course, we will use this same program file to recognize variable names and relational operators.
A few relational operators have two characters: {“<=”, “>=”, “!=”, “==”}. That does violate the assumption I said you could make earlier for this assignment, but if you can support them now, you won’t have to do it later. A handy note here is that the second character in all of these is always ‘=’.
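Because the second character is always ‘=’, one extra comparison is enough to decide whether an operator has one character or two. Here is a minimal sketch, assuming a hypothetical helper named operator_at that is not part of the instructor’s solution and that is only called when expr[pos] is already known to be an operator character.

```python
def operator_at(expr, pos):
    """Return the operator that starts at expr[pos] (hypothetical helper).

    Assumes expr[pos] is already known to be an operator character.
    """
    # Every two-character relational operator ends in "=".
    # The slice is empty, rather than an error, at the end of the string.
    if expr[pos] in "<>!=" and expr[pos + 1 : pos + 2] == "=":
        return expr[pos : pos + 2]
    return expr[pos]


print(operator_at("a<=b", 1))
print(operator_at("a<b", 1))
```

A caller would then advance its position by the length of the returned operator.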
Extra Credit Option
Both of these first two homework assignments recommended an action that some purists might dislike. They added an extra symbol to the end of the input to guarantee that the parser would never run out of data to examine. Can you implement this project without adding that extra character?
Hint: One approach that would be very consistent with how Python addresses similar situations is its exception handling mechanism. You may investigate the text and online resources to see how an iterator may raise an exception when it runs out of data (and figure out for yourself how your program can identify that this happened). The parser functions can then have the exception signal there is no more data (such as no more operators).
For this assignment, it is sufficient to assume that all input is still completely valid, with well-formed and complete expressions. This feature is simply an alternative (and some might say cleaner) way of stopping at the end of the input string.
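As a starting point for that investigation, the following sketch shows the standard behavior of Python’s built-in next() on an exhausted iterator. It uses a plain list iterator rather than the tokenizer, and the variable names are chosen only for illustration.

```python
tokens = iter(["7"])

first = next(tokens)
try:
    next(tokens)               # nothing left: Python raises StopIteration
    ran_out = False
except StopIteration:
    ran_out = True

# next() also accepts a default value that is returned instead of raising.
fallback = next(tokens, None)
print(first, ran_out, fallback)
```

One caution: since Python 3.7, a StopIteration that escapes from inside the body of a generator function is converted into a RuntimeError, so it is safest to catch it right where next() is called.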