Description

5/5 - (4 votes)

Introduction The objective of this assignment is to write a backtracking-free LL Predictive Parser. This parser will make use of output from the Lexer constructed in the previous project to obtain the token stream of the program being compiled. (Note, this also allows hand-built lexer-format files to be input by the Parser.) The Simple LL Predictive Parser consists of three parts: the simple parser machine, the LL parse table for the grammar involved, and the prediction stack. The simple version of the parser machine is a very short piece of code because it is table-driven. During parsing, the LL Parser creates a Parse Tree (PST) of Node objects. Each Node contains (wraps) the symbol (terminal or non-terminal) it was built for. Each node containing a symbol is built as the RHS of a selected rule is pushed onto the prediction stack. Once the PST is complete, the Parser outputs the tree as text in a simple serialized tree format. Finally, the Parser converts the PST to an AST. This AST is then output using the same serialized tree format. Team The team may contain from two to four members. Pick a three-letter name for the team. (If two teams pick the same TLA, I will disambiguate them.) Optionally, for teams, establish tasks and each day briefly discuss the Agile 3Qs: 1) What was completed yesterday?, 2) What is planned to complete today?, and 3) What obstacles are in the way?. Bread up a task so that some part can be completed each day. If you expect to have down days (i.e., no-work days), tell your team up front. LL Parse table The LL parse table is straight-forward to build (by hand) once the grammar has been converted to simple LL-compatible form. This involves several preparatory steps with the grammar: 1. Convert the grammar rules to simple rules because only simple rules go in table cells 2. LRE (Left Recursion Elimination), to avoid LL infinite recursion 3. Disappearing Non-Terminal analysis to add direct epsilon rules for ghostable Non-Ts 4. First Set construction, to see which simple rules go into which LL table cells 5. Fill the LL Table cells 6. If there are cell conflicts, use Left Factoring to avoid more than one simple rule in a cell 7. Follow Set construction, to see which epsilon rules go into which LL table cells 8. ID which RHS symbol will “own” the rest in the AST (or leave the mom Non-T as the owner) This will typically leave a number of cells unfilled. If the LL Parser machine looks up one of these empty cells, the Parser (step M3E) emits an error and stops. The error should indicate the token (i.e., column header) and the top-of-stack symbol (i.e., the rule LHS) and the token’s line number (if known). Issues  Match a prediction stack node’s symbol against an input stream’s token; via token kind.  Build a node for a symbol; several nodes can be built for the same symbol.  Mom node should know the rule used to build its kids; and have easy access to them for treewalking  Table cell should have access to its rule  Rule should have easy access to its RHS symbols  Leaf node should know the token it was built for; for 1) variable name and 2) error line # info  Should be easy to tell if a node’s symbol is a terminal or a non-terminal  Need an “input stream” easy to use (e.g., read each text-token into an array of Token refs) C. Siska October 11, 2018 Page 1 of 3 444 Compilers — Project #2 — Parser  How to do Rule #2 Add-a-Trick for a parser, because of the massive bug hunt potential for all of this. Tree Serialization Tree Serialization is done via a treewalk in pre-order (mom before kids). Each node in a tree is serialized as follows (with a space between each part). 1. Output a ‘(‘ parenthesis, and “Node:” 2. Output the node’s local field information (see below for this) 3. Output each child node, recursively, if any 4. Output a ‘)’ parenthesis – do this using a post-order lollypop An object’s local information should look like the following: 1. Output a ‘(‘ parenthesis 2. Output the object’s class or data type name (e.g., “Rule:” ) 3. Output the object’s ID number Note: Each object should get a unique ID number on creation (eg, counter, or memory location) For a given object type, these ID numbers don’t have to be sequential, just unique. 4. For each local field, output a tuple of the field name, “=”, the field value, and “;” If the value is a pointer (or reference), then output that object’s class name and unique ID 5. Output a ‘)’ parenthesis Indentation, line wrapping, etc. are all optional. The easiest solution is to output a newline before every ‘(‘ parenthesis. A6 Grammar For this project we will use the A6 grammar, specified in the Appendix. Deliverables 1. In a pdf document, show any new epsilon rules; show each LHS rules before and after for LRE; show before and after for any Left Factored rules, show the First Set for each rule, show the Follow Set for each LHS non-terminal symbol, and for each non-trivial rule show the RHS owner node to be hoisted (or indicate the LHS will be kept in place because no RHS seemed appropriate). 2. A6-table: The A6 LL Parser table as a spreadsheet or an importable text CSV file. 3. LL-Parser: The LL Parser machine’s source code, which you used to test the parser table on the example A6 programs from the Lexer project. 4. Readme: the README.txt file (e.g., title, contact info, files list, installation/run info, bugs remaining, features added, citations of work referred to). 5. Five output runs of your parser, one for each of the 3 Lexer’s sample programs, and two of your own choosing. Two outputs for each: one for the PST and one for the AST. Note, if you like, you can use hand-built input to your Parser, or you can use someone else’s Lexer output as input (but cite it). Academic Rules Correctly and properly attribute all third party material and references, if you used any. Submission & Grading Same as for project #1. C. Siska October 11, 2018 Page 2 of 3 444 Compilers — Project #2 — Parser Appendix — A6 Grammar Pgm = kwdprog Vargroup Fcndefs Main Main = kwdmain BBlock BBlock = brace1 Vargroup Stmts brace2 Vargroup = kwdvars PPvarlist | eps PPvarlist = parens1 Varlist parens2 Varlist = Varitem semi Varlist | eps Varitem = Vardecl | Vardecl equal Varinit Varitem = Classdecl | Classdef Vardecl = Simplekind Varspec Simplekind = Basekind | Classid Basekind = int | float | string Classid = id Varspec = Varid | Arrspec | Deref_id Varid = id Arrspec = Varid KKint KKint = bracket1 int bracket2 Deref_id = Deref id Deref = aster Varinit = Expr | BBexprs BBexprs = brace1 Exprlist brace2 | brace1 brace2 Exprlist = Expr Moreexprs Moreexprs = comma Exprlist | eps Classdecl = kwdclass Classid Classdef = Classheader BBclassitems BBClassitems = brace1 Classitems brace2 Classheader = Classdecl Classmom Interfaces Classmom = colon Classid | eps Classitems = Classgroup Classitems | eps Classgroup = Class_ctrl | Varlist | Mddecls Class_ctrl = colon id // in {public, protected, private} Interfaces = colon Classid Interfaces | eps Mddecls = Mdheader Mddecls | eps Mdheader = kwdfcn Md_id PParmlist Retkind Md_id = Classid colon Fcnid Fcndefs = Fcndef Fcndefs | eps Fcndef = Fcnheader BBlock Fcnheader = kwdfcn Fcnid PParmlist Retkind Fcnid = id Retkind = Kind PParmlist = parens1 Varspecs parens2 | PPonly Varspecs = Varspec More_varspecs More_varspecs = comma Varspecs | eps PPonly = parens1 parens2 Stmts = Stmt semi Stmts | eps Stmt = Stasgn | Fcall | Stif Stmt = Stwhile | Stprint | Strtn Stasgn = Lval equal Expr Lval = Varid | Aref | Deref_id Aref = Varid KKexpr KKexpr = bracket1 Expr bracket2 Fcall = Fcnid PPexprs PPexprs = parens1 Exprlist parens2 | PPonly Stif = kwdif PPexpr BBlock Elsepart Elsepart = kwdelseif PPexpr BBlock Elsepart Elsepart = kwdelse BBlock | eps Stwhile = kwdwhile PPexpr BBlock Stprint = kprint PPexprs Strtn = kwdreturn Expr | kwdreturn PPexpr = parens1 Expr parens2 Expr = Expr Oprel Rterm | Rterm Rterm = Rterm Opadd Term | Term Term = Term Opmul Fact | Fact Fact = Basekind | Lval | Addrof_id | Fcall | PPexpr Addrof_id = ampersand id Oprel = opeq | opne | Lthan | ople | opge | Gthan Lthan = angle1 Gthan = angle2 Opadd = plus | minus Opmul = aster | slash | caret C. Siska October 11, 2018 Page 3 of 3

CECS444 Project #2: LL Parser

Description

Related products

CECS444 Project #1: Lexer

CECS444 Project #3: AST Interpreter