CECS444 Project #1: Lexer

$30.00

Category: You will Instantly receive a download link for .zip solution file upon Payment

Description

5/5 - (8 votes)

Introduction The objective of this assignment is to write a lexer for the A4 language lexicon (i.e., legal “words”). The lexer transforms an A4 high-level program into a list of tokens for that program (in a special format). For convenience, this lexer will take input from standard-input (stdin) and send output to standard-output (stdout). Coding is in Java. Note, this project will form the basis for the next project, so apply Rule #4 as appropriate. Team The team may contain from two to four members. Pick a three-letter name for the team. (If two teams pick the same TLA, I will disambiguate them.) Guidelines The A4 Lexcon is given, below. It has a regex section for complicated words and a more straight forward simple string section (which is also regex, but trivial). Each token type has an ID number. Recall that a lexer is built via a DFA, (usually in the form of a set of nested switch statements) and by calling a couple character functions: peek() and advance(). Peek() returns a copy of the next character in the input stream. Advance() moves past that next character. In addition to the Lexicon, shown below, it is to be assumed that whitespace delimits tokens, but whitespace is itself not a token; meaning that encountering a whitespace char will terminate a token (e.g., an identifier). Further, a comment (see the lexicon) is treated as whitespace. The lexer must be hand-built, rather than built by using a lexer generator program (e.g., LEX). Small sample A4 programs and their corresponding Token records as output are shown, below. Each lexer is to be built by a team of up to 3 students). Note, a period character not in quotes is a regex operator that matches any character except newline. A4 Lexicon comment = ‘//’ .* // Treat as whitespace up to newline char; like C/C++/Java. 2 id = LU LUD * // identifier. LU = ‘_’ | [a-zA-Z] // Letter-Underscore. LUD = LU | [0-9] // Letter-Underscore-Digit. 3 int = SIGN ? DIGITS // integer. 4 float = int [ ‘.’ DIGITS ] ? // float. 5 string = ‘”‘ .* ‘”‘ // Cannot contain a double-quote char. SIGN = plus | minus DIGITS = [0-9] + // Unpaired delimiters 6 comma = ‘,’ 7 semi = ‘;’ // Keywords 10 kprog = “prog” 11 kmain = “main” 12 kfcn = “fcn” 13 kclass = “class” 15 kfloat = “float” 16 kint = “int” 17 kstring = “string” 18 kif = “if” 19 kelseif = “elseif” 20 kelse = “else” 21 kwhile = “while” 22 kinput = “input” 23 kprint = “print” 24 knew = “new” 25 kreturn = “return” 26 kvar = “var” // Paired delimeters 31 angle1 = ‘<‘ 32 angle2 = ‘>’ 33 brace1 = ‘{‘ 34 brace2 = ‘}’ 35 bracket1 = ‘[‘ 36 bracket2 = ‘]’ 37 parens1 = ‘(‘ 38 parens2 = ‘)’ // Other punctuation tokens 41 aster = ‘*’ 42 caret = ‘^’ 43 colon = ‘:’ 44 dot = ‘.’ 45 equal = ‘=’ 46 minus = ‘-‘ 47 plus = ‘+’ 48 slash = ‘/’ C. Siska August 31, 2018 Page 1 of 5 444 Compilers — Project #1 — Lexer // Multi-char operators 51 oparrow = “->” 52 opeq = “==” 53 opne = “!=” 54 ople = “<=” 55 opge = “>=” 56 opshl = “<<” 57 opshr = “>>” // Miscellaeous 99 error // Unknown token. 0 eof // End-of-Input.\ A4 Token Output Format (Tok: <ID#> line= str= “” [int= | float= ]?) The ID# is that of the A4 Lexicon, described above. For a string token (id==5), the string token’s input double-quotes are not included in the str field. Those double-quotes are only used to help delimit and identify the string’s character sequence. For an int (id==3) or a float (id==4), include the optional “int=” or “float=” field. You compute the actual value from the string of digits. The token’s line field contains the source line number (1-based) where the token starts. Examples (see more examples below): (Tok: 3 line= 1 str= “65” int= 65) (Tok: 7 line= 1 str= “;”) The first example shows a token with ID #3 on line #1, its an int token. The second example shows a semicolon token with ID #7 on line #1. Development Note: For teams, pick small functions/methods and agree on a small quick API. Create sample expected input output (EIO) that represents a valid use of the API. Make sure your small pieces work according to that API. Don’t assume that your partners will get things correct. Check back with them at least every other day. Submission Your submission must, at a minimum, include a plain ASCII text file called README.txt, all necessary source files, (and if you use them: libraries, and build configuration files) to allow the submission to be built and run independently by the instructor. [For this project, no unusual files are expected.] All files must include a comment header identifying the author(s), author contact information, and a brief description of the files submitted. The README file must describe any bugs and features along with a description of what was or was not completed in the solution. The README must also describe the use of the program (and if there are any external dependencies). Finally, for a pair team, the README must indicate which functions/methods were authored by whom. Do not include any object files, binary executables, or other superfluous files. Place your submission files (e.g., the README and .jar files) in a folder named 444-p1_teamID. For example, if your team name were KAZ, then your folder would be named 444-p1_KAZ. Then zip up this folder. Name the .zip file the same as the folder name. Turn in by 11pm on the due date (in the bulletin board post) by sending me email with the zip file attached. [NB, If your emailer will not email a .zip file, then change the file extension from .zip to .zap, and tell me so in the email.] The email subject title should also include the folder name. Please include your name(s) and campus ID(s) at the end of the email (because some email addresses don’t make this clear). Grading •75% for compiling and correctly executing with no errors or warnings •15% for clean and well-documented code •5% for a clean and reasonable README file •5% for correctly following the submission guidelines (below) C. Siska August 31, 2018 Page 2 of 5 444 Compilers — Project #1 — Lexer prog main { print( “ASCII:”, ” A= “, 65, ” Z= “, 90 ); } A4-sample-1.alex format // .alex is the extension for an A4 lexer token file. (Tok: 10 line= 1 str= “prog”) (Tok: 11 line= 1 str= “main”) (Tok: 33 line= 1 str= “{“) (Tok: 23 line= 1 str= “print”) (Tok: 37 line= 1 str= “(“) (Tok: 5 line= 1 str= “ASCII:”) (Tok: 6 line= 1 str= “,”) (Tok: 5 line= 1 str= ” A= “) (Tok: 6 line= 1 str= “,”) (Tok: 3 line= 1 str= “65” int= 65) (Tok: 6 line= 1 str= “,”) (Tok: 5 line= 1 str= ” Z= “) (Tok: 6 line= 1 str= “,”) (Tok: 3 line= 1 str= “90” int= 90) (Tok: 38 line= 1 str= “)”) (Tok: 7 line= 1 str= “;”) (Tok: 34 line= 1 str= “}”) (Tok: 0 line= 1 str= “”) EX #2: A4-sample-2.acod program prog main { // Find the circumference of a circle. pi = 3.14; print( “Input radius> ” ); rx = input ( float ); circum = 2 * pi * rx; print( “Circumf= “, circum ); } A4-sample-2.alex format (Tok: 10 line= 1 str= “prog”) (Tok: 11 line= 1 str= “main”) (Tok: 33 line= 1 str= “{“) (Tok: 2 line= 2 str= “pi”) (Tok: 45 line= 2 str= “=”) (Tok: 4 line= 2 str= “3.14” float= 3.14) (Tok: 7 line= 2 str= “;”) (Tok: 23 line= 3 str= “print”) (Tok: 37 line= 3 str= “(“) (Tok: 5 line= 3 str= “Input radius> “) (Tok: 38 line= 3 str= “)”) (Tok: 7 line= 3 str= “;”) (Tok: 2 line= 4 str= “rx”) (Tok: 45 line= 4 str= “=”) (Tok: 22 line= 4 str= “input”) (Tok: 37 line= 4 str= “(“) (Tok: 15 line= 4 str= “float”) (Tok: 38 line= 4 str= “)”) (Tok: 7 line= 4 str= “;”) (Tok: 2 line= 5 str= “circum”) (Tok: 45 line= 5 str= “=”) (Tok: 3 line= 5 str= “2” int= 2) (Tok: 41 line= 5 str= “*”) (Tok: 2 line= 5 str= “pi”) (Tok: 41 line= 5 str= “*”) (Tok: 2 line= 5 str= “rx”) (Tok: 7 line= 5 str= “;”) (Tok: 23 line= 6 str= “print”) (Tok: 37 line= 6 str= “(“) (Tok: 5 line= 6 str= “Circumf= “) (Tok: 6 line= 6 str= “,”) (Tok: 2 line= 6 str= “circum”) (Tok: 38 line= 6 str= “)”) (Tok: 7 line= 6 str= “;”) (Tok: 34 line= 7 str= “}”) (Tok: 0 line= 7 str= “”) C. Siska August 31, 2018 Page 3 of 5 444 Compilers — Project #1 — Lexer EX #3: A4-sample-3.acod program prog main { // Find the hypotenuse of a right triangle. print( “Input legs> ” ); a = input( int ); b = input( int ); print( “Hypotenuse= “, ( a * a + b * b ) ^ 0.5 ); } A4-sample-3.alex format (Tok: 10 line= 1 str= “prog”) (Tok: 11 line= 1 str= “main”) (Tok: 33 line= 1 str= “{“) (Tok: 2 line= 2 str= “print”) (Tok: 37 line= 2 str= “(“) (Tok: 5 line= 2 str= “Input legs> “) (Tok: 38 line= 2 str= “)”) (Tok: 7 line= 2 str= “;”) (Tok: 2 line= 3 str= “a”) (Tok: 45 line= 3 str= “=”) (Tok: 22 line= 3 str= “input”) (Tok: 37 line= 3 str= “(“) (Tok: 15 line= 3 str= “int”) (Tok: 38 line= 3 str= “)”) (Tok: 7 line= 3 str= “;”) (Tok: 2 line= 4 str= “b”) (Tok: 45 line= 4 str= “=”) (Tok: 22 line= 4 str= “input”) (Tok: 37 line= 4 str= “(“) (Tok: 16 line= 4 str= “int”) (Tok: 38 line= 4 str= “)”) (Tok: 7 line= 4 str= “;”) (Tok: 23 line= 5 str= “print”) (Tok: 37 line= 5 str= “(“) (Tok: 5 line= 5 str= “Hypotenuse= “) (Tok: 6 line= 5 str= “,”) (Tok: 37 line= 5 str= “(“) (Tok: 2 line= 5 str= “a”) (Tok: 41 line= 5 str= “*”) (Tok: 2 line= 5 str= “a”) (Tok: 47 line= 5 str= “+”) (Tok: 2 line= 5 str= “b”) (Tok: 41 line= 5 str= “*”) (Tok: 2 line= 5 str= “b”) (Tok: 38 line= 5 str= “)”) (Tok: 42 line= 5 str= “^”) (Tok: 4 line= 5 str= “0.5” float= 0.5) (Tok: 38 line= 5 str= “)”) (Tok: 7 line= 5 str= “;”) (Tok: 34 line= 6 str= “}”) (Tok: 0 line= 6 str= “”) Error Handling Use Rule #0, (which is also reflected in the use-case main scenario: No Frills), to get a working version early. If you detect a mistake in A4 source program input, then output an error message and abort the lexer program. Testing Test that your lexer works on the above three programs. Also, create two further A4 “programs” of your own, to see if it works on yours. Add the pairs of tested A4 source input and A4 token output files to your test directory folder and include that folder in your submission .zip file. Readme File You should provide a README.txt text file. Be clear in your instruction on how to build and use the project by providing instructions a novice programmer would understand. If there are any external dependencies for building, the README must also list them and how to find and incorporate them. Usage should include an example invocation. A README would cover the following: • Class number • Project number and name • Team name and members • Intro (including the algorithm used) • Contents: Files in the .zip submission • External Requirements (None?) • Setup and Installation (if any) • Sample invocation & results to see • Features (both included and missing) • Bugs (if any) C. Siska August 31, 2018 Page 4 of 5 444 Compilers — Project #1 — Lexer Academic Rules Correctly and properly attribute all third party material and references, lest points be taken off. Submission All Necessary Files: Your submission must, at a minimum, include a plain ASCII text file called README.txt, all necessary source files to allow the submission to be built and run independently by the instructor. [For this project, no unusual files are expected.] Note, the instructor not use use your IDE or O.S. Headers: All source code files must include a comment header identifying the author, author’s contact info (please, no phone numbers), and a brief description of the file. No Binaries: Do not include any IDE-specific files, object files, binary executables, or other superfluous files. Project Folder: Place your submission files in a folder named 444-p1_teamname. For example, if your team name is ABC then name the folder ”444-p1_ABC”. Project Zip File: Then zip up this folder. Name the .zip file the same as the folder name. (Note, I spend a bit of time renaming wayward project files, and I’d rather not have to.) Turn in by 11pm on the due date (as specified in the bulletin-board post) by sending me email (see the Syllabus for the correct email address) with the zip file attached. The email subject title should include the folder name. ZAP file: If your emailer will not email a .zip file, then change the file extension from .zip to .zap, attach that, and tell me so in the email. Email Body: Please include your team members’ names and campus IDs at the end of the email. Project Problems: If there is a problem with your project, don’t put the problem description in the email body – put it in the README.txt file. The Cloud: Do not provide a link to Dropbox, Gdrive, or other cloud storage. Note, cloud files (e.g., Gdrive) are not accepted. Grading • 75% for compiling and executing correctly with no errors or warnings • 10% for clean and well-documented code (Rule #4) • 10% for a clean and reasonable README file • 5% for successfully following Submission rules C. Siska August 31, 2018 Page 5 of 5