Description
Overview
The main purpose of this assignment is to introduce you to Eclipse, an integrated development environment (IDE) used mostly to write Java applications. The secondary purpose is to give you some hands-on experience with debugging applications.
All subsequent assignments in this course will require you to read, write, and modify
Java code; thus, it is imperative that you learn right from the start how to properly organize Java files and how to debug erroneous Java code.
Preliminaries: JavaScript
JavaScript is an interpreted, weakly typed programming language most commonly used
in web application development (i.e., client-side JavaScript), where its main purpose is
to provide interactivity. This programming language includes a rich set of features, designed in part to simplify its interaction with the Document Object Model (DOM), which
is a tree-like structure representing the elements found in a webpage.
The JavaScript programming language follows what’s called the ECMAScript standard,
defining the syntax of the language. Throughout this course, you will be developing a
simplified version of a JavaScript parser.
For more information on JavaScript, you can visit the following links: link, link
Preliminaries: Background on Lexing
Lexing is the process of splitting source code into tokens, the first step in compiling or interpreting it. These
tokens must be generated in order for code to be parsed and, eventually, compiled/executed. Tokens, in essence, are numerical identifiers representing symbols that are
meaningful to the syntax of the programming language. For example, consider the following JavaScript (JS) expression:
x = foo(a,b);
When the above JS code is lexed, the tokens generated will be as follows:
NAME: x
ASSIGN
NAME: foo
LP
NAME: a
COMMA
NAME: b
RP
SEMI
Note that ASSIGN refers to the “=” (i.e., assignment operation), LP refers to the left
parenthesis, RP refers to the right parenthesis, and SEMI refers to the semicolon. In addition, variables and function names are identifiers, and are assigned the token NAME.
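To make the token list above concrete, here is a minimal sketch of a lexer that handles only the symbols used in this example. The class ToyLexer and its lex method are hypothetical names invented for illustration; they are not part of the provided code, and the sketch reports tokens as text rather than as integers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy lexer; NOT the TokenStream class provided with the assignment.
class ToyLexer {
    // Splits a tiny subset of JavaScript into token names, reported as text.
    static List<String> lex(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but produces none
            } else if (Character.isLetter(c) || c == '_' || c == '$') {
                // Identifier: consume letters, digits, '_' and '$'.
                int start = i;
                while (i < source.length()
                        && (Character.isLetterOrDigit(source.charAt(i))
                            || source.charAt(i) == '_'
                            || source.charAt(i) == '$')) {
                    i++;
                }
                tokens.add("NAME: " + source.substring(start, i));
            } else {
                // Single-character symbols used in the example expression.
                switch (c) {
                    case '=': tokens.add("ASSIGN"); break;
                    case '(': tokens.add("LP"); break;
                    case ')': tokens.add("RP"); break;
                    case ',': tokens.add("COMMA"); break;
                    case ';': tokens.add("SEMI"); break;
                    default:
                        throw new IllegalArgumentException("Unexpected character: " + c);
                }
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints the nine tokens listed above, one per line.
        for (String token : lex("x = foo(a,b);")) {
            System.out.println(token);
        }
    }
}
```

A real lexer, including the one you will debug, must also handle numbers, strings, comments, and multi-character operators, which this sketch deliberately leaves out.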
Your task
In this assignment, we provide you with a token stream generator for JavaScript code.
This program, which generates tokens as described above, will contain four bugs. Your
task is to identify these four bugs using Eclipse debugging techniques and fix the token
stream generator code.
Instructions
1. Download and install Eclipse at http://www.eclipse.org/downloads/
2. Read the Eclipse tutorial (Lab 1). The tutorial can be found on the course website.
This tutorial will teach you the basics of organizing Java packages in Eclipse, as well
as debugging in Eclipse.
3. Download the file eece310_assn1.zip. Once you’ve unzipped the file, add the folder
eece310_assn1 as a project to Eclipse (see “How to Add Existing Projects to Eclipse”
in the tutorial).
4. Once you have set up the Eclipse project, browse the code. To do so, click
on the triangle beside the project name in the “Package Explorer” pane to expand
the project’s contents. Show the contents of the folder src and the package org.mozilla.javascript. You will notice that this package contains six files, described below:
Token.java: This class contains an enumeration of all JavaScript tokens. Recall that tokens are nothing more than integers. It is, however, easier to refer to these tokens by
name (e.g., Token.ADD, Token.ASSIGN, etc.) rather than by number.
Kit.java: For the purpose of this assignment, this is a helper class containing functions
that will assist in the lexing.
ObjToIntMap.java: For the purpose of this assignment, this class helps us associate the
NAME token with its corresponding identifier name (i.e., variable name or function
name).
UniqueTag.java: This is a helper class to ObjToIntMap containing tags for special object values.
TokenStream.java (Note: this is the only class you’ll have to modify for this assignment). This class is where the lexing takes place. When a new TokenStream object
is created, a file reader containing the JavaScript code to be lexed is passed to its constructor (the file reader passed to the constructor must be stored in the sourceReader
member of the TokenStream class). Once the TokenStream object is created, each call
to the getToken() method generates the next token. In the JavaScript example given
above, the first call to getToken() generates the token NAME, the second call generates
the token ASSIGN, the third call generates NAME, the fourth call generates LP, etc.
How does getToken() know what the next token is? The answer is that it reads the
JavaScript code character by character (using the method getChar()), and determines if
any combination of characters forms a syntactically meaningful symbol. For instance, if
the character ‘+’ is encountered, getToken() first peeks at the next character. If the next
character is another ‘+’, then an increment operator (i.e., ++), represented by token INC,
is returned. If the next character is a ‘=’, then an addition assignment operator (i.e., +=),
represented by token ASSIGN_ADD, is returned. Otherwise, no meaningful symbol
goes with the ‘+’ character, so an addition operator, represented by token ADD, is returned.
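The lookahead on ‘+’ described above can be sketched as follows. This is only an illustration under stated assumptions: the class PlusLookahead, its methods, and the numeric token values are made up, and the sketch uses java.io.PushbackReader to put back a character that turns out not to belong to the token, which may differ from how TokenStream.java handles peeking.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch of one-character lookahead; token values are made up.
class PlusLookahead {
    static final int ADD = 1;
    static final int INC = 2;
    static final int ASSIGN_ADD = 3;

    // Assumes the '+' itself has just been consumed from the reader.
    static int tokenAfterPlus(PushbackReader in) throws IOException {
        int next = in.read();
        if (next == '+') {
            return INC;        // "++"
        }
        if (next == '=') {
            return ASSIGN_ADD; // "+="
        }
        if (next != -1) {
            in.unread(next);   // not part of this token: put it back
        }
        return ADD;            // plain "+"
    }

    // Convenience helper: lexes the operator at the start of the given text.
    static int lexPlus(String text) {
        try (PushbackReader in = new PushbackReader(new StringReader(text))) {
            if (in.read() != '+') {
                throw new IllegalArgumentException("Expected '+' at start of: " + text);
            }
            return tokenAfterPlus(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lexPlus("++i"));  // prints 2 (INC)
        System.out.println(lexPlus("+= 1")); // prints 3 (ASSIGN_ADD)
        System.out.println(lexPlus("+ b"));  // prints 1 (ADD)
    }
}
```

Putting the unmatched character back matters: without it, the character following a plain ‘+’ would be lost and the next token would start one character too late.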
Identifiers and numbers are also identified character by character; however, they are a
little trickier to lex. It is your task to read through the TokenStream.java code and try to
understand how these are lexed.
GenerateTokenStream.java: This class contains the main method, which creates a TokenStream object, calls getToken() repeatedly in a loop to generate tokens one by one,
and prints the generated tokens to standard output. When the main method is run, you
will be prompted to enter the name of the JavaScript file to lex. Have a look at the
files JavaScriptCode1.js, JavaScriptCode2.js, and JavaScriptCode3.js; these are the
JavaScript files for which you need to generate correct tokens in this assignment.
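The relationship between the token stream and its driver loop can be pictured with the miniature below. Everything in it is an assumption made for illustration: the class ToyTokenStream, the token values, the use of 0 as an end-of-input token, and the drain helper are all hypothetical. Only the overall shape (a reader stored in sourceReader, and repeated calls to getToken()) follows the description above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the TokenStream / GenerateTokenStream pairing.
// Token values and the EOF convention are assumptions made for this sketch.
class ToyTokenStream {
    static final int EOF = 0;
    static final int LP = 1;
    static final int RP = 2;
    static final int SEMI = 3;

    private final Reader sourceReader;

    ToyTokenStream(Reader sourceReader) {
        this.sourceReader = sourceReader;
    }

    // Returns the next token, or EOF once the input is exhausted.
    int getToken() throws IOException {
        int c;
        while ((c = sourceReader.read()) != -1) {
            switch (c) {
                case '(': return LP;
                case ')': return RP;
                case ';': return SEMI;
                default:  break; // this toy skips every other character
            }
        }
        return EOF;
    }

    // Driver loop: call getToken() until EOF, collecting each token.
    static List<Integer> drain(String source) {
        List<Integer> tokens = new ArrayList<>();
        ToyTokenStream ts = new ToyTokenStream(new StringReader(source));
        try {
            int token;
            while ((token = ts.getToken()) != EOF) {
                tokens.add(token);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(drain("f();")); // prints [1, 2, 3]
    }
}
```

When debugging the real code, a breakpoint inside the equivalent of this loop lets you step into each getToken() call and watch which characters are consumed for each token.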
When you try running the lexer with the above JavaScript files, you will notice that a null
pointer exception is thrown. This is caused by one of the bugs introduced in
TokenStream.java, which you must identify. The correct tokens for the JavaScript files are found in
JavaScriptCodeN_CorrectTokens.txt (where N is 1, 2, or 3).
Please note that only TokenStream.java contains bugs. As a result, only TokenStream.java should be modified, and all other files should remain unchanged.
Deliverables
You will submit only your corrected version of TokenStream.java. We will test your code
on the following inputs:
– JavaScriptCode1.js – correct tokens are found in JavaScriptCode1_CorrectTokens.txt
– JavaScriptCode2.js – correct tokens are found in JavaScriptCode2_CorrectTokens.txt
– JavaScriptCode3.js – correct tokens are found in JavaScriptCode3_CorrectTokens.txt
– JavaScriptCode4.js
The first three inputs (and their corresponding outputs) are made available to you, but
the fourth input is not. If you have identified and corrected the bugs properly, your code
will generate the correct tokens for all four inputs.
HINT: JavaScript allows Unicode characters in variable names. This might help you find
one of the bugs.
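As a general illustration of what the hint means, the sketch below compares an ASCII-only check for the first character of an identifier with a Unicode-aware one. The class and method names are hypothetical, and neither check is taken from TokenStream.java; the Unicode-aware version relies on Java’s Character.isUnicodeIdentifierStart as an approximation of the ECMAScript rules.

```java
// Hypothetical comparison of two identifier-start checks; neither is taken
// from TokenStream.java.
class IdentifierStartCheck {
    // Accepts only ASCII letters, '_' and '$'.
    static boolean isAsciiIdentifierStart(char c) {
        return (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || c == '_'
                || c == '$';
    }

    // Also accepts letters outside ASCII, using Java's Unicode tables
    // as an approximation of the ECMAScript identifier rules.
    static boolean isUnicodeAwareIdentifierStart(char c) {
        return Character.isUnicodeIdentifierStart(c) || c == '_' || c == '$';
    }

    public static void main(String[] args) {
        char c = '\u00e9'; // the letter 'é'
        System.out.println(isAsciiIdentifierStart(c));        // prints false
        System.out.println(isUnicodeAwareIdentifierStart(c)); // prints true
    }
}
```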
Evaluation
Your mark will be based on the correctness of your code, as follows:
Correct output for JavaScriptCode1.js – 20%
Correct output for JavaScriptCode2.js – 20%
Correct output for JavaScriptCode3.js – 20%
Correct output for JavaScriptCode4.js – 40%
Submission guidelines
Create a zip file (with extension .zip) containing only your modified code for TokenStream.java (i.e., the zip file should not contain code for other classes). Submit the zip
file as mentioned in the email guidelines.