CMSC 216 Project #2


1 Introduction
This is a relatively short project (in terms of the amount of code to be written) intended to use some additional features
of C and UNIX that Project #1 did not require: input, reading until the end of the input, simple use of arrays, input
redirection and UNIX pipes, and the UNIX diff command, which was explained recently in discussion section.
However, it is decidedly more difficult than Project #1, and will likely involve unexpected complexities, so you need to start
working on it right away.
Carefully read the course project style guide on ELMS (under Administrative and resources). It describes what
good programming style is considered to consist of for projects. Also carefully read the constraints in Appendix C
below, because you are only allowed to use certain C language features in writing the project, and will lose
significant credit for using any parts of the language other than those allowed. As in Project #1 you can lose credit
for submitting the project too many times (details are also in Appendix C).
Due to the size of the course it is not feasible for us to be able to provide project information or help via email/ELMS
messages, so we will be unable to answer such questions. However you are welcome to ask any questions verbally
during the TAs’ office hours (or during, before, or after discussion section or lecture if time permits).
Note that if you make your code publicly available on a website (like GitHub, SourceForge, PasteBin, etc.) for
others to access, you will be forwarded to the Office of Student Conduct.
2 Project description
As mentioned above, the project style guide handout describes what good programming style consists of for projects in
this course. A significant part of your score for this project will be based on your style. One aspect of style the handout
describes is that none of your program lines should exceed the standard terminal or printer width of 80 characters.
Although you could visually inspect your code to ensure this, having a program to check it for you would be more
convenient. You will write that program in this project. To avoid losing credit for style you should run it on itself– and
on your code in all future projects as well.
In most projects in this course you will write functions that we will call from our main programs (our tests of your
code), as in Project #1. This project is an exception, because you will write a standalone program instead. Actually, you
will write two different standalone programs, named lengthwarning.c and problemlines.c, which could be used
separately or together.
2.1 The line length check program lengthwarning.c
This program will check for lines in its input that are more than 80 characters. First, here are concepts and the terminology we use. A line is a sequence of zero or more characters terminated with the special newline character. The size of
a line refers to the actual number of characters in the line (not including the newline character). The newline character
indicates where the line ends, but it is not part of the line’s contents, so it does not count towards its size. The length of
a line refers to the number of spaces that the line occupies when printed. If a line has no tab characters its length equals
its size. The length of a line that contains tab characters is described in Section 2.2 below. There will not be more than
1000 characters before the newline character of any input line.
Your lengthwarning.c program needs to do the following:
• Read one input line at a time and process it as described next. As each line is being read, store its characters
into a one-dimensional array that is large enough to store just one input line, then process that one line. (Do not
try to read and store all of the lines in the entire input at once, just read and process one line at a time.)
• Determine the length of the line, either during or after reading the line.
• For the line that was just read, print the following on one output line (with no spaces other than what are described):
– In the first output column or character position print either a single blank space character if the length of the
current line that was just read was 80 or less, or a single asterisk * if the line’s length was more than 80.
© 2023 L. Herman; all rights reserved 1
– Then print the current line’s number, where the first line read is line #1. Your program’s input will never
contain more than 99999 lines, so the line number should be printed in a field of exactly 5 places, padded on
the left with blank spaces if it is less than 5 digits. (This can be done the hard way, but note that it is trivial
to accomplish in C using printf() formatting options that were explained in a recent discussion section.)
– Then print a colon, a single blank space, all the characters of the line that was read, and a newline.
(The actual input line as printed will always begin in the ninth column or output line position, because the
space or asterisk, line number, colon, and following space occupy the first eight positions.)
• An additional line of output is to be printed only for input lines whose length is greater than 80. Immediately
following the line that was read and printed, a second line must be printed that has exactly 88 blank spaces, then
enough caret characters (^) so there is a caret underneath each character of the line that would be beyond length
80. See the example below. (88 blank spaces includes 80 for the characters of the line that are within the first 80
positions, plus as mentioned 8 for the asterisk or space, line number, colon, and space.)
• Do this for all lines in the input, until the end of the input is reached.
This program’s input could be anything– it doesn’t even have to be a C program. But if you run it reading its own
code, it will tell you whether it has any lines longer than 80 characters, which you would lose credit for during grading.
Here is an example of the output that should be produced for a single input line exactly 85 characters long, with
blank spaces shown as ␣. The public test outputs have more examples of the expected output format, so look at all of
the public test inputs and their expected outputs (discussed below) before starting to code.
*␣␣␣␣1:␣This␣is␣a␣line␣with␣85␣characters.␣␣(Words␣were␣chosen␣carefully␣to␣have␣exactly␣85.)
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣^^^^^
Note that lines can’t just be printed as they are being read, because when a line whose length is greater than 80 is
printed, an asterisk has to be printed first. So an entire line must be read so its length can first be determined, only then
can it be printed.
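As a rough illustration of the output format described in this section, the following sketch prints the report for one line whose characters, size, and length have already been determined. The function name print_report, its parameters, and the two constants are hypothetical and are not part of the project files. It also uses const and #define, which may or may not be among the features allowed for the project, so treat it as an illustration of the format rather than as code to submit.

```c
#include <stdio.h>

#define MAX_LENGTH 80    /* longest line length that is not flagged */
#define PREFIX_WIDTH 8   /* marker + 5-place number + colon + space */

/* Prints the report for one line: line[] holds its characters (without
   the newline), size is how many there are, length is its printed length,
   and number is its line number.  Returns the number of carets printed
   (0 if the line was not too long); the return value is only a
   convenience for testing and is not required by the project. */
int print_report(const char line[], int size, int length, int number) {
  int i;
  int carets = 0;

  if (length > MAX_LENGTH)
    printf("*");
  else
    printf(" ");
  printf("%5d", number);    /* field of exactly 5 places, left-padded */
  printf(": ");
  for (i = 0; i < size; i++)
    printf("%c", line[i]);  /* a tab is echoed as a tab, not as spaces */
  printf("\n");

  if (length > MAX_LENGTH) {
    for (i = 0; i < MAX_LENGTH + PREFIX_WIDTH; i++)
      printf(" ");          /* 88 blank spaces */
    for (carets = 0; carets < length - MAX_LENGTH; carets++)
      printf("^");          /* one caret per position beyond 80 */
    printf("\n");
  }
  return carets;
}
```

Calling print_report() from a main() with an 85-character line and number 1 would print an asterisk line followed by a caret line with five carets, matching the layout of the example above.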
2.2 Tab characters
As described above the program is straightforward, although you may encounter some issues due to differences between
C and Java that the project is intended to bring up. But tab characters make it more tricky, because a tab character appears
to occupy anywhere between 1 and 8 spaces when printed. Output devices act as if there is an invisible tab stop at every
eighth character position (the eighth position, sixteenth, etc.) Printing a tab character causes the output to skip at least
one space and stop just after the next tab position. A printable character immediately after the tab will appear at that
position. The length of a line that has one or more tabs is the number of spaces the line would occupy when printed,
where each tab would contribute from 1 to 8 spaces to its length, but would only add 1 to its size. Some examples:
• The size of the line fat\tcat (where \t is the escape sequence indicating a tab character) is 7 but its length is
11. When printed, fat occupies the first three positions and the tab character causes cat to begin at the ninth
position and occupy the ninth through eleventh positions.
• The length of the line hi\tgood\tbye is 19: hi occupies two positions, the tab advances to the next tab stop so
it appears to occupy six positions, good is four positions, the next tab appears to occupy four positions, and bye
occupies three positions.
• The length of \t\toctopus is 23. The first tab advances to the eighth position, the second tab advances to the
sixteenth position, and the o would be at the seventeenth position, the c at the eighteenth, etc.
• A tab character always appears as if it occupies at least one space. Suppose a line is elephant\tZ. After elephant
is printed the output device will be at the ninth position, but the Z cannot appear there, because if it did the tab
would not be visible at all. Instead the tab causes printing to advance eight spaces so the Z would be at the
seventeenth position, and the line’s length is 17.
For a line with one or more tabs, the number of spaces that are advanced for each tab depends on the length of the
part of the line before that tab (which may itself contain tabs).
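The tab rule above amounts to rounding the running length up to the next multiple of 8, always advancing by at least one position. A minimal sketch of that calculation follows, assuming the line's characters are already stored in an array. The name line_length and the constant TAB_STOP are hypothetical, and const and #define may not be among the features permitted in the project.

```c
#include <stdio.h>

#define TAB_STOP 8   /* a tab stop at every eighth position */

/* Returns the printed length of the first size characters of line[].
   Every character other than a tab adds 1.  A tab moves the length to
   the next multiple of TAB_STOP that is strictly greater than the
   current length, so it always adds between 1 and 8. */
int line_length(const char line[], int size) {
  int i;
  int length = 0;

  for (i = 0; i < size; i++) {
    if (line[i] == '\t')
      length = (length / TAB_STOP + 1) * TAB_STOP;
    else
      length++;
  }
  return length;
}
```

The integer division is what makes this work: for elephant\tZ the length before the tab is 8, so 8 / 8 + 1 is 2 and the tab advances the length to 16, which puts Z at position 17.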
Note that although a printed tab character may appear to occupy multiple spaces, your lengthwarning.c program
must print characters exactly as they were read from the input, so any tabs must be printed as tabs. Do not print
spaces instead of tabs. (We could have said to do this but it would have involved complexities that are not immediately
apparent; in any event, the submit server is expecting all tabs to be printed as tabs to call your output correct.)
2.3 The summary program problemlines.c
The line length check program described above can give a user full information about which of their program lines are
too long, and by how much. Sometimes though a user might just want to see a summary view of only the numbers of
the lines that are too long, without the other information produced by the first program. You will also write a second
program that reads the output of your first program and does this.
This program should read the output produced by your lengthwarning.c program and print only the numbers of the
lines in the input of lengthwarning.c that lengthwarning.c identified as longer than 80 characters, in increasing line
number order obviously. If one or more lines in the input of lengthwarning.c had length more than 80 this program
should print all their numbers, on a single output line, ending in a newline. If more than one line number is printed they
should be separated by a single blank space, but there should not be any space at the beginning of the line before the
first line number or any space after the last line number printed. If there were no lines with length more than 80 this
program will only print a newline (it always prints at least a newline). The way this program tells if lengthwarning.c
saw an input line whose length was longer than 80 is to look for an asterisk as the first character printed.
Note that this program will print the numbers of the lines in the input of lengthwarning.c that were longer than 80
characters. lengthwarning.c can print more lines of output than the number of lines in its input because it prints a pair
of lines for every input line that is longer than 80 characters, with the second one having spaces and caret characters;
these caret lines will be ignored by problemlines.c.
Here is an outline of what problemlines.c should do:
• It must read a single character, which will be the first character on a line. Since it is reading output produced by
lengthwarning.c it should expect that this character will either be an asterisk or a space.
• If the character read was a space then this is a line read by lengthwarning.c whose length was not more than
80, so in this case problemlines.c must just skip the entire rest of the line whose first character was just read.
To do this it should read characters (one at a time) until it sees a newline, then read the first character on the next
line (if there is a next line), which will again be an asterisk or a space, and begin the process again.
• If on the other hand the first character that problemlines.c read on a line was an asterisk, this means that the
line read by lengthwarning.c did have length greater than 80.
In this case problemlines.c must read a number after the asterisk (which will be the number of the line in
the input of lengthwarning.c that was too long) and print the number. Then it must skip the rest of the line,
including its newline, just as in the case above, and also skip the entire next line (and its newline), which will be
one containing 88 spaces followed by one or more carets. Then it will read the first character on the next line (if
there is a next line) and begin the process again.
This explanation omits details such as properly handling spaces between numbers printed by problemlines.c,
which are up to you to figure out. The last public test illustrates the results produced by problemlines.c when there
was a line in the input of lengthwarning.c that had length more than 80.
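The outline above can be sketched roughly as follows. This is one possible arrangement under stated assumptions, not the expected solution: the function names skip_rest_of_line and summarize are hypothetical, the count variable is just one way to decide when a separating space is needed, and the return value exists only so the sketch can be tested. It assumes the input follows the output format of lengthwarning.c exactly.

```c
#include <stdio.h>

/* Reads and discards characters up to and including the next newline
   (or until the end of the input, whichever comes first). */
void skip_rest_of_line(void) {
  char c = ' ';

  while (!feof(stdin) && c != '\n')
    scanf("%c", &c);
}

/* Reads output in the format of lengthwarning.c from standard input and
   prints the numbers of the flagged lines on one line.  Returns how many
   numbers were printed. */
int summarize(void) {
  char marker;
  int number;
  int count = 0;

  scanf("%c", &marker);        /* first character of the first line */
  while (!feof(stdin)) {
    if (marker == '*') {
      scanf("%d", &number);    /* %d skips the padding spaces */
      if (count > 0)
        printf(" ");           /* separator only between numbers */
      printf("%d", number);
      count++;
      skip_rest_of_line();     /* rest of the flagged line */
      skip_rest_of_line();     /* the line of spaces and carets */
    } else {
      skip_rest_of_line();     /* a line that was not too long */
    }
    scanf("%c", &marker);      /* first character of the next line */
  }
  printf("\n");                /* a newline is printed even if count is 0 */
  return count;
}
```

Printing the space before every number except the first, rather than after each number, is what avoids a trailing space after the last number.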
problemlines.c is designed to read the output of your lengthwarning.c program. problemlines.c can be
run on any other text file, as long as the file’s contents follow the output format produced by lengthwarning.c. If
problemlines.c is run on input that does not follow the output format of lengthwarning.c then it can produce
incorrect results, or crash, or have an infinite loop. (In other words, problemlines.c is only expected to work right on
input of the proper format, and there is no required behavior if its input does not follow that format.)
Don’t forget you can only use certain C language features in this program, as well as lengthwarning.c.
2.4 Assumptions and guarantees about the input
• There can be zero or more lines in the input of either program. If the input to lengthwarning.c is empty it will
not produce any output at all, but problemlines.c should print a single newline if its input is empty.
• You are guaranteed, for both programs, that if their input is nonempty then each input line– even the last line–
will always end with a newline character. When you create (nonempty) input data files of your own with Emacs
for use in testing your program, be sure to press return at the end of the last line you type, so it will also end in
a newline. Our input data may contain any printable characters appearing on a standard US keyboard, but for
simplicity we will avoid nonprintable characters or whitespace characters other than spaces, tabs, and newlines.
• Any line in the input may have more than 80 characters, but you may assume there will be no more than
1000 characters (any of which may be tabs) prior to the terminating newline character of any line, and that there
will be no more than 99999 lines in either program’s input.
Since there can be up to 99999 lines in the input, all of which can be 1000 characters, it would use a lot of memory
to try to read all lines of the input at once and store them all before printing anything. (In fact, your programs may
use too much memory and crash if you do that.) Both programs should just read one line at a time and process it
immediately, as described above, then read the next line and process it, etc.
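To illustrate reading one line at a time into a single array, here is a minimal sketch that uses only scanf() and feof(). The name read_line, the constant MAX_SIZE, and the convention of returning -1 at the end of the input are assumptions made for this example. The lecture examples in ~/216public may structure the loop differently, and #define may not be among the features permitted in the project.

```c
#include <stdio.h>

#define MAX_SIZE 1000   /* most characters before a line's newline */

/* Reads characters from standard input up to and including the next
   newline, storing everything except the newline in line[], which must
   have room for MAX_SIZE characters.  Returns the line's size (which is
   0 for an empty line), or -1 if the end of the input was reached before
   any character could be read. */
int read_line(char line[]) {
  char c;
  int size = 0;

  scanf("%c", &c);
  if (feof(stdin))
    return -1;                 /* no more lines */
  while (!feof(stdin) && c != '\n' && size < MAX_SIZE) {
    line[size] = c;
    size++;
    scanf("%c", &c);
  }
  return size;
}
```

A caller would declare one array of MAX_SIZE characters, call read_line() repeatedly, process each line as soon as it is returned, and stop when the result is -1, so only one line is ever held in memory.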
3 Development suggestions
It is strongly recommended that you develop your program in parts or phases, testing at each stage that what you have
written does what it is supposed to so far. Although you can use different steps, here is a suggestion:
• First write your lengthwarning.c program to read just one line of input and simply print it. Even if the input has
more than one line, at this point it should just read and print the first line and quit. (Remember that your program
should read one single line at a time and process it before going on.) Run it using UNIX input redirection (which
is discussed in the UNIX tutorial and was illustrated in class, and which is shown below as well) with an input
file that you create (running your program on input data is discussed further below), and make sure that it does
this correctly. This will give you confidence that you are correctly reading basic input in C.
• Then modify lengthwarning.c to repeat reading a line at a time, and printing each one, until the end of the input
is seen. Stop and test the program (as above, run it with input redirection) to ensure it works right before going
on. (Note that two of the lecture examples in ~/216public on Grace illustrate reading until the end of the input;
it would be wise to study the lecture examples carefully.) If this doesn’t work then the rest of the program will never
work either, so make sure the program so far does this right before going on.
• Then assuming for now that tab characters will never appear in the input, add code to count the length of each line
and, depending on it, print either an asterisk * or space before each line is printed. Make sure this works right.
• Then lengthwarning.c should print the line number, in the format specified, between the * or space and the line
itself.
• Then add code to lengthwarning.c detecting and handling tab characters that might be present in input lines,
and thoroughly test on different inputs that when tabs are present the lengths of lines are correctly calculated and
the output is right.
• Then write problemlines.c first assuming that there will not be any lines that have length greater than 80.
(It should read its entire input, stopping at the end, and just print only a newline.) This is the easier case for
problemlines.c.
• Then expand problemlines.c to handle inputs with lines that have length greater than 80 characters. First make
sure it works right if there is just one line longer than 80 characters.
• Then adjust problemlines.c to handle inputs with more than one line with length greater than 80, making sure
it is printing spaces correctly, and that there is no space after the last number it prints.
At this point your programs should be complete, so test them thoroughly on various inputs, and make sure you have
used good style throughout in writing them– carefully read the project style guide on ELMS– before submitting.
(Be sure none of their lines have length longer than 80! See Section A.4.)
A Development procedure
A.1 Obtaining the project files
You can obtain the project files on the Grace machines using commands similar to those in Project #1:
cd ~/216
tar -zxvf ~/216public/project02/project02.tgz
This will create a directory project02 containing the project files extracted from the project tarfile. You must have
your coursework in your course disk space for this class (the cd command above accomplishes this), otherwise your
submission will not be accepted. After this, cd to the project02 directory and edit the file lengthwarning.c, where
you should write the first program. A trivial version of this file exists in the tarfile, which just has a comment and a
skeleton main() function. Remove the comment and write your first program there. Do not rename the file. Its name
must be spelled and capitalized exactly as shown, otherwise it will not compile on the submit server. Even a small
difference, like Lengthwarning.c or lenthwarning.c, would result in receiving zero points when submitting. Then
write the second program in the file problemlines.c. The tarfile also has an initial file with this name.
A.2 Compiling your code
Some variations in the commands to compile your programs and check their results will be needed, because in this
project each program will be complete and have a main() function. Just use the command gcc lengthwarning.c -o
lengthwarning.x to compile your first program, and gcc problemlines.c -o problemlines.x to compile the second one. These commands will name the executable versions of the programs lengthwarning.x and problemlines.x
(assuming there were no syntax errors). (You can also run the compiler from Emacs, as your TA explained.) If you
set up your account correctly in discussion section, an alias exists for gcc that adds the required compilation options
mentioned in Project #1 to the command.
A.3 Running your programs and checking their results
The public test inputs are just text files, and your program will be run with input redirected from them. The files are
named public1.input, public2.input, etc.
The submit server can check that the output of your programs is correct, but you should not submit until after you
have thoroughly tested them yourself. Expected outputs for all the public tests are also included in the project tarfile,
for example, public1.input has an associated output file named public1.output. It would be tedious and error-prone
to have to compare your output to the expected results manually; however, the UNIX diff utility described in a
recent discussion section can do this automatically.
The typical use of your second program would be to run your first program on some input, then send its output– using
a UNIX pipe (explained in the UNIX tutorial and discussion)– for the second program to read as its input. However, the
first program’s output could be redirected to a file, which the second program could be run with input redirected from.
A text file named README in the project tarfile gives the exact commands you should use to run your programs
on the public tests. Look at this file (using less README) to see the commands to use. Some tests will only run the
lengthwarning.x program and will be run, and their results compared, using a command like this:
lengthwarning.x < public1.input | diff - public1.output
This sends the output of the lengthwarning.x executable, when run reading public1.input, into the diff command. The dash in the diff command means that instead of comparing two files it will compare its input– the output
of the lengthwarning.x program– against public1.output whose name is next in the command, and if there are any
differences between them they will be displayed. diff will not print anything if no differences exist between your
output and the correct output, meaning that your lengthwarning.c program passed the test.
Other tests will run both programs, using a command like this:
lengthwarning.x < public8.input | problemlines.x | diff - public8.output
This will send the output of the lengthwarning.x executable, when run reading public8.input, into the
problemlines.x program, which reads it and in turn sends its output into the diff command, which will then compare
it against the file public8.output. As above, if diff doesn’t print anything then the output of the problemlines.x
program is identical to the expected output for this test (in fact, both programs must have worked right), and you passed
the test.
A.3.1 Viewing tab characters in input files and in the output
Because tab characters are whitespace they can’t be visually distinguished from spaces, and spaces at the ends of lines
can’t be seen. A UNIX utility od (“octal dump”) can help view a file containing either of these by printing the file’s
contents, character by character, showing the escape sequence representation of any special control characters. For
example, look at the seventh public test input file public7.input (using less public7.input), then examine the
results of the command od -t c public7.input. od can be useful when examining your own input files, and your
programs’ output, in case of bugs. If you’re not sure why a program is giving incorrect results you can run it with
its output redirected to a file (explained in the UNIX tutorial and discussion) and look at the redirected output file’s
contents using od.
Another easy way to see where tab characters are in a file is to display the file using cat -T instead of less, for
example, cat -T public7.input. This shows tabs as ^I.
A.4 Submitting your programs
As before, the command submit from the project directory will submit your project, but before you submit, you must
first make sure you have passed all the public tests, by compiling and running your code on them. Unless you have
versions of both programs that will at least compile, your code will fail to compile at all on the submit server.
It’s possible for a program to work fine on one machine (like the Grace machines), but not work at all on another machine (like the submit server), and this can occur more frequently in C, due to the nature of the language. Consequently,
after you submit you must log into the submit server and see whether your programs worked there.
To receive credit do not submit your project by uploading a zipfile or individual files to the submit server. You must
use the submit command.
Make sure none of your program lines have length longer than 80! If you are ready to submit you should have a
working program that will tell you if you do have any lines that are too long. To avoid losing credit, use your first
program to check both of them. First run:
lengthwarning.x < lengthwarning.c | less
Look for lines longer than 80 characters in your lengthwarning.c program and fix them (make them shorter, break
them up into two lines, etc.). Then run this command and do the same for your problemlines.c program:
lengthwarning.x < problemlines.c | less
B Grading criteria
Your grade for this project will be based on:
public tests 40 points
secret tests 40 points
programming style 20 points
Style will be a significant part of your score. To know what good style for projects consists of, and to avoid losing
credit, carefully read the project style guide on ELMS. Note that if you submit more than once your last submission
may not be the one that is graded (see details in the project policies handout), so use good style (according to the
style guide) from the beginning when you start working on the project, and throughout all your coding.
C Notes, constraints, and allowable language features
• You must implement this project using only the features of C that have been covered so far in class this semester
as of when this project is assigned. We want you to practice using these features before going on to new ones.
The project can be written more easily using features we haven’t covered yet, but one purpose of the project is
to get experience with some of the differences between Java and C that the project will illustrate, if written as
specified. In particular:
– Your programs can not include any library header files other than stdio.h.
– Your programs can not call any C library functions other than printf(), scanf(), and feof().
– Your programs can not use any format specifiers other than the ones covered so far in class (%d, %c, and
%f).
(Chapter 15 of the Reek text describes many other format specifiers and options, for example using an
asterisk in a format specifier, the effect of which is not explained here– you may not use any of these, other
than the ones covered in class specifically listed above, without anything between the % and the letter.)
– Even if you already know something about using strings in C, do not attempt to use them (because they
have also not been covered). You will lose very significant credit for using strings in the project.
– Even though Chapter 1 in the Reek text uses various C features that have not been covered in class yet, you
can not use them if they were not also covered in class, or in the lecture slides, or lecture examples.
(Exception: as in Project #1, you can use any C operators in Sections 5.1.4 through 5.1.8 of the Reek text,
and any C features in Chapter 4 of Reek also, because these are similar to Java– with the only exception that
you cannot use the goto statement mentioned in Chapter 4.)
You will lose significant credit when your project is graded if you use C features that weren’t covered yet. The
project can be written in a relatively small number of lines of code using features just covered so far.
The TAs covered some details about printf() and scanf() in discussion section that were skipped in lecture.
You can use any features of these functions that the TAs covered in discussion. If you use any features about
these functions that were not covered in lecture or discussion you will lose credit.
Reread this entire item very carefully again. Please keep in mind what it says.
• Note that the project style guide says that global variables may not be used in projects unless you are specifically told
to use them. You will lose credit for having any global variables in either program. (Read that sentence again.)
• All your code must be in the files lengthwarning.c and problemlines.c. Do not create any new source or
header files for the project.
• Your second program problemlines.c reads and processes the output of your first program lengthwarning.c.
On the submit server we will run any tests of problemlines.c using the output of our correct lengthwarning.c,
so even if you have any bugs in your own lengthwarning.c you can still pass tests of problemlines.c, if it
works right on them.
• When you create your own input data files to test your programs with, if you want a tab character press the tab
key. Do not type a backslash and t; this will not be a tab character. The escape sequence \t prints as a tab
character in a printf(), but it will not be a tab character in a data file (it will be a backslash and a t). Similarly,
if you want a newline, press the return key. (In Section 2.2 we used \t to indicate tabs in the input only because there is no
good alternative way to indicate a tab or newline in this assignment.)
• In just a few keystrokes Emacs can reindent an entire program, so you would be certain of not losing any credit
for any indentation problems. The UNIX tutorial explains how to do this.
• Because of the way the submit server works, the main() functions of both programs must end with return 0
(which in C and UNIX means the program ran and terminated successfully).
• For this project you will lose one point from your final project score for every submission you make (up through
the end of the one-day late period) in excess of four submissions. You will also lose one point for every submission that does not compile, in excess of two noncompiling submissions. Therefore be sure to compile, run, and
test both programs’ results before submitting. (Hopefully everyone will check their code themselves carefully
and avoid these penalties.)
• If “Segmentation fault” is printed when running a program this means it had a fatal error. For the moment, the
best way to debug this or any other type of error is to add debug printf() statements to your program, to print
the values of the variables that are being used. Note that for technical reasons that will come up later in the
course, the last thing printed by every debug printf() should be a newline character, to ensure the results appear
correctly. (But before submitting you must remove any printf() statements added for debugging, because they
will throw off your output on the submit server.)
• Besides the public tests, you should write your own tests for your programs! Just use Emacs to edit and save
any files that you can then run your programs with input redirected from. But don’t modify the public tests,
otherwise you won’t be testing your code on the cases that the submit server is running. Just create your own test
inputs.
• Note that the project policies handout says that all your projects must work on at least half of the public tests
(by the end of the semester) in order for you to be eligible to pass the course. See the project policies handout for
full details.
• If you are using Windows, note that if you resize the MobaXterm window the appearance of printed tabs may
change after that (they could appear to take up a different number of spaces). So you may want to avoid resizing
the window while writing this project.
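To make the earlier point about \t concrete: inside a C string the escape sequence \t is a single tab character, whereas a backslash and a t typed into a data file are two separate characters. The small sketch below (the function name print_sample_line is hypothetical) relies on the fact that printf() returns the number of characters it wrote.

```c
#include <stdio.h>

/* Prints the line fat<TAB>cat followed by a newline and returns the
   number of characters printf() reports having written.  The escape
   sequences \t and \n each count as one character, so the total is
   3 + 1 + 3 + 1. */
int print_sample_line(void) {
  return printf("fat\tcat\n");
}
```

A throwaway program whose main() calls a function like this, run with its output redirected to a file, is one possible way to produce a test input containing a real tab; examining that file with od -t c should show \t as a single character.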
D Academic integrity
Please carefully read the academic honesty section of the syllabus. Any evidence of impermissible cooperation on
projects, use of disallowed materials or resources, publicly providing others access to your project code online, or
unauthorized use of computer accounts, will be submitted to the Office of Student Conduct, which could result in
an XF for the course, or suspension or expulsion from the University. Be sure you understand the academic integrity
requirements in the syllabus. These policies apply to all students, and the Student Honor Council does not consider
lack of knowledge of the policies to be a defense for violating them. More information is in the course syllabus– please
review it now.
The academic integrity requirements also apply to any test data for projects, which must be your own original
work. Exchanging test data or working together to write test cases is also prohibited.