Description
2 Your Task
The goal of this assignment is to implement a paging strategy that maximizes the performance of the memory access in a set of predefined programs. You will accomplish this task
by using a paging simulator that has already been created for you. Your job is to write
1
the paging strategy that the simulator utilizes (roughly equivalent to the role the page fault
handler plays in a real OS). Your initial goal will be to create a Least Recently Used paging
implementation. You will then need to implement some form of predictive page algorithm
to increase the performance of your solution. You will be graded on the throughput of your
solution (the ratio of time spent doing useful work vs time spent waiting on the necessary
paging to occur).
2.1 The Simulator Environment
The simulator has been provided for you. You have access to the source code if you wish
to review it (simulator.c and simulator.h), but you should not need to modify this
code for the sake of this assignment. You will be graded using the stock simulator, so
any enhancements to the simulator program made with the intention of improving your
performance will be for naught.
The simulator runs a random set of programs utilizing a limited number of shared physical
pages. Each process has a fixed number of virtual pages, the process’s virtual memory space,
that it might try to access. For the purpose of this simulation, all memory access is due to
the need to load program code. Thus, the simulated program counter (PC) for each process
dictates which memory location that process currently requires access to, and thus which
virtual page must be swapped-in for the process to successfully continue.
The values of the constants mentioned above are available in the simulator.h file. For
the purposes of grading your assignment, the default values will be used:
• 20 virtual pages per process (MAXPROCPAGES)
• 100 physical pages (frames) total (PHYSICALPAGES)
• 20 simultaneous processes competing for pages (MAXPROCESSES)
• 128 memory unit page size (PAGESIZE)
• 100 tick delay to swap a page in or out (PAGEWAIT)
As you can see, you are working in a very resource constrained environment. You will
have to deal with attempts to access up to 400 virtual pages (20 processes times 20 virtual
pages per process), but may only have, at most, 100 physical pages swapped in at any given
time.
In addition, swapping a page in or out is an expensive operation, requiring 100 ticks to
complete. A tick is the minimum time measurement unit in the simulator. Each instruction
or step in the simulated programs requires 1 tick to complete. Thus, in the worst case where
every instruction is a page miss (requiring a swap-in), you will spend 100 ticks of paging
overhead for every 1 tick of useful work. If all physical pages are in use, this turns into 200
ticks per page miss since you must also spend 100 ticks swapping a page out in order to
make room for the required page to be swapped in. This leads to an “overhead to useful
work” ratio of 200 to 1: very, very, poor performance. Your goal is to implement a system
that does much better than this worst case scenario.
2
2.2 The Simulator Interface
The simulator exports three functions through which you will interact with it. The first
function is called pageit. This is the core paging function. It is roughly equivalent to the
page-fault handler in your operating system. The simulator calls pageit anytime something
interesting happens (memory access, page fault, process completion, etc). It passes the
function a page map for each process, as well as the current value of the program counter
for each process. See simulator.h for details. You will implement your paging strategy in
the body of this function.
The pageit function is passed an array of pentry structs, one per process. This struct
contains a copy of all of the necessary memory information that the simulator maintains
for each process. You will need the information contained in this struct to make intelligent
paging decisions. It is the simulator’s job to maintain this information. You should just read
it as necessary.
The struct contains:
long active
A flag indicating whether or not the process has completed. 1 if running, 0 if exited.
long pc
The value of the program counter for the process. The current page can be calculated
as page = pc/P AGESIZE.
long npages
The number of pages in the processes memory space. If the process is active (running),
this will be equal to MAXPROCPAGES. If the process has exited, this will be 0.
long pages[MAXPROCPAGES]
A bitmap array representing the page map for a given process. If pages[X] is 0, page
X is swapped out, swapping out, or swapping in. If pages[X] is 1, page X is currently
swapped in.
The simulator also exports a function called pagein and a function called pageout. These
functions request that a specific page for a specific process be swapped in or swapped out,
respectively. You will use these function to control the allocation of virtual and physical
pages when writing your paging strategy. Each of these functions returns 1 if they succeed
in starting a paging operation, if the requested paging operation is already in progress, or if
the requested state already exists. 100 ticks after requesting a paging operation, the operation
will complete. When calling pagein, the page maps passed to pageit will reflect the new
state of the simulator after the request completes 100 ticks later. When calling pageout,
the page maps passed to pageit will reflect the new state of the simulator in the first call
to pageit after the request is made. In short, a page is recognized as swapped out as soon
as a pageout request is made, but is not recognized as swapped in until after a pagein
request completes.
These functions return 0 if the paging request can not be processed (due
to exceeding the limit of physical pages or because another paging operation is currently in
process on the requested page) or if the request is invalid (paging operation requests nonexistent page, etc). See Figure 1 for more details on the behavior of pagein and pageout.
Figure 1 shows the possible states that a virtual page can occupy, as well as the possible
transitions between these states. Note that the page map values alone do not define all
3
Swapped Out
pages[X] = 0
Swapped In
pages[X] = 1
Swapped Out
pages[X] = 0
Swapped Out
pages[X] = 0
pageout()
[returns 1]
pagein()
[returns 1]
(pages available)
pagein()
[returns 1] 1 tick
99 ticks
99 ticks
1 tick
pageout()
[returns 1]
pagein()
[returns 1]
pageout()
[returns 0]
pageout()
[returns 1]
pagein()
[returns 0]
(no pages available)
pagein()
[returns 0]
Swap State
pages[X] = value
Node Key
Figure 1: Possible Page States and Transitions
possible page states. We must also account for the possible operations currently underway
on a page to fully define its state. While the page map for each process can be obtained
from the pageit input array of structs, there is no interface to directly reveal any operations
underway on a given page. If knowing whether or not a paging operation is underway
on a given page (and thus knowing the full state of a page) is necessary for your pageit
implementation, you must maintain this data yourself.
2.3 The Simulated Programs
The simulator populates its 20 processes by randomly selecting processes from a collection
of 5 simulated “programs”. Pseudo code for each of the possible 5 programs is provided in
Listings 1 through 5.
✞ ☎
1 # l o o p with i n n e r branch
2 for 10 30
3 run 500
4 i f . 4
5 run 900
6 e l s e
7 run 131
8 end i f
9 end
10 ex it
11 endprog
4
✝ ✆
Listing 1: Test Program 1 – A loop with an inner branch
✞ ☎
1 # one l o o p
2 for 20 50
3 run 1129
4 end
5 ex it
6 endprog
✝ ✆
Listing 2: Test Program 2 – Single loop
✞ ☎
1 # doubly−n e s t e d l o o p
2 for 10 20
3 run 1166
4 for 10 20
5 run 516
6 end
7 end
8 ex it
9 endprog
✝ ✆
Listing 3: Test Program 3 – Double Nested Loop
✞ ☎
1 # e n t i r e l y l i n e a r
2 run 1911
3 ex it
4 endprog
✝ ✆
Listing 4: Test Program 4 – Linear
✞ ☎
1 # p r o b a b i l i s t i c backward branch
2 for 10 20
3 l a b e l :
4 run 500
5 i f . 5
6 goto l a b e l
7 end i f
8 end
9 ex it
10 endprog
✝ ✆
Listing 5: Test Program 5 – Probabilistic backward branch
This simple pseudo code notation shows you what will happen in each process:
• for X Y: A “for” loop with between X and Y iterations (chosen randomly)
• run Z: Run Z (unspecified) instructions in sequence
• if P: Run next clause with probability P, run else clause (if any) with probability (1-P).
• goto label: Jump to “label”
As we discuss in the next section, you may wish to use this knowledge about the possible
programs to:
5
1. Profile processes and know which kind of programs each is an instance of.
2. Use this knowledge to predict what pages a process will need in the future with rather
high accuracy.
Note that while you know the structure of these programs, the programs flow is still
probabilistic in nature. Which branch a specific process takes or how many loop iterations
occur will be dependent upon the random seed generated by the simulator. Thus, you may
never be able to perfectly predict the execution of a process, only the probabilistic likelihood
of a collection of possible execution paths.
3 Some Implementation Ideas
In general, your pageit() implementation will need to follow the basic flow presented in
Figure 2. You will probably spend most of your time deciding how to implement the “Select
a Page to Evict” element.
Select a Process
Determine
Current Page
(pc / PAGESIZE)
Exit pageit()
Select a
Page to Evict
Is Page
Swapped In?
Remaining
Processes?
Call pagein()
Call pageout()
Investigate Error
Simulator calls
pageit()
Yes
Yes
No
No
Success
Failure
Failure
Success
Figure 2: Basic Reactive pageit() Flow
A basic “one-process-at-a-time” implementation is provided for you. This implementation
never actually ends up having to swap out any pages. Since only one process is allocated
pages at a time, no more than 20 pages are ever in use. When each process completes, it
releases all of its pages and the next process is allowed to allocate pages and run. This is a
very simple solution, and as you might expect, does not provide very good performance. Still,
6
it provides a simple starting point that demonstrates the simulator API. See pager-basic.c
for more information.
To start, create some form of “Least Recently Used” (LRU) paging algorithm. An LRU
algorithm selects a page that has not been accessed for some time when it must swap a page
out to make room for a new page to be swapped in. An LRU algorithm can either operate
globally, or with respect to a given process. In the latter case, you may wish to pre-reserve a
number of physical pages for each process and only allow each process to compete for pages
from this subset. An stub for implementing your LRU version of pageit() has been created
for you in the pager-lru.c file. Note the use of static variables in order to preserve local
state between calls to pageit(). Your LRU algorithm should perform much better than
the trivial solution discussed above, but will still suffer from performance issues. We can do
better.
To really do well on this assignment, you must create some form of predictive paging
algorithm. A predictive algorithm attempts to predict what pages each process will require
in the future and then swaps these pages in before they are needed. Thus, when these pages
are needed, they are already swapped in and ready to go. The process need not block to
wait for the required pages to be swapped in, greatly increasing performance. . Figure 3
shows a modified version of the Figure 2 flowchart for a predictive implementation of pageit.
As for the LRU implementation, a simple predictive stub has been created for you in the
pager-predict.c file.
Select a Process
Determine
Current Page
(pc / PAGESIZE)
Exit pageit()
Select a
Page to Evict
Is Page
Swapped In?
Remaining
Processes?
Call pagein()
Call pageout()
Investigate Error
Simulator calls
pageit()
Yes
Yes
No
No
Success
Failure
Failure
Success
Determine
Future Page
(More Magic)
Repeat 2x for Both Current and Future Paths
Check for Previous
Prediction Miss
Attempt to Setup
Future Prediction Hit
Figure 3: Basic Predictive pageit() Flow
There are effectively two approaches to predictive algorithms. The first approach is
7
to leverage your knowledge of the possible program types (see previous section). In this
approach, one generally attempts to heuristically determine which program each process is
an instance of by tracking the movement of the process’s program counter (PC). Once each
process is classified, you can use further PC heuristics to determine where in its execution
the process is, and then implement a paging strategy that attempts to swap in pages required
by upcoming program actions before they occur. Since the programs all have probabilistic
elements, this approach will never be perfect, but it can do very well.
The second approach to predictive algorithms is to ignore the knowledge you have been
given regarding the various program types. Instead, you might track each process’s program
counter to try to detect various common patterns (loops, jumps to specific locations, etc).
If you detect a pattern, you assume that the pattern will continue to repeat and attempt to
swap in the necessary pages touched during the execution of the pattern before the process
needs them. Working set algorithms are a subset of this approach.
Note that in any predictive operation, you ideally wish to stay 100-200 ticks ahead of
the execution of each process. This is the necessary predictive lead time in which you must
make paging decisions in order to insure that the necessary pages are available when the
process reaches them and that no blocking time is required. As Figure 3 shows, in addition
to swapping in pages predicatively, you must still handle the case where your prediction has
failed and are thus forced to reactively swap in the necessary page. This is referred to as
a predictive miss. A good predictive algorithm will minimize misses, but still must handle
them when they occur. In other words, you can not assume that your predictions will always
works and that every currently needed page is already available. Doing so will most likely
lead to deadlock.
There are a number of additional predictive notions that might prove useful involving
state-space analysis [4], Markov chains [5], and similar techniques. We will leave such solutions to the student to investigate if she wishes. Please see the references section for
additional information and ideas.
4 What’s Included
We provide some code to help get you started. Feel free to use it as a jumping off point
(appropriately cited).
1. Makefile A GNU Make makefile to build all the code listed here.
2. README As the title so eloquently instructs: read it.
3. simulator.c The core simulator source code. For reference only.
4. simulator.h The simulator header file including the simulator API.
5. programs.c Struct representing simulated programs for use by simulator. For reference and use by simulator code only.
6. pager-basic.c A basic paging implementation that only runs one process at a time.
7. pager-lru.c A stub for your LRU paging implementation.
8. pager-predict.c A stub for your predictive paging implementation.
9. api-test.c A pageit implementation that detects and prints simulator state changes.
May be useful if you want to confirm the behavior of the simulator API. Builds to
8
test-api.
10. test-* Executable test programs. Runs the simulator using your pager-*.c strategy.
Built using Makefile. The simulator provides a lot of tools to help you analyze your
program. Run ./test-* -help for information on available options. It also responds
to various signals by printing the current page table and process execution state to the
screen (try ctrl-c while simulator is executing).
11. test-api An API test program. See api-test.c.
12. see.R An R script for displaying a visualization of the process run/block activity
in a simulation. You must first run ./test-* -csv to generate the necessary trace
files. To run visualization, lunch R in windowed graphics mode (in Linux: R -g Tk &
at the command prompt) from the directory containing the trace files (or use setwd
to set your working directory to the directory containing the trace files). Then run
source(‘‘see.r’’) at the R command prompt to lunch the visualization.
5 What You Must Provide
When you submit your assignment, you must provide the following as a single archive file:
1. A copy of your LRU paging implementation
2. A copy of your best predictive paging implementation
3. A makefile that builds any necessary code
4. A README explaining how to build and run your code
6 Grading
50% of you grade will be based on the performance of the best pager implementation that
you provide. 10% will be based upon code cleanliness. The following simulation scores will
earn the corresponding number of points:
• Code does not compile without errors : 0 Points
• score >= 1.28 : 5 Points
• 0.64 <= score < 1.28 : 10 Points
• 0.32 <= score < 0.64 : 15 Points (Basic LRU implementation)
• 0.16 <= score < 0.32 : 20 Points
• 0.08 <= score < 0.16 : 30 Points
• 0.04 <= score < 0.08 : 35 Points
• 0.02 <= score < 0.04 : 40 Points
• 0.01 <= score < 0.02 : 50 Points (Good predictive implementation)
During your grading session, we will run your code using several random seeds and will
take the average of these runs as your score. Thus, if your program’s performance varies
widely from run-to-run, you may get bitten in the grading session. In the words of Client
Eastwood, “Do I feel lucky?”.
9
If your code generates warnings when building under gcc on the VM using -Wall and
-Wextra you will be penalized 1 point per warning. In addition, to receive full credit your
submission must:
• Meet all requirements elicited in this document
• Code must adhere to good coding practices.
• Code must be submitted to Moodle prior to due date.
The other 40% of your grade will be determined via your grading interview where you
will be expected to explain your work and answer questions regarding it and any concepts
related to this assignment.
7 Obtaining Code
The starting code for this assignment is available on the Moodle and on github. If you would
like practice using a version control system, consider forking the code from github. Using
the github code is not a requirement, but it will help to insure that you stay up to date with
any updates or changes to the supplied codebase. It is also good practice for the kind of
development one might expect to do in a professional environment. And since your github
code can be easily shared, it can be a good way to show off your coding skills to potential
employers and other programmers.
Github code may be forked from the project page here:
https://github.com/asayler/CU-CS3753-2012-PA4.
8 Resources
Refer to your textbook and class notes on the Moodle for an overview of OS paging policies
and implementations.
If you require a good C language reference, consult K&R[2].
The Internet[3] is also a good resource for finding information related to solving this
assignment.
The most recent version of the assignment from which this assignment was adopted is
available at [1].
You may wish to consult the man pages for the following items, as they will be useful
and/or required to complete this assignment. Note that the first argument to the “man”
command is the chapter, insuring that you access the appropriate version of each man page.
See man man for more information.
• man 1 make
References
[1] Couch, Alva. Comp111 – A5. Tufts University: Fall 2011. http://www.cs.tufts.edu/
comp/111/assignments/a5.html.
10
[2] Kernighan, Brian and Dennis, Ritchie. The C Programming Language. Second Edition:
2009. Prentice Hall: New Jersey.
[3] Stevens, Ted. Speech on Net Neutrality Bill. 2006. http://youtu.be/f99PcP0aFNE.
[4] Wikipedia. “State space (dynamical system)”. Wikipedia, The Free Encyclopedia.
2012. http://en.wikipedia.org/w/index.php?title=State_space_(dynamical_
system)&oldid=478401148. Online; accessed 12-April-2012.
[5] Wikipedia. “Markov chain”. Wikipedia, The Free Encyclopedia. 2012. http://
en.wikipedia.org/w/index.php?title=Markov_chain&oldid=486910550. Online; accessed 12-April-2012.
11