Description
Question 1: Ring Around the Rosy [12 points]
Consider a ring-based failure detection scheme with processes p1, …, pn. The process pi sends
a heartbeat every T seconds to process pi+1, and the process pn sends a heartbeat to p1.
(6) (a) This failure detection mechanism is not complete, since it does not tolerate failure of
two neighboring processes. How would you modify it so that a failure of any two
processes is always detected? Be explicit in your answer and compare the
bandwidth cost of your proposed scheme to the original ring-based failure detection
scheme. It is required that the proposed scheme should maintain a ring structure for
the processes.
(4) (b) Assume now that pi only sends a heartbeat to pi+1 after receiving a heartbeat from pi1, and then waits T seconds. Specifically:
1. p1 sends a heartbeat to p2;
2. p2 receives the heartbeat, waits T seconds, and then sends a heartbeat to p3;
3. p3 receives the heartbeat, waits T seconds, and sends a heartbeat to p4;
4. … and so on.
Provided that there is exactly one heartbeat (or token) circulating in the ring at any
time, what should the failure timeout be set to, given a maximum one-way delay of ∆,
to ensure the accurate failure detection? What would be the maximum detection
time?
(2) (c) The token scheme above ensures that a failure of one process is detected by every
other process, but it does not directly reveal which process has failed. Sketch a
modification of the failure detection algorithm, so that identity of the failed process
can be discovered. The modification should preserve the token-based design.
Question 2: Xtreme Drift [6 points]
Usually the clock drift is small enough, so it only occasionally needs to be corrected for. This
question considers the problem with larger drifts.
(2) (a) Consider a heartbeat protocol in a system with the maximum one-way delay ∆ and a
period of T. Assume that the clock drift is bounded by 1% (i.e., the clocks lose or
gain at most 1s every 100s relative to each other). What should be the value of a
timeout to ensure the accurate failure detection?
(2) (b) What would be the maximum detection time of a failure?
Page 2
(2) (c) Client A uses Cristian’s algorithm to synchronize with server B. If the RTT between A
and B is 20 ms, how often should Cristian’s algorithm be re-run to keep the clocks
synchronized within 100 ms, if the maximum drift rate is again 1%?
Question 3: You Say Go Slow, I Fall Behind [18 points]
(6) (a) Consider the following diagram that lists five servers (A, B, C, D, E), and the RTT for
each link between them. Explain which servers should synchronize with which other
servers in minimize the worst-case skew between any two servers. What will be the
worst-case skew in this case?
(6) (b) Consider the following diagram that lists transmission and reception time of NTP
messages between servers A and B in each server’s clock. Assuming there is no
clocks drift, what is an upper bound on the one-way transmission time of each of the
six messages shown? Hint: consider an entire diagram, not just adjacent message
pairs
(6) (c) Based on your answer in part (a), what is the maximum skew of server B with
respect to server A? What is the minimum skew? By how much the time of server B
should be corrected and in which direction?
Page 3
Question 4: Teach Me How To Be Sensible and Keep The Timestamps Logical [18 points]
The following figure shows timeline of 16 events (A to P) for four processes. The numbers along the
bottom horizontal axis indicate the real time instances.
Answer the following questions.
(6) (a) Write down the Lamport timestamp of each event.
(6) (b) Write down the vector timestamp of each event.
(6) (c) List all events concurrent with the event (i) C, (ii) I, and (iii) O, respectively.
Question 5: Sna-sna-snapshots everybody! [6 points]
Consider again the timeline of events shown in the figure above in Question 4. Assume that P3
initiates the Chandy-Lamport snapshot algorithm at time instance 9. Assuming the FIFO channels,
write down all possible consistent cuts that the resulting snapshot can capture. The cuts can be
described by their corresponding frontier events.