ECE 454 Assignment 4: Kafka Streams

Overview

• In this assignment, you will implement a simple real-time
stream processing application using the Kafka Streams API.

• A partial implementation of the application is provided in
the starter code tarball.

• Your goal is to complete the code so that it meets the
functional and non-functional requirements.

• ZooKeeper and Kafka are provided on manta.uwaterloo.ca
on the default ports (2181 and 9092, respectively).

• ZooKeeper is provided because it is needed by Kafka. Your
Java code should not interact with ZooKeeper directly.

Learning objectives
Upon successful completion of the assignment, you will know
how to:
• process incoming messages from a topic and output
messages to a topic using the Kafka Streams API

• use functional programming to manipulate streams
• use Kafka serializers and deserializers

Inputs and outputs
Your application will receive two data streams as input, and
will produce a single data stream as output. The names of the
topics corresponding to these streams will be provided on the
command line.

[Diagram: the Kafka Streams application (your code) consumes from the input topics
student-$USER and classroom-$USER and produces to the output topic output-$USER.]
Note: The variable $USER denotes your Nexus user ID.

Inputs
• Your application will consume messages from two Kafka
topics containing information regarding students and
classrooms.

• The first topic provides info on students and their
whereabouts. Each message is a string key-value pair of the
following form:
Student_ID,Room_ID

• The second topic provides info on classrooms and their
capacity. Each message is a string key-value pair of the
following form:
Room_ID,max_capacity
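
• For illustration, the sketch below shows one way these two topics might be consumed with string keys and values using the Kafka Streams DSL. The names studentTopic and classroomTopic are placeholders for the topic names supplied on the command line, and whether each input is better modelled as a KStream or a KTable is a design decision left to you.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;

    StreamsBuilder builder = new StreamsBuilder();

    // Student messages: key = Student_ID, value = Room_ID (latest room per student).
    KStream<String, String> students =
            builder.stream(studentTopic, Consumed.with(Serdes.String(), Serdes.String()));

    // Classroom messages: key = Room_ID, value = max_capacity (latest capacity per room).
    KTable<String, String> classrooms =
            builder.table(classroomTopic, Consumed.with(Serdes.String(), Serdes.String()));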

Input examples: classroom topic
• In the following example, RoomA is assigned a maximum
capacity of 10, then RoomB is assigned a maximum capacity
of 10, and finally the maximum capacity of RoomB is
increased from 10 to 15.

• Sequence of messages written to classroom topic:
RoomA,10
RoomB,10
RoomB,15

Input examples: student topic
• In the following example, Student100 enters RoomB, then
Student200 enters RoomB, and finally Student100 moves
from RoomB to RoomA.

• Sequence of key-value messages written to student topic:
Student100,RoomB
Student200,RoomB
Student100,RoomA

Output
• The goal of the application is to output (i.e., commit to the
output topic) rooms for which the current occupancy
exceeds the maximum capacity, along with some additional
information. For each such room, the application should
output a string key-value pair of the following form:
Room_ID,current_occupancy

• If the occupancy of a room subsequently decreases to a
value less than or equal to the capacity, or if the capacity
increases to a value greater than or equal to the occupancy, the
application should output a string key-value pair of the
following form:
Room_ID,OK

Example of inputs and output
(Each row shows a message committed to one of the three topics, in processing order from top to bottom.)

student topic         classroom topic       output topic
---------------------------------------------------------
                      RoomA,2
                      RoomB,2
Student100,RoomA
Student200,RoomA
Student300,RoomA                            RoomA,3
Student400,RoomA                            RoomA,4
Student100,RoomB                            RoomA,3
Student200,RoomB                            RoomA,OK
Student300,RoomB                            RoomB,3
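
• To make the dataflow concrete, the fragment below sketches one possible shape of the processing logic; it is not the required solution. It models the student topic as a table of each student's latest room, counts students per room, and compares the count against the room's capacity. It reuses builder, studentTopic and the classrooms table from the earlier sketch, and it is deliberately incomplete: in particular, as the example shows, OK should only be emitted for a room that was previously over capacity, and the results still need to be written to the output topic.

    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Grouped;

    // Latest room per student (a newer message for the same Student_ID replaces the older one).
    KTable<String, String> studentRoom =
            builder.table(studentTopic, Consumed.with(Serdes.String(), Serdes.String()));

    // Current occupancy: re-key by Room_ID and count students per room.
    KTable<String, Long> occupancy = studentRoom
            .groupBy((studentId, roomId) -> KeyValue.pair(roomId, studentId),
                     Grouped.with(Serdes.String(), Serdes.String()))
            .count();

    // Compare occupancy against capacity; a missing capacity record means the room is unbounded.
    KTable<String, String> status = occupancy.leftJoin(classrooms, (count, capacity) -> {
        if (capacity == null || count <= Long.parseLong(capacity)) {
            return "OK";                  // at or under capacity (or capacity unknown)
        }
        return String.valueOf(count);     // over capacity: report the current occupancy
    });

    // Still to do: suppress "OK" for rooms that were never over capacity, and write the
    // results to the output topic, e.g. with
    // status.toStream().to(outputTopic, Produced.with(Serdes.String(), Serdes.String()));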

Notes
1. The correct output depends not only on the messages supplied
in the input streams, but also on the order in which student and
classroom messages are processed.

2. If a student enters a room for which there is no room data, then
assume that the room’s max capacity is unbounded.

3. The application should be configured so that each input
message is consumed and processed in around one second or
less.

For example, if the occupancy of a room increases beyond
the maximum capacity then the corresponding output record
should be committed to the output topic within around one
second.

When the grading script commits a message to one of
the input topics, it will wait up to five seconds for the
application to commit a message to the output topic.
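
• If you find you need to tune latency beyond what is already configured, the StreamsConfig properties below are the usual knobs. This is a sketch only: the cache setting is already handled in the provided mainline, the commit interval value shown is illustrative, and the broker address simply reflects the host and default port mentioned in the overview.

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    // Already done in the starter code: disable record caching so updates are not buffered.
    props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
    // Illustrative value: a short commit interval flushes results frequently, helping
    // output records appear within the one-second target.
    props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 100);
    // Kafka runs on manta.uwaterloo.ca on the default port 9092 (likely set in the starter code).
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "manta.uwaterloo.ca:9092");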

Packaging and submission
• All your Java classes must be in the default package.
• Please keep the code simple: use a single Java file called
A4Application. A partial implementation and build script are
provided.

• Do not change the structure of the command line arguments of
the provided A4Application program.
• The classpath for this assignment comprises all the jar files
packaged with Kafka under kafka_2.11-2.3.0/libs.

• Use the provided package.sh script to create a tarball for
electronic submission, and upload it to the appropriate LEARN
dropbox before the deadline.

• The list of group members should be provided in a text file called
group.txt, as in earlier assignments.

Grading scheme
Evaluation structure:
Correctness of outputs: 100%
Penalties of up to 100% will apply in the following cases:
• the solution does not use Kafka Streams

• grading script cannot see the output because the application
takes too long (i.e., several seconds instead of roughly one
second) to produce the output after a state change

• solution cannot be compiled or throws an exception during
testing despite receiving valid input
• solution produces incorrect outputs
• solution is improperly packaged

Additional info and hints

Creating and resetting topics
• The two input topics and the output topic must be created
prior to running your application.

• You may also want to purge the messages in these topics
occasionally during testing.

• The starter code includes a script called reset_topics.sh that
performs both functions. It first deletes all three topics and
then (re)creates them.

Resetting your application
• Some stream processing applications use stateful operators,
like count. These operators maintain state in a fault-tolerant
manner using Kafka state stores.

• In addition, Kafka keeps track of which messages have been
consumed by your application.

• You may want to reset this internal state prior to each run,
which entails performing a local reset via the KafkaStreams
class, as well as a global reset via the kafka-streams-application-reset.sh utility provided in Kafka.

• The starter code includes a script called reset_app.sh that
performs both types of reset on your application.
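
• For reference, the local reset mentioned above is a single call on the KafkaStreams object before the topology is started, as sketched below (builder and props refer to the earlier sketches, and the starter code may already contain an equivalent call).

    import org.apache.kafka.streams.KafkaStreams;

    KafkaStreams streams = new KafkaStreams(builder.build(), props);
    streams.cleanUp();   // local reset: removes this instance's local state directory
    streams.start();     // the global reset (offsets, internal topics) is handled by
                         // kafka-streams-application-reset.sh, wrapped by reset_app.sh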

Producing inputs
• The producer_student.sh and producer_classroom.sh scripts
provided with the starter code are wrappers around the
Kafka command line producer utility.

• They allow you to enter key-value pairs into the input topics
using the console.
• The key and value are entered on one line, separated by a
comma with no spaces around it.
• Shut the producers down before running reset_app.sh to
reset the topics correctly.

Consuming outputs
• The consumer.sh script provided with the starter code is a
wrapper around the Kafka command line consumer utility.

• It allows you to dump the output of your application to the
console.
• Shut the consumer down before running reset_app.sh to
reset the topics correctly.

Additional guidelines
• Use built-in Kafka features as much as possible instead of
rolling your own code for fundamental stream operations.

• You may use the default state store for stateful stream
operators. Do not bypass Kafka’s state store by storing data
(e.g., number of students in each classroom) in ordinary
program variables; a short illustration appears at the end of this section.

• Setting the CACHE_MAX_BYTES_BUFFERING_CONFIG
property to zero should ensure that the application produces
outputs in a timely manner. This is done for you in the
mainline of the starter code.

• Using Kafka correctly will ensure that your application is fault
tolerant. We will not inject failures during grading.
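
• To illustrate the state-store guideline above, the fragment below keeps the per-room count in a Kafka Streams state store rather than in a program variable. The store name "occupancy-store" is purely illustrative, and studentRoom denotes the (Student_ID, Room_ID) table from the earlier sketch.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Materialized;

    // Do: let Kafka Streams maintain the count in a fault-tolerant state store.
    KTable<String, Long> occupancy = studentRoom
            .groupBy((studentId, roomId) -> KeyValue.pair(roomId, studentId),
                     Grouped.with(Serdes.String(), Serdes.String()))
            .count(Materialized.as("occupancy-store"));

    // Don't: track occupancy in an ordinary in-memory structure such as a HashMap;
    // that bypasses Kafka's state store and is not fault tolerant.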