Description
Subject : Analyzing the complexity of sorting algorithms
1 Introduction
Analysis of algorithms is the area of computer science that provides tools to analyze the eciency
of dierent methods of solutions. Eciency of an algorithm depends on these parameters; i) how
much time, ii) memory space, iii) disk space it requires. Analysis of algorithms is mainly used
to predict performance and compare algorithms that are developed for the same task. Also it
provides guarantees for performance and helps to understand theoretical basis.
A complete analysis of the running time of an algorithm involves the following steps:
Implement the algorithm completely.
Determine the time required for each basic operation.
Identify unknown quantities that can be used to describe the frequency of execution of the
basic operations.
Develop a realistic model for the input to the program.
Analyze the unknown quantities, assuming the modeled input.
Calculate the total running time by multiplying the time by the frequency for each operation,
then adding all the products.
In this experiment, you will analyze dierent sorting algorithms and compare their running times
on a number of dataset with changing sizes.
2 Problem Denition
The strength of a search algorithm reveals on a data comprising a great deal of instances. For
such a data set, it is expected to observe that the sorting algorithms having worse performance
in terms of complexity like selection sort, insertion sort, or bubble sort require higher amount
of time whereas those having better performance like merge sort or quick sort nish the process
faster, which is an expected case within the scope of this assignment. In Table 1, you are provided
comparative complexity/runtime comparison of well-known sorting algorithms.
Table 1: Complexity comparison of sorting algorithms
Best Average Worst
Selection Sort Ω(n2
) θ(n2
) O(n2
)
Bubble Sort Ω(n) θ(n2
) O(n2
)
Insertion Sort Ω(n) θ(n2
) O(n2
)
Heap Sort Ω(n log(n)) θ(n log(n)) O(n log(n))
Quick Sort Ω(n log(n)) θ(n log(n)) O(n2
)
Merge Sort Ω(n log(n)) θ(n log(n)) O(n log(n))
Radix Sort Ω(nk) θ(nk) O(nk)
Page 1 of 4
BBM 204: Software Laboratory II
The dataset you are to employ in this assignment is a real dataset and contains a great deal
amount of trac ows of a test network. Trac ow is a ow that records all communication
packets, including header and payload, sent bidirectional between sender and receiver within a
certain period of time. The ow you are provided is generated from a real network trace [1]
through FlowMeter, a network trac ow generator written in Java [2]. Click here to download
the dataset. In the ow data you are provided, there are about 250,000 captures and in order for
you to comparatively compare the performance of sorting algorithms with a changing size of data,
we partitioned the whole data into smaller sizes. The partitioned and actual datasets have 100;
1,000; 50,000; 100,000; 250,000 captures, respectively.
FlowMeter generates more than 80 features including Flow ID, source and destionation IPs, and
the like. For details regarding the feature set, refer to [3]. As it is out of scope of this assignment,
actually you do not need to know what type of information is recorded in these features. So do
not get confused with the details of these features.
3 Assignment Task
As stated before, the main objective of this assignment is to analyze the runtimes of dierent
sorting algorithms whose proven mathematical complexities are given in Table 1. To do so, you
are provided a series of les as a testbed containing dierent number of ow captures.
In this assignment, you are expected to i) choose three sorting algorithms, ii) implement them in
Java, iii) sort all datasets, iv) measure the time (in second) that each algorithm requires. There is
no restriction on your algorithm choice; however you are strongly encouraged to choose them such
that each has dierent complexity (refer to Table 1) so that you are able to observe the variation
in performance especially on the big data.
While sorting a dataset, you should consider the consistency within a record. To be more precise,
let’s suppose you are sorting dataset D, according to a feature (sorting key) j. If the value of jth
key of ith ow record is greater than that of kth record, you should replace whole feature values
of i with those of k. See the example below:
Feature
Record (j-1) j (j+1)
i 7 10 8
k 4 6 1
Feature
Record (j-1) j (j+1)
k 4 6 1
i 7 10 8
Design your implementation such that it takes three arguments;
i the path of dataset with the name of le on which sorting process is applied.
ii index of the feature used as a key for sorting. The key j used in sorting process is actually
feature index (7 < j ≤ 84).
iii an option that indicates whether sorted data will be saved, or not. It should be given T (True)
to save, otherwise it should be F (False).
Your implementation should run with following command:
Page 2 of 4
BBM 204: Software Laboratory II
java assignment1
java assignment1 ../TrafficFlow1000.csv 56 T
4 Reporting
You report should clearly cover the following points:
Problem denition. You should clearly express the idea behind this assignment.
Findings. You should demonstrate the results with table(s) (as Table 2) and plots.
Discussion about the time complexity and memory requirements of every algorithm you
choose.
Note: You are not restricted to use a particular text editor that you will need to use while preparing
your report; however we encourage you to use LaTeX.
Table 2: My caption
Algorithm & Data Set TracFlow100 TracFlow1000 TracFlow50000 TracFlow100000 TracFlowAll
1st Algorithm
2nd Algorithm
3rd Algorithm
Notes
Do not miss the deadline.
Save all your work until the assignment is graded.
The assignment must be original, individual work. Duplicate or very similar assignments
are both going to be considered as cheating.
You can ask your questions via Piazza (https://piazza.com/hacettepe.edu.tr/spring2018/bbm204)
and you are supposed to be aware of everything discussed in Piazza.
You will submit your work from https://submit.cs.hacettepe.edu.tr/index.php with the le
hierarchy as below:
→ report
→ report.pdf
→ src
The class name in which main method belongs should be Assignment1.java. All classes
should be placed in src directory. Feel free to create subdirectories, corresponding the
package(s), but each should be in src directory.
This le hierarchy must be zipped before submitted (Not .rar, only .zip les are supported
by the system)
Page 3 of 4
BBM 204: Software Laboratory II
Policy
All work on assignments must be done individually unless stated otherwise. You are encouraged
to discuss with your classmates about the given assignments, but these discussions should be
carried out in an abstract way. That is, discussions related to a particular solution to a specic
problem (either in actual code or in the pseudocode) will not be tolerated. In short, turning
in someone else’s work (from internet), in whole or in part, as your own will be considered as
a violation of academic integrity. Please note that the former condition also holds for the
material found on the web as everything on the web has been written by someone else.
References
[1] Ali Shiravi, Hadi Shiravi, Mahbod Tavallaee, and Ali A Ghorbani. Toward developing a
systematic approach to generate benchmark datasets for intrusion detection. computers &
security, 31(3):357374, 2012.
[2] Arash Habibi Lashkari, Gerard Draper-Gil, Mohammad Saiful Islam Mamun, and Ali A Ghorbani. Characterization of tor trac using time based features. In ICISSP, pages 253262,
2017.
[3] Cicowmeter. http://www.unb.ca/cic/datasets/owmeter.html. (Accessed: 25.02.2018).
Page 4 of 4