COL362 Assignment-2

$30.00

Category: You will Instantly receive a download link for .zip solution file upon Payment

Description

5/5 - (7 votes)

2 Dataset 1
1. In the first part of the assignment, we’ll work with the Railway dataset. We have shared
the schema of the dataset here, but we won’t share the actual dataset that will be used for
evaluation.
1
2. The database have one table train_info:
train_no: integer train_name : text distance : integer (kms)
source_station_name : text departure_time : time input day_of_departure : text
destination_station_name : text arrival_time : time input day_of_arrival : text
3. Columns arrival_time (time when train reaches destination) and departure_time (time when
train starts from source) have time input : ISO 8601 format. Read more about the format here.
4. Similarly day_of_arrival is day of the week when train reaches destination and departure_time
is day of the week when train starts from source.
5. Keys:
(a) train no is primary key for train info table
6. Some keywords used in queries:
(a) hop : If two stations are 3 hops away that means a person would require four different
trains to reach. For eg. Station A -> B -> C -> D -> E. Here station A and E are three
hops away i.e. there are 3 intermediate stations. Station A and D are two hops away.
(b) feasible routes : A feasible route is a route where departure time of connecting train 2 is
greater than arrival time of train 1, where Monday is considered as day 1 and Sunday as day
7(so if there is connecting train from station X and if passenger reaches there at 12:30pm
on Wednesday then he/she can take other train from 12:30pm on-wards for Wednesday
till Sunday. Also the passenger should reach the destination within a week(before Monday
00:00:00).
(c) train operating on same day : this implies that the day of departure and day of arrival
for all connecting trains should be same.
7. You should use only these tables while writing solutions to the queries. You can create temporary
views while handling any SQL query but you should include SQL queries for creating and
deleting these temporary views at the starting and end of your SQL file respectively. Note –
you don’t have to define these tables in the submission file, these will already be present while
evaluation.
2.1 Queries
1. Find all destinations which are at-most two hops away from the city “KURLA”, if the first
train taken was train_no : 97131.
Note: one can return back to the source destination by taking two trains.
Sort output by destination station name in ascending order (alphabetical).
2. Find all destinations which are at-most two hops away from the city “KURLA”, if the first
train taken was train_no : 97131 and the connecting trains should operate on the same day
of the week.
Note: one can return back to the source destination by taking two trains.
Sort output by destination station name in ascending order (alphabetical).
3. Find all destinations which are at-most two hops away from “DADAR”. Also find total distance
from DADAR if taken the route, for all possible trains, given that the connecting trains should
operate on the same day of the week.
Output : destination_station_name, distance, day Sort output by destination station
name in ascending order (alphabetical).
4. Find all destination station names of feasible routes from city DADAR to destinations at
most two hops away. Sort output by destination station name in ascending order
(alphabetical).
5. Find count of the total number of combinations of trains possible between stations “CSTMUMBAI ” and “VASHI ”, given that a passenger can take at-most 3 trains (2 hop distance).
Output: Count
2
6. Find the minimum distance between all pairs of stations, if one can take the train a maximum
of 6 times.
Sort output by destination station name in ascending order (alphabetical).
7. Find all pairs of stations which are less than or equal to 3 hops away, i.e. one has to change
4 or less trains to reach the destination.
Sort output by source station name in ascending order (alphabetical).
8. Find all reachable destinations(irrespective of train arrival and departure times) from the source
station “SHIVAJINAGAR” given that the connecting trains should operate on the same day of
the week.
Sort output by destination station name in ascending order (alphabetical).
9. Find minimum distance to all reachable destinations(irrespective of train arrival and departure
times) from the source station “LONAVLA” given that the connecting trains should operate on
the same day of the week.
Output: 3 columns distance, destination and day.
Sort output by distance in ascending order. If two destinations are at same distance
then sort by destination station name in ascending order.
10. Find the longest(in terms of distance) circular chain for all stations, i.e. source and destination
is the same city.
Output: source_station_name, distance.
Sort output by source station name in ascending order (alphabetical).
11. Find the stations from which all other stations are at max one hop distance (two trains).
Output:source_station_name
Sort output by source station name in ascending order (alphabetical).
3
3 Dataset 2
1. In this second part of the assignment, you’ll work with a real Football Matches dataset.
The schema of the same is described below, but we won’t share the actual dataset that will be
used for evaluation.
2. The database will include following five tables and you should use only these tables while writing
solution of the queries. You can create temporary views while handling any SQL query but you
should include SQL queries for creating and deleting these temporary views at the starting and
end of your SQL file respectively. Note – you don’t have to define these tables in the submission
file, these will already be present while evaluation.
(a) games
gameid : integer leagueid : integer hometeamid : integer awayteamid : integer
year : integer homegoals : integer awaygoals : integer
(b) appearances
gameid : integer playerid : integer leagueid : integer goals : integer
owngoals : integer assists : integer keypasses : integer shots : integer
(c) leagues
leagueid : integer name : text
(d) players
playerid : integer name : text
(e) teams
teamid : integer name : text
3. Keys:
(a) gameid is primary key for games table
(b) leagueid is primary key for leagues table
(c) playerid is primary key for players table
(d) teamid is primary key for teams table
(e) hometeamid is foreign key for games table in relation to teams table
(f) awayteamid is foreign key for games table in relation to teams table
4. Match m is a football match between teams A and B if there exists a match g in games table
such that g.hometeamid = A, g.awayteamid = B or g.hometeamid = B, g.awayteamid
= A
5. Teams hometeamid A and hometeamid B are said to be having common teams played
against, if [hometeamid A, awayteamid C] & [hometeamid B, awayteamid C] ϵ games
table, where A, B and C all belongs to the teams table. Same goes for away teams also.
6. For the queries where length of a path between two teams is asked, hometeamid A is used as
starting team and awayteamid B should be used as ending team. Such that (hometeamid A,
awayteamid t1), (hometeamid t1, awayteamid t2), … (hometeamid tn, awayteamid
B).
7. In games table, homegoals is of hometeamid team and awaygoals is of awayteamid team.
8. In games table, year column is in YYYY format.
4
3.1 Queries
Football Matches dataset Football Matches database is an example of spatial network. Consider the collaboration network G formed by the teams and games tables. The nodes for this graph
will be the games, and there will exist an edge from team A to B iff there is a match between A and
B. There is an edge from team A to B if there exists a direct match between A and B.
Mathematically, G = (V, E) where:
• V = { team.teamid }
• E = { (u, v) : ∃ g in games s.t. g.hometeamid = u and g.awayteamid = v }
12. Find all the hometeams who played in common against hometeam Arsenal in ascending order
alphabetically by teamname.
Output: teamnames
13. Find the first hometeam who played in common against hometeam Arsenal, with max
total goals.
Output: teamnames, goals, year
14. Find all the teams with difference between homegoal and awaygoal greater than 3 (homegoal
– awaygoal) in the year 2015 in matches with teams common against hometeam Leicester
in increasing order of goal difference.
Output: teamnames, goals
15. Find the name of players who scored the highest number of goals in the awaymatches with
teams in common against hometeam Valencia (if more than 1 order them by decreasing order
of goals).
Output: playernames, score
16. Find the name of players who assisted the most number of times in the home matches with
teams in common against team Everton in alphabetical order of their names.
Output: playernames, assistscount
17. Find the name of players who shot the most number of shots in the away matches with teams
in common against team AC Milan in 2016 in alphabetical order of their names.
Output: playernames, shotscount
18. Find top 5 teams who has scored 0 awaygoals in 2020 in the away matches with teams in
common against team AC Milan in alphabetical order of teamname.
Output: teamname, year
19. Find name of players who scored top goals in the away matches with teams in common against
the top scorer team of each league in 2019 in decending order of teamscore.
Output: leaguename, playernames, playertopscore, teamname, teamtopscore
20. The concepts of graphical analysis can also be applied to the dataset of such a kind where,
Given two teams Manchester United as home team and Manchester City as away team,
what is the maximum length of path calculated between two teams as nodes of a graph?
Output: count
5
21. Given two teams Manchester United as home team and Manchester City as away team,
what is the total number of paths calculated between two teams as nodes of a graph?
Output: count
22. In each League, what is the longest path between two teams as nodes of a graph in
alphabetical order of League names?
Output: leaguename, teamAname, teamBname, count
6