CS 564 Project 2: SQL

$30.00

Category: You will Instantly receive a download link for .zip solution file upon Payment

Description

5/5 - (3 votes)

DBMS
Use of postgres is highly recommended. Other DBMS’s may be used for development but the
answers must be able to run against postgres server in the lab.
Part A: Querying the Sales Database [50%]
The Sales database tables are available under “hw2” schema on the lab postgres system
(under cs564instr database). You are encouraged to create views in your own schema to make
it easier to access the hw2 tables.
[The tables can also be created using the script available in ~shatdal/data/Sales.sql. If you are
using postgres on your personal machine, to load the database, simply type “psql -f Sales.sql”
on the command prompt or “\i Sales.sql” inside psql. It would create a schema hw2 in your
postgres DBMS.]
The hw2 schema has the following 4 tables. The key of each table is underlined and the foreign
keys are also mentioned:
• Holidays(WeekDate, IsHoliday)
• Stores(Store, Type, Size)
• TemporalData (Store, WeekDate, Temperature, FuelPrice, CPI, UnemploymentRate)
• Store is a foreign key referencing Stores (Store).
• WeekDate is a foreign key referencing Holidays (WeekDate).
• Sales(Store, Dept, WeekDate, WeeklySales)
• Store is a foreign key referencing Stores (Store).
• WeekDate is a foreign key referencing Holidays (WeekDate).
• (Store, WeekDate) is a foreign key referencing TemporalData (Store, WeekDate)
Write SQL queries over the given schema that obtain the answers to the following questions:
1. Which stores had the largest and smallest overall sales during holiday weeks?
2. Get the stores at locations where the unemployment rate exceeded 10% at least once but
the fuel price never exceeded 4.
3. How many non-holiday weeks had larger sales than the overall average sales during holiday
weeks?
4. Get the total sales per month, and its contribution to total sales (across months) overall for
each type of store.
5. Which stores have had sales in every department in that store for every month of at least
one calendar year among 2010, 2011, and 2012?
6. For each of the 4 numeric attributes in TemporalData, are they positively or negatively
correlated with sales and how strong is the correlation? Your SQL query should output an
instance with the following schema with 4 rows:
Output6 (AttributeName VARCHAR(20), CorrSign char(1), CorrValue (float))
e.g., (Temperature, -, -0.5) In your query, the values of AttributeName can be hardcoded
string literals, but the other values must be computed automatically using SQL queries
over the given database instance.
7. Which departments contribute to at least 5% of store sales across for at least 3 stores? List
the departments and their average contribution to sales across the stores.
8. Get the top 10 departments overall ranked by total sales normalized by the size of the store
where the sales were recorded.
9. For the top 10 departments (in above query, #8), find the 3-monthly moving average sales,
% contribution of monthly sales to total sales and monthly cumulative total of sales. Format
the output to 2 decimal places.
10. The accounting department has asked for the following report. Write a SQL Query that
would most closely produce the needed report. Quarters are defined traditionally, with
Jan,Feb,March being Q1, etc. Note that ROLLUP is NOT available on the lab version of
postgres.
Note: use of LIMIT N; feature in postgres is not allowed to constrain number of rows returned.
PART B: Sampling Application [50%]
Since postgres (and most DBMS’s) don’t allow sampling, the goal is to create a JDBC
application that would let user create samples of data in the DBMS. The Sales data from Part A
could be used for developing/testing the application. You would use sampling without
replacement.
Year Quarters Store Type A
Sales
Store Type B
Sales
Store Type C
Sales
2010 Q1 … … …
2010 Q2 … … …
2010 Q3 … … …
2010 Q4 … … …
2010 — … … …

2012 Q4 … … …
2012 — … … …
The application should:
1. The JDBC connection should be to the lab postgres server. Instructions are posted on the
webpage as to how to connect to lab postgres server. You may optionally also connect to
your own hosted postgres for development/testing. Note that lab postgres can only be
connected to from lab linux machines because of security protocols.
2. Accept a table name or a query (prompts can distinguish the two if needed)
3. Ask for how many sample rows are desired
4. Ask if the user wants a table created for the sampled rows (instead of being returned).
[These tables could be sampled in next iteration.]
5. Fetch/insert exactly that number of random samples from the table or query result
6. Allow the user to reset the seed of the random number generator
7. Errors may be given for any syntax error in queries or invalid table entries
8. If number of samples requested is greater than rows in the table/query, all rows should be
returned (or inserted in the sample table) and a message should be given noting that fact.
9. Ask if user wants more samples (or quit)
The application MAY NOT fetch the entire table or query result and do the sampling work in the
application itself. (That is, it is assumed that the original table is too large to fit in memory.)
You may create additional tables (and insert rows) in the database (under your schema).
You may execute any query you like in the database.
Hint: One way to number all rows in a table is to use the “row_number()” ordered-analytic
function.
The algorithm for random sampling of N rows (from Knuth: Art of computer programming,
volume 2: semi-numerical algorithms) is on the last page.
Deliverables
You are required to submit a zipped folder with the following contents:
1. A “.sql” file per question. Name your files as “query.sql”, e.g., the file
“query2.sql” is for question 2 (of part A). These would contain the SQL and the result
rows.
2. The JDBC app (both java file and executable). Show the result of sampling 10 rows
from each of the tables in part A. Additional testing/validation would be done for
grade.
3. Readme.txt with your group members’ name, CS logins and Wisc ids. and describe
any assumptions etc made in the application that are not in the documentation of the
code.