B561 Advanced Database Concepts Assignment 2

$30.00

Category: You will Instantly receive a download link for .zip solution file upon Payment || To Order Original Work Click Custom Order?

Description

5/5 - (9 votes)

This assignment relies on the lectures
• SQL Part 1 and SQL Part 2 (Pure SQL);
• Views;
• Relational Algebra (RA); and
• Joins and semijoins.
Furthermore, the lecture on the correspondence between Tuple Relational Calculus and SQL should help you to solve problems in this assignment.
To turn in your assignment, you will need to upload to Canvas a single file
with name assignment2.sql which contains the necessary SQL statements
that solve the problems in this assignment. The assignment2.sql file must
be so that the AI’s can run it in their PostgreSQL environment. You should
use the Assignment2Script.sql file to construct the assignment2.sql file.
(Note that the data to be used for this assignment is included in this file.)
In addition, you will need to upload a separate assignment2.txt file that
contains the results of running your queries. Finally, you need to upload
a file assignment2.pdf that contains the solutions to the problems that
require it.
The problems that need to be included in the assignment2.sql are
marked with a blue bullet •. The problems that need to be included in the
assignment2.pdf are marked with a red bullet •. (You should aim to use
Latex to construct your .pdf file.)
1
Database schema and instances
For the problems in this assignment we will use the following database
schema:1
Person(pid, pname, city)
Company(cname, headquarter)
Skill(skill)
worksFor(pid, cname, salary)
companyLocation(cname, city)
personSkill(pid, skill)
hasManager(eid, mid)
Knows(pid1, pid2)
In this database we maintain a set of persons (Person), a set of
companies (Company), and a set of (job) skills (Skill). The pname
attribute in Person is the name of the person. The city attribute
in Person specifies the city in which the person lives. The cname attribute in Company is the name of the company. The headquarter
attribute in Company is the name of the city wherein the company has
its headquarter. The skill attribute in Skill is the name of a (job)
skill.
A person can work for at most one company. This information is
maintained in the worksFor relation. (We permit that a person does
not work for any company.) The salary attribute in worksFor specifies
the salary made by the person.
The city attribute in companyLocation indicates a city in which
the company is located. (Companies may be located in multiple cities.)
A person can have multiple job skills. This information is maintained in the personSkill relation. A job skill can be the job skill of
multiple persons. (A person may not have any job skills, and a job skill
may have no persons with that skill.)
A pair (e, m) in hasManager indicates that person e has person m as
one of his or her managers. We permit that an employee has multiple
managers and that a manager may manage multiple employees. (It
is possible that an employee has no manager and that an employee is
1The primary key, which may consist of one or more attributes, of each of these relations
is underlined.
2
not a manager.) We further require that an employee and his or her
managers must work for the same company.
The relation Knows maintains a set of pairs (p1, p2) where p1 and p2
are pids of persons. The pair (p1, p2) indicates that the person with pid
p1 knows the person with pid p2. We do not assume that the relation
Knows is symmetric: it is possible that (p1, p2) is in the relation but
that (p2, p1) is not.
The domain for the attributes pid, pid1, pid2, salary, eid, and
mid is integer. The domain for all other attributes is text.
We assume the following foreign key constraints:
• pid is a foreign key in worksFor referencing the primary key pid
in Person;
• cname is a foreign key in worksFor referencing the primary key
cname in Company;
• cname is a foreign key in companyLocation referencing the primary key cname in Company;
• pid is a foreign key in personSkill referencing the primary key
pid in Person;
• skill is a foreign key in personSkill referencing the primary
key skill in Skill;
• eid is a foreign key in hasManager referencing the primary key
pid in Person;
• mid is a foreign key in hasManager referencing the primary key
pid in Person;
• pid1 is a foreign key in Knows referencing the primary key pid in
Person; and
• pid2 is a foreign key in Knows referencing the primary key pid in
Person
The file Assignment2Script.sql contains the data supplied for
this assignment.
3
Pure SQL and RA SQL
In this assignemt, we distinguish between Pure SQL and RA SQL.
Below we list the only features that are allowed in Pure SQL and in
RA SQL.
In particular notice that
• JOIN, NATURAL JOIN, and CROSS JOIN are not allowed in Pure
SQL.
• The predicates [NOT] IN, SOME, ALL, [NOT] EXISTS are not allowed
in RA SQL.
The only features allowed in Pure SQL
SELECT … FROM … WHERE
WITH …
UNION, INTERSECT, EXCEPT operations
EXISTS and NOT EXISTS predicates
IN and NOT IN predicates
ALL and SOME predicates
VIEWs that can only use the above RA SQL features
The only features allowed in Pure SQL features
SELECT … FROM … WHERE
WITH …
UNION, INTERSECT, EXCEPT operations
JOIN … ON …, NATURAL JOIN, and CROSS JOIN operations
VIEWs that can only use the above RA SQL features
commas in the FROM clause are not allowed
4
1 Formulating queries in Pure SQL with and without set predicates
An important consideration in formulating queries is to contemplate
how they can be written in different, but equivalent, ways. In fact, this
is an aspect of programming in general and, as can expected, is also
true for SQL. A learning outcome of this course is to acquire skills for
writing queries in different ways. One of the main motivations for this is
to learn that different formulations of the same query can dramatically
impact performance, sometimes by orders of magnitude.
For the problems in this section, you will need to formulate queries
in Pure SQL with and without set predicates. You can use the SQL operators INTERSECT, UNION, and EXCEPT, unless otherwise specified. You
are however allowed and encouraged to use views including temporary
and user-defined views.
To illustrate what you need to do, consider the following example.
Example 1 Consider the query “ Find the pid and name of each person who (a) works for a company located in Bloomington and (b) knows
a person who lives in Chicago.”
(a) Formulate this query in Pure SQL by only using the EXISTS or
NOT EXISTS set predicates. You can not use the set operations
INTERSECT, UNION, and EXCEPT.
A possible solution is
select p.pid, p.pname
from Person p
where exists (select 1
from worksFor w
where p.pid = w.pid and
exists (select 1
from companyLocation c
where w.cname = c.cname and c.city = ’Bloomington’)) and
exists (select
from Knows k
where p.pid = k.pid1 and
exists (select 1
from Person p2
where k.pid2 = p2.pid and
p2.city = ’Chicago’));
5
(b) Formulate this query in Pure SQL by only using the IN, NOT IN,
SOME, or ALL set membership predicates. You can not use the set
operations INTERSECT, UNION, and EXCEPT.
A possible solution is
select p.pid, p.pname
from Person p
where p.pid in (select w.pid
from worksFor w
where w.cname in (select c.cname
from companyLocation c
where c.city = ’Bloomington’)) and
p.pid in (select k.pid1
from Knows k
where k.pid2 in (select p2.pid
from Person p2
where p2.city = ’Chicago’));
Another possible solution using the SOME and IN predicates
select p.pid, p.pname
from Person p
where p.pid = some (select w.pid
from worksFor w
where w.cname = some (select c.cname
from companyLocation c
where c.city = ’Bloomington’)) and
p.pid in (select k.pid1
from Knows k
where k.pid2 in (select p2.pid
from Person p2
where p2.city = ’Chicago’));
(c) Formulate this query in Pure SQL without using set predicates. A
possible solution is
select p.pid, p.pname
from Person p, worksFor w, companyLocation c
where p.pid = w.pid and
w.cname = c.cname and
c.city = ’Bloomington’
intersect
select p1.pid, p1.pname
from Person p1, Knows k, Person p2
where k.pid1 = p1.pid and
k.pid2 = p2.pid and
p2.city = ’Chicago’;
6
We now turn to the problems for this section.
1. Consider the query “Find the pid and name of each person who
works for Google and who has a strictly higher salary than some
other person who he or she knows and who also works for Google.”
(a) • Formulate this query in Pure SQL by only using the EXISTS
or NOT EXISTS set predicates.
(b) • Formulate this query in Pure SQL by only using the IN,
NOT IN, SOME, or ALL set membership predicates.
(c) • Formulate this query in Pure SQL without using set predicates.
2. Consider the query “Find the cname of each company with headquarter in Cupertino, but that is not located in Indianapolis, along
with the pid, name, and salary of each person who works for that
company and who has the next-to-lowest salary at that company.”
2
(a) • Formulate this query in Pure SQL by only using the EXISTS
or NOT EXISTS set predicates. You can not use the set operations INTERSECT, UNION, and EXCEPT.
(b) • Formulate this query in Pure SQL by only using the IN,
NOT IN, SOME, or ALL set membership predicates. You can
not use the set operations INTERSECT, UNION, and EXCEPT.
(c) • Formulate this query in Pure SQL without using set predicates.
3. Consider the query “Find each (c, p) pair where c is the cname
of a company and p is the pid of a person who works for that
company and who is known by all other persons who work for
that company.
(a) • Formulate this query in Pure SQL by only using the EXISTS
or NOT EXISTS set predicates. You can not use the set operations INTERSECT, UNION, and EXCEPT.
2By definition, a salary s is next-to-lowest if there exists salary s1 with s1 < s, but there do not exist two salaries s1 and s2 with s2 < s1 < s. 7 (b) • Formulate this query in Pure SQL by only using the IN, NOT IN, SOME, or ALL set membership predicates. You can not use the set operations INTERSECT, UNION, and EXCEPT. (c) • Formulate this query in Pure SQL without using set predicates. 4. Consider the query “Find each skill that is not a jobskill of any person who works for Yahoo or for Netflix.” (a) • Formulate this query in Pure SQL using subqueries and set predicates. You can not use the set operations INTERSECT, UNION, and EXCEPT. (b) • Formulate this query in Pure SQL without using predicates. 5. Consider the query “Find the pid and name of each person who manages all-but-one person who works for Google. (a) • Formulate this query in Pure SQL using subqueries and set predicates. You can not use the set operations INTERSECT, UNION, and EXCEPT. (b) • Formulate this query in Pure SQL without using set predicates. 8 2 Formulating queries in Relational Algebra and RA SQL Reconsider the queries in Section 1. The goal of the problems in this section is to formulate these queries in Relational Algebra in standard notation and in RA SQL. There are some further restrictions: • You can only use WHERE clauses that use conditions involving constants. For example conditions of the form “t.A θ ’a’” are allowed, but conditions of the form ’t.A θ s.B’ are not allowed. The latter conditions can be absorbed in JOIN operations in the FROM clause. • You can not use commas in any FROM clause. Rather, you should use JOIN operations. You can use the following letters, or indexed letters, to denote relation names in RA expressions: P, P1, P2, · · · Person C, C1, C2, · · · Company S, S1, S2, · · · Skill W, W1, W2, · · · worksFor cL, cL1, cL2, · · · companyLocation pS, pS1, pS2, · · · personSkill M, M1, M2, · · · hasManager K, K1, K2, · · · Knows To illustrate what you need to do reconsider the query in Example 1 in Section 1. Example 2 Consider the query “ Find the pid and name of each person who (a) works for a company located in Bloomington and (b) knows a person who lives in Chicago.” (a) Formulate this query in Relational Algebra in standard notation. A possible solution is πpid,pname(P erson ./ worksF or ./ πcname(σcity=Bloomington(companyLocation)))∩ πP erson1.pid,P erson1.pname(P erson1 ./P erson1.pid=pid1 Knows ./pid2=P erson2.pid πP erson2.pid(σcity=Chicago(P erson2))) 9 If we use the letters in the above box, this expression becomes more succinct: πpid,pname(P ./ W ./ πcname(σcity=Bloomington(cL)))∩ πP1.pid,P1.pname(P1 ./P1.pid=pid1 K ./pid2=P2.pid πP2.pid(σcity=Chicago(P2))) You are also allowed to introduce letters that denote expressions. For example, let E and F denote the expression πpid,pname(P ./ W ./ πcname(σcity=Bloomington(cL))) and πP1.pid,P1.pname(P1 ./P1.pid=pid1 K ./pid2=P2.pid πP2.pid(σcity=Chicago(P2))), respectively. Then we can write the solution as follows: E ∩ F. (b) Formulate this query in RA SQL. A possible solution is select pid, pname from Person NATURAL JOIN worksFor NATURAL JOIN (select cname from companyLocation where city = ’Bloomington’) C INTERSECT select P1.pid, P1.pname from Person P1 JOIN Knows ON (P1.pid = pid1) JOIN (select pid from Person where city = ’Chicago’) P2 ON (pid2 = P2.pid) order by 1,2; Observe that the WHERE clauses only use conditions involving constants. 10 We now turn to the problems in this section. 6. Reconsider Problem 1. Find the pid and name of each person who works for Google and who has a strictly higher salary than some other person who he or she knows and who also works for Google. (a) • Formulate this query in Relational Algebra in standard notation. (b) • Formulate this query in RA SQL. 7. Reconsider Problem 2. Find the cname of each company with headquarter in Bloomington, but not located in Indianapolis, along with the pid, name, and salary of each person who works for that company and who has the next-to-lowest salary at that company. (a) • Formulate this query in Relational Algebra in standard notation. (b) • Formulate this query in RA SQL. 8. Reconsider Problem 3. Find each (c, p) pair where c is the cname of a company and p is the pid of a person who works for that company and who is known by all other persons who work for that company. (a) • Formulate this query in Relational Algebra in standard notation. (b) • Formulate this query in RA SQL. 9. Reconsider Problem 4. Find each skill that is not a jobskill of any person who works for Yahoo or for Netflix. (a) • Formulate this query in Relational Algebra in standard notation. (b) • Formulate this query in RA SQL. 10. Reconsider Problem 5. Find the pid and name of each person who manages all-but-one person who works for Google. 11 (a) • Formulate this query in Relational Algebra in standard notation. (b) • Formulate this query in RA SQL. 12 3 Formulating constraints using Relational Algebra Recall that it is possible to express constraints in TRC and as boolean SQL queries. It is also possible to write constraints using RA expressions. Let E, F, and G denote RA expressions. Then we can write RA expression comparisons that express constraints: E 6= ∅ which is true if E evaluates to an non-empty relation E = ∅ which is true if E evaluates to the empty relation F ⊆ G which is true if F evaluates to a relation that is a subset of the relation obtained from G F 6⊆ G which is true if F evaluates to a relation that is not a subset of the relation obtained from G Here are some examples of writing constraints in this manner. Example 3 “ Some person works for Google.” This constraint can be written as follows: πpid(σcname=Google(worksF or)) 6= ∅. Indeed, the RA expression πpid(σcname=Google(worksF or)) computes the set of all person pids who work for Google. If this set is not empty then there are indeed persons who work for Google. Incidentally, the constraint “ No one works for Google” can be written as follows: πpid(σcname=Google(worksF or)) = ∅. Example 4 Each person has at least two skills. This constraint can be written as follows: πpid(P) ⊆ πpS1.pid(σpS1.skill6=pS2.skill(pS1 ./pS1.pid=pS2.pid pS2)). Indeed, πpid(P) computes the set of all person pids and πpS1.pid(σpS1.skill6=pS2.skill(pS1 ./pS1.pid=pS2.pid pS2)) 13 computes the set of all pids of persons who have at least two skills. When the first set is contained in the second we must have that each person has at least two skills. Incidentally, the constraint “ Some person has fewer than two skills” can be written as follows: πpid(P) 6⊆ πpS1.pid(σpS1.skill6=pS2.skill(pS1 ./pS1.pid=pS2.pid pS2)). We now turn to the problems in this section. Formulate each of the following constraints using RA expressions as illustrated in Example 3 and Example 4. 11. • Some person has a salary that is strictly lower than that of each of the persons who he or she manages. 12. • No person knows all persons who work for Google. 13. • Each person knows all of his or her managers. 14. • Each employee and his or her managers work for the same company. 15. • The attribute pid is a primary key of the Person relation. 14 4 Formulating queries in SQL using views Formulate the following views and queries in SQL. You are allowed to combine the features of both Pure SQL and RA SQL. 16. • Create a view Triangle that contains each triple of pids of different persons (p1, p2, p3) that mutually know each other. Then test your view. 17. • Define a parameterized view SalaryBelow(cname text, salary integer) that returns, for a given company identified by cname and a given salary value, the subrelation of Person of persons who work for company cname and whose salary is strictly below salary. Test your view for the parameter values ( ’IBM’,60000), (’IBM’,50000), and (’Apple’,65000). 18. • Define a parameterized view KnowsPersonAtCompany(p integer, c text) that returns for a person with pid p the subrelation of Person of persons who p knows and who work for the company with cname c. Test your view for the parameters (1001, ‘Amazon’), (1001,‘Apple’), and (1015,‘Netflix’). 19. • Define a parameterized view KnownByPersonAtCompany(p integer, c text) that returns the subrelation of Person of persons who know the person with pid p and who work for the company with cname c. Test your view for the parameters (1001, ‘Amazon’), (1001,‘Apple’), and (1015,‘Netflix’). 20. • Let T ree(parent : integer, child : integer) be a rooted parentchild tree. So a pair (n, m) in T ree indicates that node n is a parent of node m. The sameGeneration(n1, n2) binary relation is inductively defined using the following two rules: • If n is a node in T, then the pair (n, n) is in the sameGeneration relation. (Base rule) 15 • If n1 is the parent of m1 in T ree and n2 is the parent of m2 in T ree and (n1, n2) is a pair in the sameGeneration relation then (m1, m2) is a pair in the sameGeneration relation. (Inductive Rule) Write a recursive view for the sameGeneration relation. Test your view. 16