B561 Advanced Database Concepts Assignment 4

$30.00

Category: You will Instantly receive a download link for .zip solution file upon Payment

Description

5/5 - (6 votes)

This assignment relies on the lectures
• Functions and expressions in SQL;
• Aggregate functions and partitioning; and
• Triggers.
To turn in your assignment, you will need to upload to Canvas a single file
with name assignment4.sql which contains the necessary SQL statements
that solve the problems in this assignment. The assignment4.sql file must
be so that the AI’s can run it in their PostgreSQL environment. You should
use the Assignment-Script-2021-Fall-assignment4.sql file to construct
the assignment4.sql file. (Note that the data to be used for this assignment
is included in this file.) In addition, you will need to upload a separate
assignment4.txt file that contains the results of running your queries.
You will also see several problems that are listed as practice problems.
You should not include your solutions for such problems in the materials
you submit for this assignments. Only solutions for problems 1 through 10
need to be submitted.
1
Database schema and instances
For the problems in this assignment we will use the following database
schema:1
Person(pid, pname, city)
Company(cname, headquarter)
Skill(skill)
worksFor(pid, cname, salary)
companyLocation(cname, city)
personSkill(pid, skill)
hasManager(eid, mid)
Knows(pid1, pid2)
In this database we maintain a set of persons (Person), a set of
companies (Company), and a set of (job) skills (Skill). The pname
attribute in Person is the name of the person. The city attribute
in Person specifies the city in which the person lives. The cname attribute in Company is the name of the company. The headquarter
attribute in Company is the name of the city wherein the company has
its headquarter. The skill attribute in Skill is the name of a (job)
skill.
A person can work for at most one company. This information is
maintained in the worksFor relation. (We permit that a person does
not work for any company.) The salary attribute in worksFor specifies
the salary made by the person.
The city attribute in companyLocation indicates a city in which
the company is located. (Companies may be located in multiple cities.)
A person can have multiple job skills. This information is maintained in the personSkill relation. A job skill can be the job skill of
multiple persons. (A person may not have any job skills, and a job skill
may have no persons with that skill.)
A pair (e, m) in hasManager indicates that person e has person m as
one of his or her managers. We permit that an employee has multiple
managers and that a manager may manage multiple employees. (It
is possible that an employee has no manager and that an employee is
1The primary key, which may consist of one or more attributes, of each of these relations
is underlined.
2
not a manager.) We further require that an employee and his or her
managers must work for the same company.
The relation Knows maintains a set of pairs (p1, p2) where p1 and p2
are pids of persons. The pair (p1, p2) indicates that the person with pid
p1 knows the person with pid p2. We do not assume that the relation
Knows is symmetric: it is possible that (p1, p2) is in the relation but
that (p2, p1) is not.
The domain for the attributes pid, pid1, pid2, salary, eid, and
mid is integer. The domain for all other attributes is text.
We assume the following foreign key constraints:
• pid is a foreign key in worksFor referencing the primary key pid
in Person;
• cname is a foreign key in worksFor referencing the primary key
cname in Company;
• cname is a foreign key in companyLocation referencing the primary key cname in Company;
• pid is a foreign key in personSkill referencing the primary key
pid in Person;
• skill is a foreign key in personSkill referencing the primary
key skill in Skill;
• eid is a foreign key in hasManager referencing the primary key
pid in Person;
• mid is a foreign key in hasManager referencing the primary key
pid in Person;
• pid1 is a foreign key in Knows referencing the primary key pid in
Person; and
• pid2 is a foreign key in Knows referencing the primary key pid in
Person
3
Pure SQL and RA SQL
In this assignemt, we distinguish between Pure SQL and RA SQL.
Below we list the only features that are allowed in Pure SQL and in
RA SQL.
In particular notice that
• join, NATURAL join, and CROSS join are not allowed in Pure
SQL.
• The predicates [not] IN, SOME, ALL, [not] exists are not allowed
in RA SQL.
The only features allowed in Pure SQL
select … from … where
WITH …
union, intersect, except operations
exists and not exists predicates
IN and not IN predicates
ALL and SOME predicates
VIEWs that can only use the above RA SQL features
The only features allowed in RA SQL
select … from … where
WITH …
union, intersect, except operations
join … ON …, natural join, and CROSS join operations
VIEWs that can only use the above RA SQL features
commas in the from clause are not allowed
Full SQL
all the features of Pure SQL and RA SQL
user-defined functions
aggregate functions
group … by …
having …
4
1 Solving queries using aggregate functions
Formulate the following queries in SQL. You should use aggregate functions to solve these queries. You can use views, temporary views, parameterized views, and user-defined functions.
1. Find each pair (c, n) where c is the cname of a company that pays
an average salary between 50000 and 55000 and where n is the
number of employees who work for company c.
2. Find the pid and name of each person who lacks at least 4 job
skils and who knows at least 4 persons.
3. Find the pid and name of each person who has fewer than 2 of
the combined set of job skills of persons who work for Google. By
combined set of jobskills we mean the set
{s | s is a jobskill of an employee of Google}.
4. Find the cname of each company that employs at least 3 persons
and that pays the lowest average salary among such companies.
5. Find each pair (c1, c2) of different company cnames such that,
among all companies, company c2 pays the closest average salary
compared to the average salary paid by company c1.
6. Without using set predicates, find each pid of a person who does
not know each person who (1) works for Apple and (2) who makes
less than 55000.
7. Without using set predicates, find each pairs (s1, s2) of skills such
that the set of persons with skill s1 is the same as the set of
persons with skill s2.
8. Find each pairs (s1, s2, n) of different skills s1 an s2 and such that
(1) the number of persons with skill s1 is the same as the number
of persons with skill s2 and (2) where n is the number of such
persons associated with c1 and c2.
5
9. (a) Using the GROUP BY counting method, define a function
create or replace function numberOfSkills(c text)
returns table (pid integer, salary int, numberOfSkills bigint) as
$$

$$ language sql;
that returns for a company identified by its cname, each triple
(p, s, n) where (1) p is the pid of a person who is employed
by that company, (2) s is the salary of p, and (3) n is the
number of job skills of p. (Note that a person may not have
any job skills.)
(b) Test your function for Problem 9a for the companies Apple,
Amazon, and ACM.
(c) Write the same function numberOfSkills as in Problem 9c
but this time without using the GROUP BY clause.
(d) Test your function for Problem 9c for the companies Apple,
Amazon, and ACM.
(e) Using the function numberOfSkills but without using set
predicates, write the following query: “Find each pair (c, p)
where c is the name of a company and where p is the pid of
a person who (1) works for company c, (2) makes more than
50000 and (3) has the most job skills among all the employees
who work for company c.”
6
2 Introduction to Dynamic SQL
The examples in this section introduce you to Dynamic SQL. Dynamic SQL is a powerful generalization of SQL: it permits writing
programs that generate queries, and furthermore, these queries can
subsequently be evaluated in the programs that generated them.
Intuitively, in a dynamic program, a string is constructed that represents a SQL query. When an execute statement in the dynamic
program is then applied to that string, the corresponding query is evaluated.
We will consider a simple example illustrating Dynamic SQL. (For
more information about Dynamic SQL, we refer to the PostgreSQL
manual under the topic of Dynamic SQL.)
Example 1 Assume that there are 3 unary relations
P(p : boolean)
Q(q : boolean)
R(r : boolean)
You should think of P, Q, and R as boolean variables that may be true
or false. This situation is set up in SQL as follows:
create table P (p boolean);
create table Q (q boolean);
create table R (r boolean);
insert into P values (true), (false);
insert into Q values (true), (false);
insert into R values (true), (false);
So we have the following situation:
P
t
f
Q
t
f
R
t
f
Next, we consider the set of boolean propositions (propositions, for
short) over these variables P, Q, and R. An inductive definition of these
propositions is as follows:2
2We have used parantheses around the boolean connectives ‘or’, ‘and’, and ’not’ to
remove issues related to the order of precedence for these connectives. We could have
ommitted such parantheses but then we need to impose a precedence order. Typically,
this is such that ‘not’ has higher precendence than ‘or’ and ‘and’, and ‘and’ has a higher
precedence than ‘or’.
7
• P, Q, and R are propositions.
• If E and F are propositions then (E or F) is a proposition.
• If E and F are propositions then (E and F) is a proposition.
• If E is a propositon then (not E) is a proposition.
• If E is a propositon then (E) is a proposition.
Here are some examples of propositions over P, Q, and R.
P
Q
R
(not P)
(not (P))
(P or R)
(P and P)
(P and (not (R and Q)))
We will now consider the truth table of a proposition. (For the concept
of truth table of a propositon, we refer to the document Propositional Logic
in the course module Notes on Discrete Mathematics.) For example,
the truth table for the proposition (P or R) is a follows:3
p q r truthValue
t t t t
t t f t
t f t t
t f f t
f t t t
f t f f
f f t t
f f f f
and that for (P and (not (R and Q))) is
3Notice that we have 3 propositional variables in play, i.e., P, Q, and R, and therefore,
each truth table will have 23 = 8 truth assignments. This is even the case when the
proposition does not mention some of these variables.
8
p q r truthValue
t t t f
t t f t
t f t t
t f f t
f t t f
f t f f
f f t f
f f f f
It is possible to generate the truth table of a proposition in SQL. For example, for the proposition ’(P and (not (R and Q)))’ we can write
the SQL query (let’s call it Q1)
select p, q, r, (P and (not (R and Q)))
from P, Q, R;
The problem with this approach is that to determine the truth table
for another proposition such as ’(not (Q and (not P)))’, we would
need to write a different query (let’s call it Q2)
select p, q, r, (P and (not (R and Q)))
from P, Q, R;
even though the blue parts in Q1 and Q2 are same.
A more general approach to generate the truth table of an arbitrary
proposition is to use Dynamic SQL. Here is the user-defined function
Dynamic SQL that can facilitate this:
create or replace function truthTable(proposition text)
returns table(p boolean, q boolean, r boolean, truthValue boolean) as
$$
begin
return query
execute ’select p, q, r, ’ || proposition ||
’ from P, Q, R;’;
end;
$$ language plpgsql;
To generate the truth table for the proposition
“P and (not (R and Q))”
we would call the truthTable function with the string representation
9
’(P and (not (R and Q)))’
of this proposition as follows:
select * from truthTable(’(P and (not (R and Q)))’);
The critical component of the DynamicSQL code for the function truthTable
is the statement
return query
execute ’select p, q, r, ’ || proposition ||
’ from P, Q, R;’;
What happens in this statement is that we build a string that represents an SQL query by concatenating the string
’select p, q, r, ’
with the string that is passed into the truthTable function’s proposition
parameter (in this case the string
’(P and (not (R and Q)))’)
and, finally, with the string
’ from P, Q, R;’.
This concatenation of strings is accomplished using the string concatenation
function ||. As such we get the string
’select p, q, r, (P and (not (R and Q))) from P, Q, R;’
This is a string that represents a valid SQL query which we can then
evaluate with the execute operation. The rest of the code, i.e., the
return query statement, then returns the result of this evaluation,
i.e., the truth table of the proposition (P and (not (R and Q))).
Now that we have the truthTable function, we will consider in
the next example the problem of verifying if two Propositional Logic
propositions are logically equivalent.
Example 2 Consider two Propositional Logic propositions E and F.
We say that E and F are logically equivalent if their respective truth
tables are the same.
We will write a boolean user-defined SQL function
10
logicallyEquivalent(E text, F text) returns boolean
which takes as arguments two strings that represent the propositions
E and F respectively and that returns ’true’ if E and F are logically
equivalent, and ’false’ otherwise.
The code for this function relies on the following insight. Let TE
and TF be the truth table of E and F, respectively. Then, by definition
E and F are logically equivalent, i.e., TE = TF . But the later set
condition is equivalent with the condition
(TE − TF ) = ∅ and (TF − TE) = ∅.
Here is the function:
create or replace function
logicallyEquivalent(E text, F text)
returns boolean as
$$
select not exists (select truthTable(E) except (select truthTable(F))) and
not exists (select truthTable(F) except (select truthTable(E)));
$$ language sql;
11
3 Operations on polynomials and matrices using
SQL aggregate functions
In the problems in this section, you will practice working with aggregate
functions.
A useful other aspect of solving these problems is that you will learn
how relations can be used to represent polynomials and matrices and
how SQL can be used to define operations on such objects.
10. Let P(x) and Q(x) be 2 polynomials with integer coefficients.
Let P(coefficient, degree) and Q(coefficient, degree) be
two binary relations representing P(x) and Q(x), respectively.
E.g., if P(x) = 2x
2 − 5x + 5 and Q(x) = 4x
4 + 3x
3 + x
2 − x then
their representations in the relations P and Q are as follows:
P
coefficient degree
2 2
−5 1
5 0
Q
coefficient degree
4 4
3 3
1 2
−1 1
0 0
(a) Write a dynamic SQL function
create or replace function multiplyPolynomials(polynomal1 text, polynomial p2 text)
returns table(coefficient integer, degree integer) as
$$

$$ language plpgsql;
that computes a binary relation representing the multiplication of P(x) and Q(x), i.e., the polynomial P(x) ∗ Q(x).
For example, consider P(x) = 2x
2 − 5x + 5 and Q(x) =
4x
4+3x
3+x
2−x. Then P(x)∗Q(x) = (8)x
6+(6−20)x
5+(2−
15+20)x
4+(−2−5+15)x
3+(5+5)x
2+(−5)x = 8x
6−14x
5+
7x
4+8x
3+10x
2−5x. So, for these polynomials, your function
when called with the name of the relation representing the
polynomials should return the relation
12
coefficient degree
8 6
−14 5
7 4
8 3
10 2
−5 1
0 0
(b) Your solution should work for arbitrary polynomials P(x)
and Q(x). Show how your function can be used to compute
i. the polynomial P ∗ Q;
ii. the polynomial P ∗ P; and
iii. the polynomial P ∗ (Q ∗ P).
11. (Practice problem – not graded)
Let P(x) and Q(x) be 2 polynomials with integer coefficients. We
can consider their composition P(Q(x)).
For example, consider P(x) = 2x
3 −5x+ 5 and Q(x) = 4x
4 + 3x
3
,
then
P(Q(x)) = 2(Q(x))3 − 5Q(x) + 5
= 2(4x
4 + 3x
3
)
3 − 5(4x
4 + 3x
3
) + 5
= 2(4x
4 + 3x
3
)(4x
4 + 3x
3
)(4x
4 + 3x
3
) − 5(4x
4 + 3x
3
) + 5
= 128x
12 + 288x
11 + 216x
10 + 54x
9 − 20x
4 − 15x
3 + 5
and
Q(P(x)) = 4(P(x))4 + 3(Q(x))3
= 4(2x
3 − 5x + 5)4 + 3(2x
3 − 5x + 5)
= 64x
12 − 640x
10 + 664x
8 + 2400x
8 − 4980x
7 − 1420x
6+
12450x
5 − 10400x
4 − 5925x
3 + 16125x
2 − 11125x + 2875
(a) Write a dynamic SQL function
create or replace function compositionPolynomials(polynomial1 text, polynomial2 text)
returns table(coefficient integer, degree integer) as
$$

$$ language plpgsql;
13
that computes a binary relation representing the composition
of P(Q(x)).
Your solution should work for arbitrary polynomials P(x)
and Q(x). Show how your function can be used to compute
i. the polynomial P(Q(x));
ii. the polynomial Q(P(x)); and
iii. the polynomial P(P(P(x))).
12. (Practice problem – not graded)
Let M be an n × n matrix of boolean values. (We will assume
that n ≥ 0.) We will use T to denote ‘true’ and F to denote
‘false’.
For i, j ∈ [1, n], we will denote by M[i, j] the boolean value in
matrix M at row i and column j. (Notice that when n = 0, M
has no elements.)
A boolean matrix M can be represented using a relation M with
schema
(rw: integer, colmn: integer, value: boolean)
and such that for each i, j ∈ [1, n]
(i, j, M[i, j]) ∈ M.
Notice that we use the attribute names ‘rw’ and ‘colmn’ since the
words ‘row’ and ‘column’ are reserved words in PostgreSQL.
For example if M is the 3 × 3 boolean matrix
M =
T F T
F T T
T F T
then M is the following relation of 9 tuples:
14
M
rw colmn value
1 1 T
1 2 F
1 3 T
2 1 F
2 2 T
2 3 T
3 1 T
3 2 F
3 3 T
Let M and N be two n × n boolen matrices represented by the
two relations M and N.
(a) Write a dynamic SQL function
create or replace function booleanMatrixMultiplication (M text, N text)
returns table (rw integer, colmn integer, value boolean) as
$$
begin
return query execute …;
end;
$$ language plpgsql;
that computes a relation over schema (rw, column, value)
that represents the matrix M ∗ N when given the names ‘M’
and ‘N’ of the relations that represent M and N, respectively.
Your solution should work for any n ≥ 1.
For example if M and N are by the following 3 × 3 boolean
matrices stored as the relations
15
M
rw colmn value
1 1 T
1 2 F
1 3 T
2 1 F
2 2 T
2 3 T
3 1 T
3 2 F
3 3 T
N
rw colmn value
1 1 T
1 2 T
1 3 T
2 1 F
2 2 F
2 3 F
3 1 T
3 2 F
3 3 T
then your function should produce the relational representation of M ∗ N, i.e., the relation
M ∗ N
rw colmn value
1 1 T
1 2 T
1 3 T
2 1 T
2 2 F
2 3 T
3 1 T
3 2 T
3 3 T
Consider the Person relation. Then the relation Knows, which
is a binary relation over Person, can be modeled by a boolean
matrix M Knows by using the Person pids as row and column
indices for this matrix, and by using the value T at position
(i, j) if (i, j) ∈ Knows, and the value F at position (i, j)
if (i, j) 6∈ Knows. We can then represent this boolean matrix with a relation M Knows(rw integer, colmn integer,
value) by using the following SQL statements:
create table M_Knows(rw integer, colmn integer, value boolean);
insert into M_Knows
select rw.pid, colmn.pid,
exists (select 1
from Knows k
where rw.pid = k.pid1 and colmn.pid = k.pid2)
from person rw, person colmn;
16
(b) Compute the matrix M Knows ∗ M Knows. What does the
matrix represent?
(c) M Knows ∗ (M Knows ∗ M Knows). What does the matrix represent?
17
4 Triggers (Practice problems – not graded)
To begin the problems in this section, you should first remove the entire database, including the relation schemas. You should then create
the relations but without specifying the primary and foreign key constraints. You should also not yet populate the relations with data.
Solve the following problems:
13. Develop appropriate insert and delete triggers that implement the
primary key and foreign key constraints that are specified for the
Person, Company, and Worksfor relations.
Your triggers should report appropriate error conditions.
For this problem, implement the triggers such that foreign key
constraints are maintained using the cascading delete semantics.
For a reference on cascading deletes associated with foreign keys
maintenance consult the PostgreSQL manual page
https:www.postgresql.orgdocs9.2ddl-constraints.html
Test your triggers using appropriate inserts and deletes.
14. Repeat Problem 13 but subject to the constraint that a person
may not appear in the worksFor relation if he or she has fewer
than 2 job skils.
15. Consider the following view definition
create or replace view PersonIsKnownByNumberofPersons as
(select p1.pid, p1.name,
(select count(1)
from Person p2
where (p2.pid, p1.pid) in (select k.pid1, k.pid2
from Knows k)) as NumberofKnownByPersons
from Person p1);
This view defines the set of tuples (p, n, c) where p and n are the
pid and name of a person and c is the number of persons who
know the person with pid p.
You should not create this view! Rather your task is to create
a relation PersonIsKnownByNumberofPersons that maintains a
18
materialized view in accordance with the above view definition under insert and delete operations on the Person and Knows
relation.
Your triggers should be designed to be incremental. In particular, when an insert or delete occurs, you should not always
have to reevaluate all the number of persons who know persons.
Rather the maintenance of PersonIsKnownByNumberofPersons
should only apply to the person information that is affected by
the insert or delete.
Provide tests that show that your solution works.
19