CptS 415 Assignment #2

$30.00

Category: Tags: , , , , You will Instantly receive a download link for .zip solution file upon Payment || To Order Original Work Click Custom Order?

Description

5/5 - (3 votes)

1. [XML and RDF] (40)
(a) (10) Consider the database instance you gave in Assignment 1 Question 2 (a). Assume now that
you don’t have any schema. Give an XML document to represent the tuples as the fact about the
airports.
(b) (10) Consider the relational schemas you gave in Assignment 1 Question 2 (b). Give an XML
schema representation of each relational schema. How do you encode keys? Foreign keys?
(c) (20) Consider a set of natural language sentences collected from Webpages.
i. A human can like another human.
ii. A human can have a sex property of a man or a woman.
iii. A man can be the father of anotherhuman.
iv. A woman can be the mother of another human.
v. A human can be married to another human.
vi. A human can have a BirthYear property of type“xs:Year”.
vii. If a human is married to another, then they like eachother.
viii. If a human is a mother or father, the human is a parent.
Write a RDF schema and give a graphical presentation to describe these relationships.
2. [Graph Algorithm] (30) The following questions test your understanding on basic graph algorithms
a. (15) Given a directed graph �(�, �, �) with � the node set, � the edge set and � a function
that assigns to each edge � ∈ � a label �(�). A label constrained reachability query �(�,�, �)
tests if there exists a path from a source node � to a target node �, which consists of edges
having a label from a label set �. Give an algorithm (pseudo-code) to answer query �.
(hint: A straightforward way is to revise BFS or DFS traversal)
b. (15) Consider a network �(�, �) of servers, where each edge � = (�, �) represents a
communication channel from a server � to another server �. Each edge has an associated
value �(�, �), which is a constant in [0,1]. The value represents the reliability of the channel,
i.e., the probability that the channel from server � to server � will not fail. Assume that these
probabilities are independent. Give an algorithm (pseudo-code) to find the most reliable path
between two given servers. Give the complexity (in Big O notation) of your algorithm.
(hint: transform the weight to non-negative numbers and the problem may become very
familiar to you).
3. [Approximate Query Processing] (25) This question continues our discussion on using data
synopsis for query processing based on data-driven approximation. You are given a vector of
numbers: [127, 71, 87, 31, 59, 3, 43, 99, 100, 42, 0, 58, 30, 88, 72, 130], each data point records the
frequency of communication of a server in a 5-minute interval. For example, in the first 5 minutes,
127 contacts are observed. In the next 5 minutes which is time interval [5, 10], 71 contacts, …
a. Give the Haar decomposition and draw a corresponding error tree for the contacts data vector.
b. Give the process and result for reconstructing the frequency during time interval [15, 20] using
Haar decomposition (explain the path in a top-down fashion).
c. Use Haar decomposition and error tree to compute the total number of communications
between time interval [15, 30] (explain the path in a top-down fashion).