Description
This assignment is divided into two parts; each part demonstrates an application of functional
higher-order list functions. All your solutions should be defined within two modules: RelDB
and Superstring. You should submit the files Superstring.hs and RelDB.hs (or *.lhs).
File names are case sensitive.
Note: All definitions at top level must have explicit type annotations. Unless otherwise
specified, solutions may utilize functions from the standard prelude and any material in the
course notes and textbook. You may not use any other materials without explicit citation.
1 Relational Databases
Many modern implementations of relational databases (such as the family of SQL servers)
provide functionality on tables that corresponds to the higher-order list functions commonly
used in functional programming.
Figure 1: Typical relational database table structure.
column name column name column name
row 1 value value value
row 2 value value value
For this part of the assignment, you should start with the following module definition. Table
values will be modelled using an algebraic data type, and a database table (as illustrated in
Figure 1) will be defined as a list of column names and a list of rows in which each row is a
list of values. The lengths of all rows should match the number of columns.
module RelDB where
data Value = I Int | S String | Null
deriving Show –so that you can display values
type Column = String
type Row = [Value]
type Table = ([Column], [Row])
Problem 1. (10 pts)
First, you will define several operators on values. Add the following type definitions to your
module.
type Op = Value -> Value -> Value
type LogicOp = Value -> Value -> Bool
type AggregateOp = [Value] -> Value
(a) Define a function plus :: Op that computes the sum of two database values. If at
least one argument is not an integer, plus should return Null.
(b) Define a function ccat :: Op that concatenates two database values. If at least one
argument is not a string, ccat should return Null.
(c) Define a function eqls :: LogicOp that returns True if its two arguments are equivalent (even if both are Null) and False otherwise.
(d) Define a function agg :: Op -> Value -> AggregateOp that takes an operator op
and a default value base, and returns an aggregate operator that performs the operation
specified by op across an entire list of values. If the list of values supplied to the
aggregate operator is empty, it should return the default value specified by base. Below
are two examples of how this function should behave.
(agg plus (I 0)) [I 1, I 2, I 3, I 4] = I 10
(agg ccat (S “”)) [S “a”, S “b”, S “c”] = S “abc”
Problem 2. (30 pts)
You will now implement several table operations commonly supported by implementations
of relational databases.
(a) Define a function v :: Column -> [(Column, Value)] -> Value that takes a column name along with a list of values labelled by column, and returns the appropriate
value (or Null if there is no such column in the list). You may assume that the column
names in the list will be unique.
2
(b) Define a function select :: [Column] -> Table -> Table that takes a column list
and a table, and returns a new table containing only the columns in the column list.
It is possible to implement this function in one line using zip and list comprehensions.
(c) Define a function aggregate :: AggregateOp -> Column -> Table -> Value that
takes an aggregate operator, a column, and a table. It should return the result of
applying the aggregate operator over all the values in the specified column of the table.
(d) Define a function join :: Table -> Table -> Table that takes two tables and returns a table that has the columns from both tables (you may assume no column names
are duplicated between the tables). The resulting table should have a row for every
pair of rows in which one row is taken from the first table, and the other row is taken
from the second table. That is, if the first table has 푛 rows and the second table has
푚 rows, the resulting table should have 푚 ⋅ 푛 rows.
(e) Add the following data type to your module.
data OpArg = C Column | V Value
Define a function only :: OpArg -> LogicOp -> OpArg -> Table -> Table that
takes a logical operator, two arguments for that operator, and a table. It should return
a table that contains only the rows for which the operator, when applied to the values
(or column values) specified by the two arguments, returns True. The resulting table
should contain the same columns as the input table.
Problem 3. (10 pts)
Add the following two tables to your module.
tbl1 = (
[“Metro area”, “Population”], [
[S “Tokyo”, I 32450000],
[S “Seoul “, I 20550000],
[S “Mexico City”, I 20450000],
[S “New York City”, I 19750000],
[S “Mumbai”, I 19200000],
[S “Jakarta”, I 18900000],
[S “Sao Paulo”, I 18850000],
[S “Delhi”, I 18600000] ] )
tbl2 = (
[“City”, “Country”], [
[S “Tokyo”, S “Japan”],
[S “Seoul “, S “South Korea”],
[S “Mexico City”, S “Mexico”],
[S “New York City”, S “United States”],
[S “Mumbai”, S “India”],
[S “Jakarta”, S “Indonesia”],
[S “Sao Paulo”, S “Brazil”],
[S “Delhi”, S “India”] ] )
3
(a) Using only the functions you were explicitly asked to define in Problem #2, define
tbl3 :: Table so that it contains only the columns “Country” and “Population”,
specifying the population of the largest city in the corresponding country.
(b) Using only the functions you were explicitly asked to define in Problem #1 and
Problem #2, define population :: Value to be the total population of the two
most populous cities in India.
2. Shortest Superstring Problem
The shortest superstring problem is encountered in both DNA sequencing and data compression. The problem is defined in the following way: given a list of strings 푆 = {푠1, . . . , 푠푛},
what is the shortest possible string 푠
∗
such that every string 푠푖 ∈ 푆 is a substring of 푠
∗
? You
will use higher-order list functions to implement an algorithms for solving this problem.
Problem 4. (20 pts)
(a) Define a function overlap :: (String, String) -> Int that takes two strings 푠
and 푠
′ and returns the number of elements at the end of 푠 which overlap with the
beginning of 푠
′
. For example:
overlap (“fire”, “red”) = 2
overlap (“water”, “blue”) = 0
You may want to use the function isPrefixOf, found in the prelude. To use it, you
will need to add import List immediately after your module declaration.
(b) Define a function contains :: String -> String -> Bool that returns True only
if the second string is a substring of the first string. Otherwise, it returns False.
(c) Define an infix operator o :: String -> String -> String that concatenates two
strings 푠 and 푠
′ by merging the overlapping elements at the end of 푠 and the beginning
of 푠
′
. For example:
“fire” ‘o‘ “red” = “fired”
“water” ‘o‘ “blue” = “waterblue”
(d) Using foldr and (o), define a superstring algorithm naive :: [String] -> String
that takes a list of strings, and returns a superstring containing every string in that
list.
Problem 5. (30 pts)
Add the following higher-order function to your module. The function take an objective
function obj and two arguments. It returns the argument that minimizes the objective
function.
4
minimize :: (a -> Int) -> a -> a -> a
minimize obj x y = if obj x < obj y then x else y You will now define the optimal superstring algorithm. (a) Define a function allPairs :: [String] -> [(String , String)] that takes a list
of strings, and returns a list of all pairs of strings that can be made out of this list’s
elements, excluding any pairs of equal strings.
(b) Using filter, define a function update :: [String] -> (String, String) -> [String]
that takes a list of strings ℓ, and a pair of strings (푠, 푠′
). The function should combine
the two strings 푠 and 푠
′
(taking advantage of any overlapping) to obtain a new string
푠
′′. It should remove from the list of strings ℓ any strings contained in 푠
′′, and should
then return a list that contains all of the remaining strings as well as one copy of 푠
′′
.
For example:
update [“fire”, “red”, “blue”] (“fire”, “red”) = [“fired”,”blue”]
(c) Define a recursive algorithm superstring :: ([String] -> [(String, String)])
-> [String] -> String that takes a function next for generating pairs of strings out
of a list, and a list of strings ℓ. Given an empty list, superstring should return an
empty list. Given a list with only one string, it should return that string. Otherwise,
the algorithm should operate in the following manner at each invocation.
(1) Take the list of strings and generate a list of pairs using the next function.
(2) For every pair in the list from (1), generate a new list with only that pair combined
(thus progressing the superstring computation one step closer to a solution).
(3) Every list of strings generated in (2) is a smaller version of the superstring problem. The algorithm should call itself recursively to solve this smaller problem.
Remember that any recursive call is also expecting next as an argument. This
portion can be implemented using map or list comprehensions.
(4) When the recursive calls all return a result, return the shortest (you can use foldr,
length, and minimize for this; for the base case of foldr, you can use naive).
This can be defined in a single line; plan carefully and use list functions when possible.
(d) Define a superstring algorithm optimal :: [String] -> String that takes a list of
strings, tries all possible ways of combining the strings, and returns the best result.1
Because it tries all possibilities, this algorithm always returns the shortest possible
superstring, but runs in exponential time. Nevertheless, your implementation should
be able to handle the following test cases:
test1 = [“ctagcgacat”, “aagatagtta”, “gctactaaga”, “gacatattgt”, “tagttactag”]
test2 = [“101001″,”010100010100”, “100101”, “001010”, “11010”, “100”, “11010”]
test3 = [x++y | x<-[“aab”,”dcc”,”aaa”], y<-[“dcc”, “aab”]]
1Hint: You only need your solutions from parts (a) and (c) above.