EECS 484 Project 2 Querying the Fakebook Database with JDBC


Category: You will Instantly receive a download link for .zip solution file upon Payment


5/5 - (6 votes)

While Project 1 focused primarily on database design, in this project you will focus on writing SQL queries. In addition, you will embed your SQL queries into Java source code (using JDBC) to implement “Fakebook Oracle,” a standalone program that answers several queries about the Fakebook database. For this project, you will use a standardized schema that we provide to you, rather than the schema that you designed in Project 1. You will also have access to our public Fakebook dataset for testing. 1. Files Provided to You You are provided with 3 Java files: , and . We have also provided a jar file ojdbc6.jar. Place all these 4 files, including the jar file under a folder project2. In addition, we have provided you a file solution-public. txt showing sample query results. When submitting your completed project, you only need to turn in . 1) This file provides the main function for running the program. You can use it to run and test your program, but you don’t need to turn it in. Please only modify the oracleUserName and password static variables, replacing them with your own information. 1 2) DO NOT modify this file, although you are welcome to look at the contents if you are curious. This class defines the query functions (discussed below in Section 3) as abstract functions, which you must implement for Project 2. It also defines some useful data structures and provides formatted print functions for you to make use of. 3) This is a subclass of FakebookOracle , in which you must implement the query functions. You should ONLY fill in the body for each of the query functions. DO NOT make any other changes. In this project, you only need to store the results of the queries in our predefined data structures (which we have provided as member variables in the class). You don’t need to worry about output formatting. In the base class , a set of print functions have been provided for you to view the query results. The MyFakebookOracle class contains parameterized names for the tables you will need to use in your queries, and they are constructed in the class constructor as shown in the following lines of code. You should always use the corresponding variable when you are referring to a table in the SQL statement to be executed through JDBC. For example, you should use the variable cityTableNam e instead of using a constant value such as ‘PUBLIC_CITIES’ in your Java code. 2 For creating and managing the database connection, you should use the predefined object oracleConnection. 4) solution-public.txt This file contains the output query results from running our official solution implementation on the public dataset provided to you. You can make use of this to check whether or not your queries are generating the same results from the same input dataset. Note that your submission will be graded using a different input dataset, so producing correct results on the public dataset is not a guarantee that your solution is entirely correct. Make sure that your queries are designed to work more generally on any valid input dataset, not just the sample data we provide. Also, think carefully about the semantics of your queries since it may not be always possible to test them in all scenarios and you often will not have the benefit of knowing the correct answers in practice. 2. Tables For this project, your schema will consist of the following twelve tables: 1. ._USERS 2. ._FRIENDS 3. ._CITIES 3 4. ._PROGRAMS 5. ._USER_CURRENT_CITY 6. ._USER_HOMETOWN_CITY 7. ._EDUCATION 8. ._USER_EVENTS 9. ._PHOTOS 10. ._ALBUMS 11. ._TAGS To Access Public Fakebook Data: should be replaced with “PUBLIC” to access the public Fakebook data tables. The public data tables are stored in the GSI’s account name (tajik). Therefore, you should use the GSI’s account name as in order to access the public tables directly within SQL*Plus for testing your queries. For example, to access the public USERS table, you should refer to the table name as tajik.PUBLIC_USERS . In the Java files provided to you, the above table names are already pre-configured in the given code. 3. Queries (100 points) There are 10 total queries (Query 0 to Query 9). Query 0 is provided to you as an example, and you are left to implement the remaining nine. The points are as shown. The queries vary tremendously in terms of difficulty. If you get stuck on a harder query, try an easier one first and come back to the tough one later. For all of these queries, sample answers on the given data are available in the attached zip file. If the English description is ambiguous, please look at the sample answers. Also, for all of these queries, when feasible, you should try to do most of heavy-lifting to answer the query within SQL. For example, if a query requires you to present the data in sorted order, use ORDER BY in your query rather than retrieving the result and then sorting it within Java. 4 Also, the grading program we use does impose a time limit on the time it waits for a query. If a query appears to be taking too much time, you should consider rewriting it in a different way to make it faster. Nested queries are usually more expensive to run. Query 0: Find information about month of birth (0 points) This function has been implemented for you in , so that you can use it as an example. The function computes the month in which the most users were born and the month in which the fewest users were born. The names of these users are also retrieved. The sample function uses the Connection object, oracleConnection, to build a Statement object. Using the Statement object, it issues a SQL query, and retrieves a ResultSet. It iterates over the ResultSet object, and stores the necessary results in a Java object. Finally, it closes both the Statement and the ResultSet objects. Query 1: Find information about names (10 points) The next query asks you to find information about users’ names, including 1) the longest first name, 2) the shortest first name, and 3) the most common first name. If there are ties, you should include all of the matches in your result. The following code snippet illustrates the data structures that should be constructed. However, it is up to you to add your own JDBC query to answer the question correctly. 5 Query 2: Find “lonely” users (10 points) The next query asks you to find information about all users who have no friends in the network. Again, you will place your results into the provided data structures. The sample code in illustrates how to do this. Query 3: Find “world travelers” (10 points) The next query asks you to find information about all users who no longer live in their hometowns. In other words, the current_city associated with these users should NOT be the same as their hometown_city (neither should be null). You will place your result into the provided data structures. Query 4: Find information about photo tags (10 points) For this query, you should find the top n photos that have the most tagged users. You will also need to retrieve information about each of the tagged users. If there are ties (i.e. photos with the same number of tagged users), then choose the photo with smaller id first. This will be string lexicographic ordering since the data types are VARCHARs (for instance, “10” will be less than “2”). Query 5: Find users to set up on dates (15 points) For this task, you should find the top n “match pairs” according to the following criteria: (1) One of the users is female, and the other is male (Note: they do not have to be friends of the same person) (2) Their age difference is <= yearDiff (just compare the years of birth for this). (3) They are not friends with one another (4) They should be tagged together in at least one photo You should return up to n “match pairs.” If there are more than n match pairs, you should break ties as follows: (1) First choose the pairs with the largest number of shared photos (2) If there are still ties, choose the pair with the smaller user_id for the female (3)If there are still ties, choose the pair with the smaller user_id for the male 6 Query 6: Suggest friends based on mutual friends (15 points) For this part, you will suggest potential friends to a user based on mutual friends. In particular, you will find the top n pairs of users in the database who have the most common friends, but are not friends themselves. Your output will consist of a set of pairs (user1_id, user2_id). No pair should appear in the result set twice; you should always order the pairs so that user1_id < user2_id. If there are ties, you should give priority to the pair with the smaller user1_id. If there are still ties, then give priority to the pair with the smaller user2_id. Finally, please note that the _FRIENDS table only records binary friend relationships once, where user1_id is always smaller than user2_id. That is, if users 11 and 12 are friends, the pair (11,12) will appear in the _FRIENDS table, but the pair (12,11) will not appear. Query 7: Find the most popular states to hold events (10 points) Find the name of the state with the most events, as well as the number of events in that state. If there is a tie, return the names of all the tied states. Again, you will place your result in the provided data structures, as demonstrated in . Query 8: Find oldest and youngest friends (10 points) Given the user_id of a user, your task is to find the oldest and youngest friends of that user. If two friends are exactly the same age, meaning that they were born on the same day, month, and year, then you should assume that the friend with the larger user_id is older. Query 9: Find the pairs of potential siblings (10 points) A pair of users are potential siblings if they have the same last name and hometown, if they are friends, and if they are less than 10 years apart in age. While doing this, you should compute the year-wise difference and not worry about months or days. Pairs of siblings are returned with the lower user_id user first. They are ordered based on the first user_id and, in the event of a tie, the second user_id. 7 4. Compiling and running your code You are provided with an Oracle JDBC Driver (ojdbc6.jar). This driver has been tested with Java JDK 1.7 and JDK 1.8. If you are unsure which Java development environment you prefer to use, we suggest that you develop your code in Eclipse. You can do this by creating a Java Project called ‘project2’ inside Eclipse, and then Importing the 3 Java source files to the project. You should also add your JDBC driver’s JAR to Eclipse’s classpath. To do this in Eclipse, go to ‘Project Settings’ then ‘Java Build Path’, and then click on the ‘Libraries’ tab, then ‘Add External JAR’. If you prefer, you can just use an editor (e.g. vi or emacs) to develop your code. In this case, you should create a directory called ‘project2’ and put the three Java source files provided to you in this directory. To compile your code you should change to the directory that contains ‘project2’. In other words, suppose you created the directory ‘project2’ in /your/home/Private/EECS484 . cd /your/home/Private/EECS484 Then, you can compile the Java source files as follows: javac project2/ project2/ project2/ You can run your program as follows (note that you should set the class path (-cp) for your copy of ojdbc6.jar): java -Xmx64M -cp “project2/ojdbc6.jar:” project2/TestFakebookOracle Note the colon (:) after ojdbc6.jar. Connect from Campus or over a VPN If you get a timeout error from Oracle, make sure you connect from campus or use the University VPN to connect (see this page for information: It 8 is possible that the guest wireless network may not work for remote access to the database without being on the VPN. Alternatively, use UM Wireless or a CAEN machine directly for your development. 5. Testing A good strategy to write embedded SQL is to first test your SQL statements in a more interactive database client such as SQL*Plus before writing them inside Java code, especially for very complex SQL queries. You have the public dataset available to test your application. We provide you with the output from our official solution querying against the public data (available in solution-public.txt ). You can compare your output with ours to debug. During grading, we will run your code on a second (hidden) dataset. 6. Submission and Grading You only need to turn in . Please submit it by going to the online autograder: If you are working in a team, then you should join a team first and then submit. Both partners must submit individually, even for a group project. While no online autograder is currently available, we will be grading your answers to the queries using an automated script, so it is important that you adhere to the given instructions, and that your file, , works correctly with an unmodified version of . We might later adjust (reduce) the autograder’s score if we notice poor Java/SQL programming style. Here are the key elements: ● For each of these tasks, think carefully about what computation is best done in SQL, and what computation is best done in Java. Of course, you can always compute the “correct” answer by fetching all of the data from Oracle, and loading it into Java virtual memory, but this is very inefficient! Instead, most of the computation can (and should!) be done in SQL, and you should only retrieve the minimum amount of data necessary. ● Close all SQL resources cleanly. See Appendix below. 9 ● Make sure your queries are nicely formatted and readable. Explain the logic briefly in comments if the query is complicated. ● You are not being evaluated on optimizing the queries. So, no need to worry about that. However, if you find that they are taking inordinately long, then you should think about simplifying the queries. Else, they could fail the tests. ● Generally, non-nested queries are preferred over nested ones, when feasible. It may not be feasible to do so in all cases. Basically, think about simplifying the queries and use comments to explain, if needed, so that someone other than you can understand your logic. 10 Appendix: Tips on Closing Your JDBC Resources Cleanly It is important that you close your JDBC connections properly. Otherwise, you risk getting locked out of the database server. Even if you kill your Java program, it is possible that the database server thinks that the client program is still around. It may keep its end of the connection open, waiting for data. This could eventually lock you out of the database if you end up creating too many instances of open connections. Here is a real example of an unfortunate situation posted by a student: “So I’ve been having this same problem, but I locked myself over 24 hours ago and I still don’t have access. I’m not sure what to do since ITS isn’t open and if CAEN can’t help I’m skeptical anyway. I created this problem by my SQL queries were running on forever (can you even get into an infinite loop in SQL? I think I was) and when you Ctrl+C out it closes SQL improperly. I’ve learned from my mistake now, but it’s too late and I can’t get back in.” Here are a few tips to help reduce the likelihood of the above type of problem: 1. First, use SQL*Plus to debug your queries rather than using Java. Also make sure you quit your SQL*Plus sessions, otherwise you can still get locked out. It may help to design your queries on paper first and avoid too much nesting of queries. Nested queries tend to run slower as query optimizers have difficulty handling them. The above student was smart enough to do that but still ran into trouble. So, let’s look at one more thing to do (Step 2) that may help. 2. Make sure you close all Connections, Statements, and ResultSets in your code. This is tricky to do. Read this for some of the nuances: The problem is, even closing a connection can lead to an exception in rare cases. We suggest using try-with-resources statements in your Java code to automatically close any JDBC objects that you create. This is a newer feature introduced in Java SE 7 that makes closing resources easier and more reliable. The next page shows an example of how this can be done for JDBC connections. 11 // Example use of a try-with-resources statement public static void viewTable(Connection con) throws SQLException { String query = “SELECT COF_NAME, SUP_ID, PRICE, SALES, TOTAL FROM COFFEES”; try (Statement stmt = con.createStatement()) { ResultSet rs = stmt.executeQuery(query); while ( { String coffeeName = rs.getString(“COF_NAME”); int supplierID = rs.getInt(“SUP_ID”); float price = rs.getFloat(“PRICE”); int sales = rs.getInt(“SALES”); int total = rs.getInt(“TOTAL”); System.out.println(coffeeName + “, ” + supplierID + “, ” + price + “, ” + sales + “, ” + total); } } catch (SQLException e) { System.err.println(e.getMessage()); } } This code is adapted from an example at: Feel free to take the code and adapt it to your needs. The above code assumes that you only need to use a single Statement object, for example. That may not always be appropriate. For example, you may sometimes want to use multiple Statement objects to execute different fixed queries, in which case you could have one try-with-resources block for each Statement object that you create. Since a Connection object is provided to you in the Project 2 code, you do not need to worry about creating a try-with-resources block for the Connection object that you use in your query implementations. If you think about the problem, it is actually pretty hard for a database server to distinguish between a slow client at the other end and a dead client. Remember that the communication between the client and the server occurs over a network. Doing all you can within Java to close the connections in all possible situations (including when queries fail) will help the server greatly. 12