Description
Goal
7 The project is to build a tool at web client to collect information regarding a web server. The
8 purpose of this project is two fold:
9 • to provide students with hands-on experience with socket programming in Python,
10 • to help students understand the application-layer protocols HTTP/HTTPs. Note that HTTPs
11 is not a standalone protocol, but instead it is HTTP over Transport Layer Security (TLS). In
12 this assignment, your main focus is on HTTP, not TLS.
13 2 Background
14 2.1 HTTP
15 HTTP stands for Hyper Text Transfer Protocol and is used for communication among web servers.
16 The web client initiates a conversation by opening a connection to a web server. Once a con17 nection is set up, the client sends up an HTTP request. The server sends an HTTP response back
18 to the client and closes the connection. An HTTP request consists of two parts: a header and a
19 body. Whether a body follows a header or not is specified in the header.
20 Using single-line header of HTTP request as an example, the first line of any request header
21 should be:
22 • the method field: The method field can take on several different values, including GET, POST,
23 HEAD, and so on.
24 • the URL field: It is the field to identify a network resource, e.g., “http://www.csc.uvic.ca/index.html”.
25 • the HTTP version field: This field is “HTTP/1.0”.
26 An example is: “GET http://www.csc.uvic.ca/index.php HTTP/1.0 ”. The request header and body
27 are separated by two sets of a carriage return and a linefeed. Since we do not need the body, the
28 end of a header marks the end of a request. Using a C char string as an example, the request above
29 should be: “GET http://www.csc.uvic.ca/index.php HTTP/1.0\r\n\r\n”. Please be careful when
30 you use Python String.
1
31 The response from a server also has two parts: a header and a body. The first line of a header
32 should be:
33 • the HTTP version field,
34 • the status code field,
35 • the phrase field.
36 Two main status codes include 200 and 404. The status code 200 means that the request
37 succeeded and the information is returned in the response. The status code 404 means that the
38 requested document does not exist on this server. Two example response messages are: “HTTP/1.0
39 404 Not Found\r\n\r\n” and “HTTP/1.0 200 OK\r\n\r\n data data data …” Another two status
40 codes 505: “HTTP Version Not Supported”, and 302: “302 found” for URL redirection are also
41 useful for this assignment.
42 2.2 URI
43 URI stands for Uniform Resource Identifier and is also known as the combination of Uniform
44 Resource Locators (URL) and Uniform Resource Names (URN). It is a formatted string which
45 identifies a network resource. It generally has the format: protocol://host[:port]/filepath. When a
46 port is not specified, the default HTTP port number is 80, and the default HTTPS port number is
47 443.
48 2.3 Cookies
49 An HTTP cookie is a small piece of data that a server sends to the user’s web browser. The browser
50 may store it and send it back with the next request to the same server. Typically, it’s used to tell
51 if two requests came from the same browser keeping a user logged-in, for example. It remembers
52 stateful information for the stateless HTTP protocol. Cookies have many applications in web, such
53 as tracking, authentication, and web analytics. Due to this reason, cookies also cause many concerns
54 on security and privacy breach.
55 The textbook includes simple introduction on cookies. More detailed information could be
56 found at: https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies. Python includes dedi57 cated modules to handle Cookies: https://docs.python.org/3/library/http.cookies.html. Neverthe58 less, you are no allowed to use this package because it defeats the purpose of a network-course
59 assignment.
60 3 Project Description
61 You are required to build a smart web client tool, called SmartClient, in Python. Note that for
62 consistence, program in other language will not be accepted!
63 Given the URL of a web server, your SmartClient needs to find out the following information
64 regarding the web server:
65 • 1. whether or not the web server supports HTTPs,
66 • 2. the newest HTTP version that the web server supports (http1.0, http1.1, or http2.0),
2
67 • 3. the name, the key, and the domain name of cookies that the web server will use.
68 Your program first accepts URI from stdin and parses it. Then it connects to a server, sends an
69 HTTP request, and receives an HTTP response. You should also implement a routine that prints
70 out the response from the server, marking the header and the body. When you finish the client, you
71 can try to connect to any HTTP server. For instance, type “https://www.uvic.ca/” as the input
72 to the client program and see what response you get.
73 As an example output, after you run your code with
74 % python SmartClient.py www.uvic.ca
75 Your SmartClient may output the received response from the server (optional), e.g.,
76 —Request begin—
77 GET http://www.uvic.ca/index.html HTTP/1.1
78 Host: www.uvic.ca
79 Connection: Keep-Alive
80
81 —Request end—
82 HTTP request sent, awaiting response…
83
84 —Response header —
85 HTTP/1.1 200 OK
86 Date: Tue, 02 Jan 2018 22:42:27 GMT
87 Expires: Thu, 19 Nov 1981 08:52:00 GMT
88 Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
89 Pragma: no-cache
90 Set-Cookie: SESSID_UV_128004=VD3vOJhqL3YUbmaZSTJre1; path=/; domain=www.uvic.ca
91 Set-Cookie: uvic_bar=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; domain=.uvic.ca
92 Keep-Alive: timeout=5, max=100
93 Connection: close
94 Content-Type: text/html; charset=UTF-8
95 Set-Cookie: www_def=2548525198.20480.0000; path=/
96 Set-Cookie: TS01a564a5=0183e07534a2511a2dcd274bee873845d67a2c07b7074587c948f80a42c427b1f7ea4350c7dd9d0f62b80be2540f457c66$
97 Set-Cookie: TS01c8da3c=0183e075346a73ab4544c7b9ba9d7fa022c07af441fc6214c4960d6a9d0db2896a8ce601d1175378ef21beeecddb063823$
98 Set-Cookie: TS014bf86f=0183e075347c174a4754aeb42d669781e0fafb1f43d3eb2783b1354159a9ad8d81f7be7ae01ef9552c2828eda86fed940f$
99
100 — Response body —
101 Body Body …. (the actual content)
102
103 Note that some lines in above output were truncated.
104 Your code might need to send multiple requests in order to find out the required information.
105 Your code should output the final results (mandatory), for example:
106 website: www.uvic.ca
107 1. Support of HTTPS: yes
108 2. The newest HTTP versions that the web server supports: HTTP/1.1
3
109 3. List of Cookies:
110 name: -, key: SESSID_UV_128004, domain name: www.uvic.ca
111 name: -, key: www_def, domain name: www.uvic.ca
112 name: -, key: TS01a564a5, domain name: www.uvic.ca
113 name: -, key: TS01c8da3c, domain name: www.uvic.ca
114 name: -, key: TS014bf86f, domain name: uvic.ca
115 name: Google Analytics, key: __utma, domain name: uvic.ca
116 name: Google Analytics, key: __utmb, domain name: uvic.ca
117 name: Google Analytics, key: __utmc, domain name: uvic.ca
118 name: Google Analytics, key: __utmz, domain name: uvic.ca
119 name: TradeDesk, key: TDID, domain name: .adsrvr.org
120 3.1 Other Notes
121 1. Regarding other printout: Anything not specified in Assignment 1 is optional. For example,
122 you can decide whether or not to print out the IP address, port number, and so on. When
123 TAs test your code, if your code works fine without any problem, you are fine even if you
124 do not print out anything not required in Assignment 1. Nevertheless, if your code does not
125 work, TAs will not spend time to figure out what is wrong and you get a zero mark on the
126 required function (Refer to the table in Section 5 of Assignment 1). In this case, if your code
127 includes some printout to show intermediate results, TAs will have an idea on how far you
128 have achieved and give you some partial mark based on their own judgement.
129 2. Regarding readme file. Readme file is important. Without it TAs will not know how to
130 compile your code and how to run your code. It would waste our time to deal with your
131 complaint if TAs cannot run your code and give you a zero.
132 3. For more information on HTTP, HTML, URI, etc., please refer to http://www.w3.org. It is
133 the home page of W3 Consortium and you will find many useful links to subjects related to
134 the World Wide Web.
135 4 Schedule
136 In order to help you finish this programming assignment successfully, the schedule of this assignment
137 has been synchronized with both the lectures and the lab tutorials. Before the final deadline, there
138 are three lab sessions arranged during the course of this assignment. A schedule is listed as follows:
Session Tutorial Milestones
Lab 1 lab environment, P1 spec go-through, design hints, python design and code skeleton
Lab 2 socket programming and testing, DNS alpha code done
Lab 3 socket programming and last-minute help beta code done and demo
139 5 Deliveries and Marking Scheme
140 For your final submission of each assignment you are required to submit your source code to connex.
141 You should include a readme file to tell TA how to compile and run your code. At the last lab
4
142 session that you attend, you need to demo your assignment to TAs. Nevertheless, before the final
143 due date, you can still make changes on your code and submit a change.txt file to connex to describe
144 the changes after your demo.
145 The marking scheme is as follows:
Components Weight
Error handling 10
Correct output for “support of HTTPS” 10
Correct output for “the newest HTTP version that the web server supports” 30
List of Cookies 40
Code style 5
Readme.txt and change.txt(if any) 5
Total Weight 100
146
147 Important Note: listing cookies is a very tricky business, and it is possible that you will not get
148 a unique, static answer due to the dynamic changes in cookies, some created dynamically based on
149 users’ interactive input. This assignment requirement, while not being ideal for a third-year course,
150 is intended to push your limit for doing research yourself. As a benchmark, we will use some online
151 tool, http://www.cookie-checker.com/, to validate your result. You will get full mark on the “list
152 of cookies (40%)”, as long as your results have over 75% similarity with the benchmark results.
153 6 Plagiarism
154 This assignment is to be done individually. You are encouraged to discuss the design of your solution
155 with your classmates, but each person must implement their own assignment.
156 The End
5