
Welcome to Big Data Systems, EECS-4415, for the winter term 2021. Materials, instructions, and notices for the class will accumulate on the eClass (“Moodle”) portal and, as the term progresses, via links here on the EECS website.

Marks accumulate on ePost.

Essentials

lecture time : 11:30–13:00 Tuesdays & Thursdays (via Zoom)
place : virtual
instructor : Parke Godfrey
office hours : Mo 17:30–18:30 & Tu 13:00–14:00

An Overview of the Online Class

This class is online, but interactive. So how things are run necessarily differs from how we would do things if we were meeting in person. Online makes some things harder, as we all know, but it makes other things easier! We will be looking to take advantage of our online forum. For this to work, we shall need everyone to be engaged.

How will this class be conducted?

  • readings
    • Readings from the required textbook are assigned for each topic.
      • Students should read the assigned readings in advance of the associated lecture to benefit the most.
      • The textbook is closely aligned with what we cover in the course, and we are careful to be consistent with the textbook’s style and terminology.
    • Articles — seminal academic papers — will be assigned as additional reading paired with some topics.
  • lectures
    • We meet as a class via Zoom, following our lecture schedule.
    • Part of the lecture time will be conducted in more of a “flipped classroom” style and is intended to be fairly interactive. Our lecture meetings are the heart of the course; students are expected to attend.
      • Each lecture period will include some lecturing with slides.
      • Examples will be covered and hands-on walk-throughs done.
      • Problems will be pitched to students, and then solutions worked out.
      • The lecture Zoom sessions will also be recorded, and the lecture videos posted afterwards.
  • assignments
    • There are six assignments spaced through the course.
    • Each is directly tied to the topics presented beforehand.
  • quizzes
    • There will be four small quizzes, spaced out on every other Wednesday.
  • midterm test
    • There will be a midterm test scheduled in the middle of the term.
  • final exam
    • A final exam will be scheduled in the exam period.

See the syllabus for the details.

Materials

Additional materials will accumulate here as the course progresses.

Lecture Notes

Will be added throughout the term.

  1. Introduction
  2. Zen and the Art of Tool Maintenance
  3. MapReduce Introduction
  4. MapReduce Architecture [pdf]
    1. Data Flow & Spark
      (thanks to Jure Leskovec, “Intro, MapReduce & Spark,” CS246: Mining Massive Data Sets)
  5. Link Analysis [pdf]
  6. Data Streams
    1. Data Streams I [pdf]
    2. Data Streams II [pdf]
  7. Analysis of Large Graphs

Readings

Read the textbook chapters listed day by day in the Schedule. (Do the reading before that day.)

The list of assigned articles will accumulate here.

  1. Gray, J. & Compton, M.
    A call to arms.
    Queue. 3(3): 30–38, 2005 April 1.
    GC2006-CallToArms.pdf

    1. What is the semi-structured data challenge?
    2. What is the idea of column store?
  2. Dean, J. & Ghemawat, S.
    MapReduce: simplified data processing on large clusters.
    Communications of the ACM.
    51(1): 107–113, 2008 January 1.
    DG2004-MapReduce.pdf

    1. What does locality mean in the context of the paper?
    2. What is the granularity of fault tolerance provided by MapReduce as introduced by the paper?
  3. Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., & Czajkowski, G.
    Pregel: a system for large-scale graph processing.
    ACM SIGMOD International Conference on Management of Data.
    pp. 135–146, 2010 June 6.
    MAB+2010-Pregel.pdf

    1. What is the “think like a vertex” mode of programming?
    2. What does it mean that edges are not first-class citizens in the Pregel model?
  4. Brewer, E.
    CAP twelve years later: How the ‘rules’ have changed.
    IEEE Computer.
    45(2): 23–29, February 2012.
    Brewer2012-CAP12YearsLater.pdf

    1. What is eventual consistency?
    2. Web apps which can work offline (HTML5) favour which, availability or consistency? Explain briefly.
  5. Lewis-Kraus, G.
    The Great A.I. Awakening.
    The New York Times Magazine.
    2016 December 14.
    LewisKraus2016-Awakening.pdf
    original at NYT Magazine

    1. On what grounds were neural networks considered a folly?
    2. What was the big difference between the approach in “the cat paper” and previous image-recognition networks?
  6. Staff.
    Blockchains: The Great Chain of Being Sure about Things.
    The Economist.
    2015 October 31.
    TheEconomist2015-Blockchains.pdf
    original at The Economist

    1. How does the puzzle stage add to bitcoin’s security?
    2. Who is the originator of blockchain?
  7. Gessert, F., Wingerath, W., Friedrich, S., & Ritter, N.
    NoSQL Database Systems: A Survey and Decision Guidance.
    Computer Science – Research and Development.
    32(3–4): 353–365, 2017.
    GWFR2016-NOSQL.pdf

    1. What is an advantage of and what is a disadvantage of hash sharding?
    2. Name disadvantages of SSD compared with HDD.
  8. Castaldo, J.
    What really happened at Target Canada: The retailer’s last days.
    Maclean’s.
    2016 January 21.
    Castaldo2016-TargetCanada.pdf
    A cached copy at Facebook of the video, “How to go bankrupt the Target Canada way (in thirteen easy steps).”

    1. What company did Target Canada go to for its inventory management? What is that company known for?
    2. Why did Target Canada think they could do the integration in two years, whereas it had taken other retail chains much longer?
  9. Ching, A., Edunov, S., Kabiljo, M., Logothetis, D., & Muthukrishnan, S.
    One trillion edges: Graph processing at Facebook-scale.
    Proceedings of the VLDB Endowment.
    8(12): 1804–1815, 2015 August 1.
    CEK+2015-trillion_edges.pdf

    1. Is Facebook’s graph data stored in a vertex-centric way as would be assumed for input into Pregel?
    2. With respect to aggregators, what did not scale for Facebook in the original Giraph (that is, before Facebook’s updates to Giraph as enumerated in the paper)?
  10. Hill, K.
    Your Face Is Not Your Own.
    The New York Times Magazine.
    2021 March 18.
    Hill2021-Face.pdf (local PDF)
    original at NYT Magazine

    1. Where did Clearview AI obtain its face data from?
    2. For what applications would low accuracy be a concern?
  11. Harford, T.
    Big Data: Are we making a big mistake?
    Financial Times.
    2014 March 28.
    Harford2014-Mistake.pdf (local PDF)
    original at Financial Times

    1. What is the multiple-comparisons problem?
    2. What bias exists with Boston’s Street Bump app?

Deliverables

Assignments

  1. AnalysisTopTen
  2. MapReduceFrequency
  3. StreamScan
  4. NoSQLXQuery
  5. GraphCommunities