STATS/CSE 780 Homework Assignment 1

You can use either R (ggplot2 and R packages for data transformation), which is covered in class,
or Python (matplotlib and Python modules for data transformation).

Statistics Canada is the national statistical office. In this assignment, you will use Statistics
Canada’s publicly available data to explore Canada’s economy, society and environment.

Visit the data site. On the left, you will see filtering options. On the right, you will see available
data related to selected options. For this assignment, you can use the data in tabular format
(Table).

On the left (filtering options):
1. Choose Province or Territory (you will need at least one quantitative variable and one categorical
variable, broken down by province and territory).
2. Choose one subject.
3. Choose a frequency (you will need at least ten time points).

On the right (table):
4. Choose the data description. The link will take you to a page displaying a dashboard.

On the data dashboard page:
5. Choose the reference time (you will need at least ten time points).
6. At the bottom of the page, you can find “how to cite”. Keep the DOI in your reference manager
to cite in your report.
7. At the top right of the dashboard, there is a button for the “Download option”.
• Choose the “Download entire table” option from the pop-up window.
• Unzip the downloaded zip file.
• You can access the data and description in the unzipped folder.

If there are too many variables and samples, you can choose a subset of data after downloading to
make the following plots.
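
As a rough illustration of the subsetting step, the sketch below reads a small CSV and keeps only the rows and columns needed for the plots. The values are invented placeholders, the `Product` column is hypothetical, and the column names `REF_DATE`, `GEO` and `VALUE` are an assumption about the layout of the downloaded table; check them against the file in your unzipped folder.

```python
import io

import pandas as pd

# Stand-in for the CSV in the unzipped folder. All values are invented
# placeholders, and the column names are assumptions to verify against
# the real file.
raw_csv = io.StringIO(
    "REF_DATE,GEO,Product,VALUE\n"
    "2010,Ontario,A,10.0\n"
    "2010,Quebec,A,8.0\n"
    "2010,Canada,A,18.0\n"
    "2011,Ontario,A,11.0\n"
    "2011,Quebec,A,\n"
    "2011,Canada,A,19.0\n"
)

# With the real download this would be pd.read_csv("<path to the unzipped CSV>").
df = pd.read_csv(raw_csv)

# Keep only the columns needed for the plots, and drop the national total so
# that provinces are not double counted when aggregating later.
subset = df.loc[df["GEO"] != "Canada", ["REF_DATE", "GEO", "Product", "VALUE"]]

# The assignment asks for at least ten time points; this toy example has two.
n_time_points = subset["REF_DATE"].nunique()
print(subset.shape, n_time_points)
```

The empty `VALUE` field for Quebec in 2011 is read as a missing value (`NaN`), which is worth noting before any preprocessing.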

(i) Briefly describe your chosen dataset and clearly explain where it was sourced.

(ii) Clearly explain data transformation and the preprocessing methods you used to tidy the data.
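
One possible way to approach the tidying step is sketched below, assuming the data arrive in a wide layout with one column per year (your table may already be in long form). The values are invented placeholders and the column names are hypothetical. The 1.5 × IQR rule is only one common convention for flagging potential outliers, not a required method.

```python
import pandas as pd

# Invented placeholder values in a wide layout: one column per year.
wide = pd.DataFrame(
    {
        "GEO": ["Ontario", "Quebec", "Alberta"],
        "2010": [10.0, 8.0, 7.0],
        "2011": [11.0, None, 7.5],
        "2012": [12.0, 9.0, 90.0],
    }
)

# Tidy form: one row per (province, year) observation.
tidy = wide.melt(id_vars="GEO", var_name="year", value_name="value")
tidy["year"] = tidy["year"].astype(int)

# Missing values stay as explicit NaN rows rather than being silently dropped.
n_missing = int(tidy["value"].isna().sum())

# Flag (do not delete) potential outliers with the 1.5 * IQR rule.
q1, q3 = tidy["value"].quantile([0.25, 0.75])
iqr = q3 - q1
tidy["outlier"] = (tidy["value"] < q1 - 1.5 * iqr) | (tidy["value"] > q3 + 1.5 * iqr)

print(n_missing)
print(tidy[tidy["outlier"]])
```

Flagging rather than removing keeps the decision about how to handle each outlier visible, which makes it easier to justify in the report.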

(iii) Choose one (quantitative) variable for the following analysis. Then, use an appropriate visualization method to describe the trend of the variable in the selected time frequency across
provinces. Finally, clearly describe any statistical transformation used for visualization and
interpret the results.
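
A line plot with one line per province is one reasonable choice for showing a trend over time. The following is a minimal sketch with matplotlib, assuming tidy data with hypothetical columns `year`, `GEO` and `value`; the numbers are invented placeholders, and the axis labels should be replaced with the real variable name and units.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented placeholder values in tidy form.
tidy = pd.DataFrame(
    {
        "year": [2010, 2011, 2012] * 2,
        "GEO": ["Ontario"] * 3 + ["Quebec"] * 3,
        "value": [10.0, 11.0, 12.0, 8.0, 8.5, 9.0],
    }
)

fig, ax = plt.subplots(figsize=(8, 5))

# One line per province so that trends can be compared directly.
for province, group in tidy.groupby("GEO"):
    ax.plot(group["year"], group["value"], marker="o", label=province)

# Label every axis and the colour aesthetic, with a readable font size.
ax.set_xlabel("Reference year", fontsize=12)
ax.set_ylabel("Value (units of the chosen variable)", fontsize=12)
ax.set_title("Trend by province", fontsize=14)
ax.set_xticks(sorted(tidy["year"].unique()))
ax.legend(title="Province")
fig.tight_layout()
fig.savefig("trend_by_province.png", dpi=150)
```

If the provinces differ by orders of magnitude, a log-scaled y-axis (`ax.set_yscale("log")`) may be worth considering, in which case it should be described as a statistical transformation in the report.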

(iv) Aggregate (aggregate over the provinces) the selected variable (from iii) for Canada and
inspect the trend over the selected time frequency using an appropriate visualization method.
Interpret the results.
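
The aggregation over provinces can be sketched as a group-by on the time variable. The example below assumes the same hypothetical tidy columns (`year`, `GEO`, `value`) and uses invented placeholder values. A sum is used here on the assumption that the variable is a count or total; for a rate or an average, a (weighted) mean would usually be more appropriate.

```python
import pandas as pd

# Invented placeholder values in tidy form, provinces only (no national row).
tidy = pd.DataFrame(
    {
        "year": [2010, 2011, 2012] * 2,
        "GEO": ["Ontario"] * 3 + ["Quebec"] * 3,
        "value": [10.0, 11.0, 12.0, 8.0, 8.5, 9.0],
    }
)

# Aggregate over provinces: one national value per year.
national = tidy.groupby("year")["value"].sum()

print(national)
```

The resulting series is indexed by year, so it can be passed straight to a line plot to inspect the national trend.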

(v) You can use either Shiny in R, which is covered in class, or Streamlit for the following analysis.
• Choose a categorical variable with more than two categories (for example, product type, health
status, or income status).
• Use an appropriate plot to show the change in the quantitative variable (from iii or any other
quantitative variable) in the selected time frequency across provinces when the user chooses
the category.


• The supplementary material must include the R or Python code you used to create the app.
• You must submit a link to your Shiny app or to your app on Streamlit Community Cloud. We
(Instructor or TA) must have access to the app when we grade it; otherwise, no points are given
for the app (only for the code, if provided). Describe your app in the report.

For all the questions, write a clear and concise interpretation of the plots and clearly state what
conclusions can be drawn from the plots or graphs — these conclusions should be cast in the context
of the chosen dataset.
• Plots must be readable.
• Choose an appropriate font size for plots.
• Label all aesthetics and axes in the plot.
• Use appropriate statistical transformation for plots.

Grading scheme
(i) Data description: Describe the chosen dataset (background of the dataset and the variables). [3]

(ii) Data transformation: Did you choose all the downloaded variables and observations or a subset? Describe the reasons for using all the data or the subset. [2]

Pre-processing: How did you identify missing values? How did you represent the missing values in tidy data? How did you identify outliers? If there were any outliers, how did you handle them? [4]

(iii) Plot: Appropriate plot, the plot is readable, appropriate font size, all aesthetics and axes are labelled, appropriate statistical transformation is used, interpretation (how to read the plot), conclusion (any interesting patterns). [4]

(iv) Plot: Appropriate plot, the plot is readable, appropriate font size, all aesthetics and axes are labelled, appropriate statistical transformation is used, interpretation (how to read the plot), conclusion (any interesting patterns). [4]

(v) Shiny app: The link to the Shiny app works, the output is an appropriate plot, the Shiny app reacts to the user inputs (categories), a description of the app is written in the report or the app. [4]

Plot: The plot is readable, appropriate font size, all aesthetics and axes are labelled, appropriate statistical transformation is used, interpretation (how to read one of the plots), conclusion (any interesting pattern in one of the plots). [4]

References: The reference list starts on a new page, references are appropriate and listed in the report. [2]

Supplementary material: The supplementary material starts on a new page, the code is readable, all code is within the margins, the R code and the outputs for the questions are presented. [3]

Shiny app or Streamlit: Shiny app code (do not execute the code when you create the PDF) or Streamlit workflow. [2]

The maximum score for this assignment is 32 points, which we will rescale to 100%.