In this assignment, you will both visualize data and analyze data using linear regression ‘plus plus’. That is, a linear regression analysis on a dataset with many predictors that makes use of subset selection or shrinkage methods and tests how well the model fits the data. You will work on the assignment in groups of three or four; your lab instructor will tell you which group you are in. You decide yourselves which data to use, and you formulate a research question that you want to answer. Below you will find a step-by-step walkthrough of the assignment.
The deadline for handing in Assignment 1 is on Thursday May 27th before your lab meeting. The assignment should be emailed to your lab instructor as a zip file, which contains the following:
Find a suitable dataset. The dataset should be
Suitable datasets can be found for example in/on:
Explore and learn about the structure of your data by constructing visualizations. Select two or three graphs to illustrate your data in your final report.
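As an illustration of this exploration step, here is a minimal sketch of two graphs. It assumes the ggplot2 package is installed and uses the built-in mtcars dataset purely as a stand-in for your own data; the variable choices (wt, mpg) and the object names p_scatter and p_hist are hypothetical.

```r
library(ggplot2)  # assumed to be installed

# Illustrative data only: the built-in mtcars dataset stands in for your own data.

# Scatter plot: relationship between one candidate predictor and the outcome,
# with a straight-line fit added for reference
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Histogram: distribution of the outcome variable
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10) +
  labs(x = "Miles per gallon", y = "Count")

# Printing a ggplot object draws it
p_scatter
p_hist
```

Which graphs are most informative depends on your data and research question; these two are only examples of the kind of plot you might construct.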
Based on the content of your data and the visualizations you constructed, formulate one research question that you will investigate using linear regression - data science style. That is, in the linear regression, make use of either best subset selection or shrinkage methods (Ridge regression or Lasso). Select the best linear model appropriately. To ensure reproducibility of your findings, please make sure to use set.seed().
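The sketch below shows one possible way to combine a shrinkage method with a reproducible model check. It assumes the glmnet package is installed, again uses mtcars only as placeholder data, and picks the Lasso with 5-fold cross-validation and a two-thirds training split; the seed value, the split, and the number of folds are arbitrary choices for illustration, not requirements of the assignment.

```r
library(glmnet)  # assumed to be installed; provides cv.glmnet()

set.seed(123)  # fixes the random train/test split and the cross-validation folds

# Illustrative data only: predict mpg from all other columns of mtcars
x <- model.matrix(mpg ~ ., data = mtcars)[, -1]  # predictor matrix, intercept column dropped
y <- mtcars$mpg

# Hold out roughly a third of the rows to test the selected model
train <- sample(nrow(x), size = round(2 / 3 * nrow(x)))

# Lasso (alpha = 1); alpha = 0 would give Ridge regression instead
cv_fit <- cv.glmnet(x[train, ], y[train], alpha = 1, nfolds = 5)

# Coefficients at the lambda with the lowest cross-validated error;
# predictors shrunk exactly to zero are effectively dropped from the model
coef(cv_fit, s = "lambda.min")

# Mean squared prediction error on the held-out rows
pred <- predict(cv_fit, newx = x[-train, ], s = "lambda.min")
test_mse <- mean((y[-train] - pred)^2)
test_mse
```

Because set.seed() is called before sample() and cv.glmnet(), rerunning the script from the top produces the same split, the same folds, and therefore the same selected model.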
Present your results in an R Markdown file. In this file, present your visualizations and analysis in a logical order, combined with a description of the data, a description of the steps you have taken, the research question you formulated, and your results and conclusions regarding the research question.
In the compiled R Markdown file, show both the R code you used (that is, include the R code chunks) and the output. Show all your work in the R Markdown file, including any preprocessing of the data (e.g., steps you have taken to be able to work with the data).
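One common way to keep the code of every chunk visible in the compiled document is to set the knitr chunk option echo globally. The lines below are a minimal sketch, assuming the knitr package is installed; they would typically go in a setup chunk at the top of the R Markdown file.

```r
library(knitr)  # assumed to be installed

# echo = TRUE keeps the R code of every chunk visible in the compiled output
opts_chunk$set(echo = TRUE)
```

Individual chunks can still override this setting in their own chunk header if needed.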
Your grade will be determined by: