Data Visualization using ggplot


Introduction

In this practical, we will learn how to visualize data, we will be using a package which implements the grammar of graphics: ggplot2. Both of these packages exist within the tidyverse.

It is advised to review both the lecture slides for this week as well as the tab labelled Basics of ggplot as this will review some fundamentals of ggplot.

Don’t forgot to open the project file 02_R_visualisation_students.Rproj. You can work from the Rmd file practical_2_worksheet.Rmd (as the Rmarkdown code used to create this specific lab html file is very extensive, you will not work from it direclty).

For this lab you will require the following packages:

library(ISLR)
library(tidyverse)

The language of graphics; ggplot

Plots within R can be created without the use of ggplot using plot(), hist() or barplot() and related functions.

For examples of how to construct graphics without ggplot expand the section below.

Graphics without ggplot

Here, we can create examples of each using the dataset Hitters, from ISLR.
Firstly, using the function head() it is possible to get an idea of what the dataset looks like:

head(Hitters)
##                   AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun
## -Andy Allanson      293   66     1   30  29    14     1    293    66      1
## -Alan Ashby         315   81     7   24  38    39    14   3449   835     69
## -Alvin Davis        479  130    18   66  72    76     3   1624   457     63
## -Andre Dawson       496  141    20   65  78    37    11   5628  1575    225
## -Andres Galarraga   321   87    10   39  42    30     2    396   101     12
## -Alfredo Griffin    594  169     4   74  51    35    11   4408  1133     19
##                   CRuns CRBI CWalks League Division PutOuts Assists Errors
## -Andy Allanson       30   29     14      A        E     446      33     20
## -Alan Ashby         321  414    375      N        W     632      43     10
## -Alvin Davis        224  266    263      A        W     880      82     14
## -Andre Dawson       828  838    354      N        E     200      11      3
## -Andres Galarraga    48   46     33      N        E     805      40      4
## -Alfredo Griffin    501  336    194      A        W     282     421     25
##                   Salary NewLeague
## -Andy Allanson        NA         A
## -Alan Ashby        475.0         N
## -Alvin Davis       480.0         A
## -Andre Dawson      500.0         N
## -Andres Galarraga   91.5         N
## -Alfredo Griffin   750.0         A

Using the function hist(), it is possible to examine the distribution of salary of different hitters.

hist(Hitters$Salary, xlab = "Salary in thousands of dollars")

Using the function barplot() it is possible to plot how many members in each league.

barplot(table(Hitters$League))

Using the function plot() it is possible to plot the number of career home runs vs the number of 1986 home runs.

plot(x = Hitters$Hits, y = Hitters$HmRun, 
     xlab = "Hits", ylab = "Home Runs")


Overall, these plots are extremely informative and useful for visually inspecting the dataset, relatively easily, as they have specific syntax associated with each of them. By contrast, ggplot has a more unified approach to plotting, where layers can be built up using the + operator.

For example, the graphs created using the function plot() can be created in ggplot with relative ease.

plot vs ggplot

homeruns_plot <- 
  ggplot(Hitters, aes(x = Hits, y = HmRun)) +
  geom_point() +
  labs(x = "Hits", y = "Home runs")

homeruns_plot

homeruns_plot + 
  geom_density_2d() +
  labs(title = "Cool density and scatter plot of baseball data") +
  theme_minimal()


As introduced within the lectures a ggplot object is built up using different layers, as seen within the examples above.

  1. Input the dataset to a ggplot() function call
  2. Construct aesthetic mappings
  3. Add (geometric) components to your plots which use these mappings
  4. Add Labels, themes and other visuals.

Due to this layered syntax it is easy to then add elements over time, as seen within the Complex Scatter plot tab, such as density lines, a title and a different theme.

Therefore in conclusion, ggplot objects are easy to manipulate and they force a principled approach to data visualization. Within this practical, we will learn how to construct them. This practical will be built upon a basic understanding of ggplot grammar, which is covered both in the lecture and can be refreshed under the Basics of ggplot tab on the course website.


Part 1. Aesthetics and Good Practice

As discussed within the lecture, having clear graphs is critical to ensure easy interpretation of the information presented. If the data is not clearly presented, correctly labelled or overly complex then the information can be either incorrectly interpreted or not understood.

Question 1: Run the following code examples, and identify which graph represents the best visualization of the data.

Note: All examples will use the dataset diamonds part of the ggplot package. Ensure to remove the # when running the code yourself.

dia.ex1 <- ggplot(data = diamonds, mapping = aes(x = price, y = carat)) +
  geom_point() + 
  labs(x = "Price in USD", y = "Carat", title = "Price of Diamond by Carat") + 
  theme_classic()

# dia.ex1

dia.ex2 <- ggplot(data = diamonds, mapping = aes(x = price, y = carat,
                                                            colour = cut)) +
  geom_point(shape=1) + 
  labs(x = "Price in USD", y = "Carat", title = "Price of Diamond by Carat") + 
  theme_minimal()

dia.ex3 <- dia.ex2 + facet_grid(cols = vars(diamonds$cut))

# dia.ex2


After discussing which graph is the best, consider which is the worst.


Question 2a: Using your knowledge and understanding of what makes a good, clear graph, improve one of the two graphs (if you improve the worst graph, change it differently than the ‘best’ graph given!).


Question 2b: Now using your knowledge and understanding, worsen one of the two graphs (if you worsen the best graph, change it differently than the ‘worse’ graph given!).


Part 2. The layers of ggplot

As previously discussed, the beauty of ggplot is its unified approach to making graphs, as such understanding the core layers of a ggplot command, enables the construction of extremely professional looking graphs which are extremely adaptive.

For example, take the graph below created for the Finanical Times, this was created by John Burn-Murdoch an analyst for the company. This graph although complex in nature, began life within ggplot before being further developed with specialist layers, from different R packages. Although within this course we will not be producing anything this complex, this demonstrates that using a foundation within ggplot allows the development of much more professional plots.

At this point, you may be wondering, what layers, should, can and need to be added to a ggplot graph in order to make this style of graphs. The next sections will cover some of the potential layers in more depth.


Question 3a: Within each of the following code examples, identify what each layer does within the graph

Using # label what each line does within the code blocks.


ggplot(data = diamonds, 
       mapping = aes(
         x = price,
         y = carat)) +
  geom_point() +
  labs(
    x = "Price in USD",
    y = "Carat of Diamond",
    title = "Price of Diamond per Carat") 

ggplot(data = diamonds, 
       mapping = aes(
         x = price,
         y = carat,
         colour = clarity,
         shape = cut)) +
  geom_point() +
  labs(
    x = "Price in USD",
    y = "Carat of Diamond",
    title = "Price of Diamond per Carat",
    colour = "Diamond Clarity", 
    shape = "Diamond Cut") +
  theme_minimal()

Run the code block, and systematically remove a singular line of code before reproducing the graph to see what has been changed or is different. Remember the following:

  • Ensure to have the + operator at the end of each line where more code follows
  • The more you interact and explore the different layers through changing parts the easier it will be.
  • If in doubt return to the base code and start again!


Question 3b: Using the previous examples as a basis, attempt to reproduce the following plot


Using the previous example(s) as a template, use the mpg dataset (part of ggplot), to plot the following graph.

The following template can be used:

ggplot(data = mpg, 
       mapping = aes(x = ??, y = ??, colour = ??, shape = ??)) + 
  geom_??() + 
  labs(
    x = ??,
    y = ??,
    title = ??,
    colour = ??, 
    shape = ??) + 
  theme_minimal()


From these examples it is clear that although trends and insights can be gained from this style of plotting. In some cases there is simply too much data or information being expressed in one plot, meaning that the more subtle trends or implications are lost. As such, faceting (as exampled earlier), can be used to break down major plots into smaller sections by specified conditions or values, making the data more easily digestible.

In practice as faceting is part of ggplot, it is possible to simply add this in to display plots as panels within a grid.


Question 4: Using your knowledge of faceting, change the following code example so that it converts it from a (complex) scatter plot to a series of faceted plots.

This question will use the msleep dataset from ggplot. Using this example plot, facet this data by the Diet of the mamals (vore). Be sure to make a personal judgement call as to whether it is most appropriate in columns (cols) or rows (rows).


ggplot(data = msleep, 
       mapping = aes(x = sleep_rem, y = awake, colour = vore, size = brainwt)) + 
        geom_point() + 
        labs(x = "Amount of REM Sleep", y = "Amount of Time Spent Awake",
             title = "Amount of REM Sleep compared to the Time Spent Awake",
             colour = "Diet", size = "Brain Weight") +
        theme_minimal()


Part 3: Types of graphs in ggplot:

So far within this practical, we have used be producing scatter plots, these although undoubtedly one of the most useful tools within an analysts toolkit, may not always be appropriate for the data at hand. During this subsection, several different types of plots will be examined, with more information on other types being found under the basics of ggplot tab.

As a component of ggplot you designate the way you would like your data to be expressed visually through adding geom layers to your dataset. In their simplest form these can simply stack onto your ggplot base allowing this to explain how the data should be displayed. Within our previous examples, this can be seen through the adding of geom_point() which indicates that data should be displayed as points. However these, like the base ggplot function can have specific mapping components allowing multiple different datasets to be expressed within one graph.

Firstly however, we can review what data is best expressed as different types of geom.


Question 5: Under each tab, examine the different graphs produced using the same information, which best displays the information present.

Note: Remember to always check the type of variables which are being plotted, this can be checked under the help tab in Rstudio.


q5a.gph1a <- ggplot(data = diamonds, 
                    mapping = aes(x = price, y = carat)) +
  geom_point() +
  labs (x = "Price in USD", y = "Carat", title = "1. Scatter Plot")

q5a.gph2a <- ggplot(data = diamonds, 
                    mapping = aes(x = price)) +
  geom_bar() +
  labs (x = "Price in USD", title = "2. Bar Graph")

q5a.gph3a <- ggplot(data = diamonds, 
                    mapping = aes(x = price, y = carat)) +
  geom_line() + 
  labs (x = "Price in USD", y = "Carat", title = "3. Line Graph")

q5a.gph4a <- ggplot(data = diamonds, 
                    mapping = aes(x = price, y = carat)) +
  geom_bin2d() + 
  labs (x = "Price in USD", y = "Carat", title = "4. Heat Map Graph")

q5a.gph1b <- ggplot(data = mpg, 
                    mapping = aes(x = class, y = hwy)) +
  geom_point() +
  labs (x = "Type of Car", y = "Highway Miles per Gallon", title = "1. Scatter Plot") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

q5a.gph2b <- ggplot(data = mpg, 
                    mapping = aes(x = class)) +
  geom_bar() +
  labs (x = "Type of Car", title = "2. Bar Graph") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

q5a.gph3b <- ggplot(data = mpg, 
                    mapping = aes(x = class, y = hwy)) +
  geom_line() + 
  labs (x = "Type of Car", y = "Highway Miles per Gallon", title = "3. Line Graph") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

q5a.gph4b <- ggplot(data = mpg, 
                    mapping = aes(x = class, y = hwy)) +
  geom_bin2d() + 
  labs (x = "Type of Car", y = "Highway Miles per Gallon", title = "4. Heat Map Graph") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

q5a.gph1c <- ggplot(data = msleep, 
                    mapping = aes(x = bodywt, y = brainwt)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  labs (x = "Body Weight", y = "Brain Weight", title = "1. Scatter Plot")

q5a.gph2c <- ggplot(data = msleep, 
                    mapping = aes(x = bodywt, y = brainwt)) +
  geom_text(label = msleep$name) +
  scale_x_log10() +
  scale_y_log10() +
  labs (x = "Body Weight", y = "Brain Weight", title = "2. Text Graph")

q5a.gph3c <- ggplot(data = msleep, 
                    mapping = aes(x = bodywt, y = brainwt)) +
  geom_line() + 
  scale_x_log10() +
  scale_y_log10() +
  labs (x = "Body Weight", y = "Brain Weight", title = "3. Line Graph")

q5a.gph4c <- ggplot(data = msleep, 
                    mapping = aes(x = bodywt, y = brainwt)) +
  geom_hex() + 
  scale_x_log10() +
  scale_y_log10() +
  labs (x = "Body Weight", y = "Brain Weight", title = "4. Heat Map Graph")



Part 4: Building a more complex graph in four steps

Although an extremely useful component of ggplot is the ability to simply layer on these plotting parameters, such as geom_point(), this can be limited when individuals are working with multiple datasets, multiple variables (such as conditions or time variables). As such a useful part of the functions, is that they themselves can specify the content to be displayed.

An example of this arises from the observation of time-series data such as stock prices, or other comparison driven information (Results of something over time). For example, we can compare the price of three stocks over the course of a 6 month period, such as Apple, Microsoft and Facebook. Although this can be done in multiple ways it can be in the following way:


Question 6: Follow the steps to plot the course of these stock prices over six months


Step Zero: Access the Data

For this example, data will be used from the website Nasdaq, a market trading site which has historical stock information for a large variety of different stocks. In this case we will examine Apple (AAPL), Microsoft (MSFT) and Facebook (FB),

aapl.stk <- read_csv(file = "data/HistoricalData_AAPI_6m.csv")
msft.stk <- read_csv(file = "data/HistoricalData_MSFT_6m.csv")
fb.stk <- read_csv(file = "data/HistoricalData_FB_6m.csv")

Step One: Evaluate & Examine the data.

As these datasets have been imported as dataframes, using the summary() function will allow you to better understand the data.

As you will see, all the data on currency values is prefixed with $ which will often cause issues with R not interpreting our data in the way in which we would like it too. As such the following code should be run.

  aapl.stk$`Close/Last` <- as.numeric(gsub("\\$", "", aapl.stk$`Close/Last`))
  msft.stk$`Close/Last` <- as.numeric(gsub("\\$", "", msft.stk$`Close/Last`))
  fb.stk$`Close/Last`   <- as.numeric(gsub("\\$", "", fb.stk$`Close/Last`))

Additionally the variable Date, is not in the correct format either, meaning this will also have to be manipulated using the following code:

  aapl.stk$Date<- as.Date(aapl.stk$Date,format='%m/%d/%Y')
  msft.stk$Date<- as.Date(msft.stk$Date,format='%m/%d/%Y')
  fb.stk$Date<- as.Date(fb.stk$Date,format='%m/%d/%Y')

Step Two: Initial Plotting

After ensuring the data is suitable for use, initial plots can be made before combining them together to observe the differences and trends within the data.

For this, it is recommended you produce individual plots for each stock using the skills we have previously discussed.

NOTE: As the variables often include non-Latin characters (/), remember to use ` around them to specify them correctly.

Step Three: Plotting together

Now that you have three separate plots, to combine them, simply use the following format:

  ggplot() + 
    geom_line(data = ??, mapping = aes(x = ??, y = ??)) + 
    geom_line(data = ??, mapping = aes(x = ??, y = ??)) + 
    geom_line(data = ??, mapping = aes(x = ??, y = ??))

As you can see from this code, you simply nest these plotting parameters inside each of the specific functions. Be warned, due to the nature of this function, data inside of the function can only be access within it, meaning you are unable to call upon from outside of the function. Additionally this also overrides any general parameters set within the ggplot base.

Step Four: Tidying up

Now that all three layers are included, further details should be added to tidy up the plot, this includes labels and a legend to detail which line is which. One of the most useful ways to identify data is through colour. In this case the parameter colour = ... can be added inside of each geom_line(aes(...)). As colours will be discussed shortly, simply use red, green and blue for the moment!

Once this is completed, to ensure people know which colour is which, add an additional line of code which details the information in relation to the colours used as such:

    scale_color_discrete(name = [name of legend], 
                         labels = c([order of information as programmed above]))


Now you are all set to be able to build some good graphs yourselve. Below, we provide information to even further extend or improve your graph using colours and colour palattes, setting the size and shapes of points, and adding labels and (semi transparant) coloured boxes to your plot. The below part is, however, optional and you can also opt to review the below part at a later point in the course. For example, when you are working on your first graded assingment for which you will also have to produce a graph.

Note that at the end of this document under Part 5, we inlcude a bonus question which allows you to earn 0.5 bonus points for assignment 1.


Part 5: Shapes, colours and other details (OPTIONAL)

Applying additional visual information to a graph, can both be positive and negative. As previously discussed too much information becomes overwhelming but achieving the correct balance allows beautiful professional looking graphics. As such colours, shapes, sizes, labels and even icons can be applied in a variety of different ways to achieve the best desired result.


Colours:

There are a large variety of different ways to include colours within a plot, as well as a huge variety of different colours to choose from. Let’s first discuss ways in which to select colours. As a whole, using colours within R whether in graphics or not, can be specificed individually (using hexcode, or simply colour labels: red) or in the form of pre-determined palattes. Pre-build palattes are incredibly useful as they are define sets of colours which are applied to the information or data used, and saves you (as the coder) from having to specify each individual colour you would like to call.

The amount of palattes available for use, is huge. Simply check out the CRAN Packages Site and using your internet search finder explore for those packages with colour/color in their title. On CRAN there is a huge amount from palattes inspired by fandom’s and popular culture like Studio Ghibli, Pokemon, Harry Potter, Game of Thrones and many, many more. However not all colour palattes are so linked with popular culture, and rather some are specifically designed for certain scientific groups Oceanography, biologists, those who are Colour Blind and scientific journals generally. Although there is a huge variety of different packages available one of the most commonly used is R-Colour-Brewer. This provides a huge variety of different options and colour schemes for graphs and other visualizations.

These like any package from CRAN, to access and explore your palatte options, you can use the following steps:

## Step 1: Install the package straight from CRAN
      install.packages("[PACKAGE]")
## Step 2: Call into your library 
      library([PACKAGE])
## Step 3: To access your palatte options 
    ## Be sure to check the packages manual, however some examples are included here:

  ## For the `ghibli` package call:
      library(ghibli)  
      ghibli_palettes
  ## For the `palettetown` package (Pokemon) call:
      library(palettetown)
      palettetown

From here within ggplot the colour can then be added as part of the ggplot function. For example:

## To add a 'Scale Continuous' object from the ghibli colour scheme to the ggplot object
  + scale_color_ghibli_c(name =..., direction =...) 

## To add a 'Scale Discrete' object from the ghibli colour scheme to the ggplot object
  + scale_color_ghibli_d(name =..., direction =...) 

With an example of these being included being seen below:

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Petal.Length)) +
  geom_point() +
  scale_colour_ghibli_c("PonyoMedium")

ggplot(data = diamonds, mapping = aes(x = price, y = carat, colour = cut)) +
  geom_point() + 
  labs(x = "Price in USD", y = "Carat", title = "Price of Diamond by Carat") + 
  theme_classic() +
  scale_color_ghibli_d(name = "SpiritedMedium")


Question 7a: After exploring the huge amount of colour platettes available, use the provided template and change the scale colour.

Remember: this is continuous data so ensure to use a function which is continuous.


ggplot(data = airquality, mapping = aes(x = Wind, y = Solar.R, colour = Temp)) +
  geom_point() + 
  [colour function]

This however is not the only way to call in colours, as mentioned, these can also be called as hex code values. These values are a universal code to designate the colour used. These can be used in replacement of when you call singular colours (for example: red).

Singular colours are typically called for individual lines, groups or items, and are specified similar to the example regarding stock price. More information on this can be found within the basics tab.


Another use of colour, it through the defining areas to fill. One example of how this is used is during density plots. These typically examine the density distrubtion of a specific dataset. A density plot example can be seen below:

ggplot(data = diamonds, aes(x = carat)) + 
  geom_density() + 
  theme_minimal()


Question 7b: using the dataset midwest (from ggplot); produce a density plot which examines the density of a specific variable of your choosing, grouped by another variable from the dataset, using this information regarding colours for density plots.


Colour can be added to a density plot in a number of different ways including:

Through specifying a single number this sets the size to a static value.

ggplot(data = diamonds, mapping = aes(x = price, y = carat, size = 2)) + 
  geom_point() + 
  theme_minimal()

When using static shapes, these can be defined within geom_point(), using a specfic number, which corresponds to a shape within R. This figure from: STHDA illustrates the different shapes which can be applied within graph points.

ggplot(data = diamonds, mapping = aes(x = price, y = carat)) + 
  geom_point(shape = 13) + 
  theme_minimal()

Setting the size as a variable, either continuous or discrete provides additional information.

ggplot(data = diamonds, mapping = aes(x = price, y = carat, size = y)) + 
  geom_point() + 
  theme_minimal() + 
  labs(size = "Diamond Width")

Setting the shape as a discrete variable (typically less than 7 categories within ggplot) provides additional information.

ggplot(data = diamonds, mapping = aes(x = price, y = carat, shape = cut)) + 
  geom_point() + 
  theme_minimal() + 
  labs(shape = "Diamond Cut")


Shapes and Sizes

Using shapes and sizes within ggplot, can be done in much the same way as colour, through defining them within the aes() parameter. Both size and shape can either be singularly defined (uniform across all data) or variable specific. Examples of these can be seen below:


Question 8: Using the dataset msleep, plot sleep_rem against awake, including brainwt as the size and vore as the shape



Labels

Ensuring graphs are adequately labeled and scaled correctly is yet another core component in ensuring that graphs are clear. Thoughout the previous examples, labels have been frequently used to label the x-axis, y-axis as well as the titles and legends. Labeling any graph within ggplot, typically uses:

  + labs(x = ??,         # x-axis label
         y = ??,         # y-axis label
         shape = ??,     # shape legend label
         colour = ??,    # colour legend label
         title = ??,     # title label
         )  

Question 9a: Using the template provided, add in sufficient labels too all the data included

Remember to confirm which variables are used through querying diamonds in the help tab


ggplot(data = mpg, 
       mapping = aes(x = displ, y = cty, shape = drv, colour = fl)) +
    geom_point() + 
    theme_minimal()

Alongside labelling the environment of a graph, it is also possible to label specific points or an area within the environment. Returning to the topic of stock data, let us consider highlighting a specific one month period, from a sixth month period.

Let us consider three different stocks: Starbuck (SBUX), Amazon (AMZN) and Tesla (TSLA). Repeating the same steps as used earlier.

# Import the data
sbux.stk <- read_csv(file = "data/HistoricalData_SBUX_6m.csv")
amzn.stk <- read_csv(file = "data/HistoricalData_AMZN_6m.csv") 
tsla.stk <- read_csv(file = "data/HistoricalData_TSLA_6m.csv") 

# Correct the data were appropriate 
  sbux.stk$`Close/Last` <- as.numeric(gsub("\\$", "", sbux.stk$`Close/Last`))
  amzn.stk$`Close/Last` <- as.numeric(gsub("\\$", "", amzn.stk$`Close/Last`))
  tsla.stk$`Close/Last` <- as.numeric(gsub("\\$", "", tsla.stk$`Close/Last`))

  sbux.stk$Date<- as.Date(sbux.stk$Date,format='%m/%d/%Y')
  amzn.stk$Date<- as.Date(amzn.stk$Date,format='%m/%d/%Y')
  tsla.stk$Date<- as.Date(tsla.stk$Date,format='%m/%d/%Y')

# Plot the graph
ggplot() + 
  geom_line(data = sbux.stk, mapping = aes(x = `Date`, y = `Close/Last`, colour = 'red')) +
  geom_line(data = amzn.stk, mapping = aes(x = `Date`, y = `Close/Last`, colour = 'blue')) +
  geom_line(data = tsla.stk, mapping = aes(x = `Date`, y = `Close/Last`, colour = 'green')) +
  labs(x = "Date", y = "Price of Stock at Market Close ($)") + 
  scale_color_discrete(name = "Stock", labels = c("Starbucks (SBUX)", "Amazon (AMZN)", "Tesla (TSLA)"))

Furthermore, imagine we would like to specify a single month period, within the six displayed. This can done using the annotate() function within ggplot. Within this example provided let us examine between December - Janurary.

# Plot the graph
ggplot() + 
  geom_line(data = sbux.stk, mapping = aes(x = `Date`, y = `Close/Last`, colour = 'red')) +
  geom_line(data = amzn.stk, mapping = aes(x = `Date`, y = `Close/Last`, colour = 'blue')) +
  geom_line(data = tsla.stk, mapping = aes(x = `Date`, y = `Close/Last`, colour = 'green')) +
  labs(x = "Date", y = "Price of Stock at Market Close ($)") + 
  scale_color_discrete(name = "Stock", labels = c("Starbucks (SBUX)", "Amazon (AMZN)", "Tesla (TSLA)")) +
  annotate(geom = "rect", xmin = as.Date("2020-12-01"), xmax = as.Date("2021-01-01"), ymin = 0, ymax = 3500, fill = "red", alpha = 0.2) +
  annotate(geom = "text", x = as.Date("2020-12-15"), y = 1000, label = "Winter \n 2020")

Breaking down the annotate() function we can consider the following elements: - geom - typically this can be “text” to add text or “rect” to add a colour in a defined area - x & y or xmin & xmax & ymin & ymax - this is where you define your parameters - label or fill - when specify text you specify a label and fill when rect. - alpha - this is the transparency of the text or parameter.


Question 9b: Using the stock graph you produced in Question 6, using annotate highlight a period of time (for example “2019-11-01” - “2020-01-01”) and a specific data (for example “2020-14-02”).

Remember to use two seperate annotate functions one for text and the other for rect


Part 5: Review and Recap:

This lab covered a large amount of diverse content within the ggplot universe. However, this is only the beginning! The lab has provided you with a foundation knowledge of visualization which will be used across the coming labs, in addition to the Assignments. As a way to test your comprehensions of these skills, as a group you have the option to complete this final question.


Question 10: Using all of the skills, templates and materials covered, as well as your own knowledge, using one of the following datasets available within ggplot, produce your own graph expressing some of the information within it.


This final question is optional, however producing and submitting a unique and positive graph in your current group reflecting the skills and techniques learnt within this practical, will allow you to be awarded an extra 0.5 bonus points towards Assignment 1.

To be eligible for these 0.5 points towards Assignment 1, please submit a .rmd file which contains your code chunk which produces the graph and contains the name of all your group members, via email to your Lab meeting Teacher (who will provide you with feedback from your graph). You will only be awarded these additional points, if you make a constructive, clear and unique graph.

Through engaging with this question it will help you test your skills and apply the different techniques together, as well as learning the best ways to work collaboratively on R-based projects which you will be doing for the upcoming Assignments. This is also a fantastic time to let your creativity within R and ggplot run wild, and see if you can create some beautiful looking plots.

The deadline for submission is Monday May 17th at 5pm, with the best of these plots being displayed alongside their code on the website, so that as an entire group you can see the diverse nature of plots which can arise through using ggplot.


Question 10: Datasets

Diamonds

Prices of over 50,000 round cut diamonds, call diamonds to access the dataset. Contains information of over 53940 diamonds across 10 variables.

Economics

US economic time series data, call economics or econommics_long to access the dataset. This contains information over 478 months across 6 variables.

Midwest

Midwest demographic data, call midwest to access the dataset. This contains information from 437 midwest counties across 28 variables.

MPG

Fuel economy data from 1999 and 2008 for 38 popular models of car, call mpg to access the dataset. This contains information from 234 different cars over 11 variables.

Sleep

Mammals sleep dataset, call msleep to access the dataset. It contains data for 83 mammals across 11 variables.

Presidential

Presidential Term information from Eisenhower to Obama, call presidential to access the dataset. It contains data for all 11 presidents in this time, information across 4 variables.

Housing Sales

Information about housing sales made in Texas, call txhousing to access the dataset. It contains 8602 sales across 9 variables.