For Loops


Introduction


Within R programming, for loops (hereafter referred to as loops) allow the repeated execution of specified commands. This is useful if you are handling a large amount of data and wish to apply the same function, adaptation or change across all or some of your data without having to code the process line by line, which can save you a significant amount of time in the long run!

Because loops are often found to be complex, this additional tab and tutorial has been produced to support your learning beyond the lectures and practicals in handling, using and engaging positively with loops. For more information on loops, please see Chapter 6 of A Beginner’s Guide to R. This Springer textbook is a free downloadable resource that covers a huge variety of foundational topics, and it provides the foundation for this tutorial.



Conceptual Example


Let's start with a basic conceptual example (from R Bloggers). Say, for example, you would like to print the phrase “The year is 20XX”, with XX replaced by each year from 2010 to 2020. This could be achieved by writing eleven individual lines of code:

print(paste("The year is", 2010))
## [1] "The year is 2010"
print(paste("The year is", 2011))
## [1] "The year is 2011"
print(paste("The year is", 2012))
## [1] "The year is 2012"
print(paste("The year is", 2013))
## [1] "The year is 2013"
print(paste("The year is", 2014))
## [1] "The year is 2014"
print(paste("The year is", 2015))
## [1] "The year is 2015"
print(paste("The year is", 2016))
## [1] "The year is 2016"
print(paste("The year is", 2017))
## [1] "The year is 2017"
print(paste("The year is", 2018))
## [1] "The year is 2018"
print(paste("The year is", 2019))
## [1] "The year is 2019"
print(paste("The year is", 2020))
## [1] "The year is 2020"

Or it could be written using a for() loop, which executes the same command again and again:

for (year in 2010:2020){
  print(paste("The year is", year))
}
## [1] "The year is 2010"
## [1] "The year is 2011"
## [1] "The year is 2012"
## [1] "The year is 2013"
## [1] "The year is 2014"
## [1] "The year is 2015"
## [1] "The year is 2016"
## [1] "The year is 2017"
## [1] "The year is 2018"
## [1] "The year is 2019"
## [1] "The year is 2020"

This basic conceptual example shows that a for() loop is made up of two sections.


Let us consider the first section:

for (year in 2010:2020)

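This section contains four components:

  • The keyword for, which starts the loop,
  • The loop variable year, which takes a new value on each pass through the loop,
  • The keyword in, which links the loop variable to the sequence,
  • The sequence 2010:2020, the vector of values (2010, 2011, ..., 2020) that year takes in turn.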

Together, these components can be read as “for each value in a sequence of values”. In this case, that means “for each year in the vector of years from 2010 to 2020”.
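To illustrate how these components fit together, here is a minimal sketch that assumes a hypothetical character vector called months. It shows that the sequence after in can be any vector, not only a numeric range. The final print() also shows that the loop variable keeps the last value it was given after the loop has finished.

```r
# Hypothetical example vector: the sequence after `in` can be any vector,
# not only a numeric range such as 2010:2020
months <- c("January", "February", "March")

for (month in months) {
  # On each pass, `month` holds the next element of `months`
  print(paste("The month is", month))
}

# After the loop ends, the loop variable keeps the last value it was given
print(month)
```

Running this prints one line per month, then "March" on its own.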


Let us now consider the second section:

{
  print(paste("The year is", year))
}

This section is the body of the loop, enclosed in curly braces { }. It is more general: it can contain any code, and it typically uses the loop variable defined in the first section.

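In this case, the body uses year. On each pass, year holds the next value from 2010:2020, paste() joins it to “The year is”, and print() displays the result.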
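The body is not limited to a single command. In the minimal sketch below, the body holds two lines that both use the loop variable. The range is shortened to 2010:2012, and the name years.since.2000 is hypothetical and used only for illustration.

```r
for (year in 2010:2012) {
  # A new value computed from the loop variable on each pass
  years.since.2000 <- year - 2000
  print(paste("The year is", year, "which is", years.since.2000, "years after 2000"))
}
```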



Practical Example


Let us now consider a more practical example.

Consider the mpg dataset from the ggplot2 package. Say we would like to know how many of the cars have fewer than, more than, or exactly a specific number of cylinders. One method (although not necessarily the most straightforward) is to use a loop.

library(ggplot2)  # provides the mpg dataset

for (i in mpg$cyl) {
  print(i == 5)  # TRUE if this car has exactly 5 cylinders
}

Breaking this loop down: on each pass, i takes the next value of the cyl column (the number of cylinders) in mpg. That value is compared with the condition set (in this case, whether the number of cylinders is exactly 5), and TRUE or FALSE is printed depending on the result.

If you run this code, one TRUE or FALSE value is printed per car, so you can observe how the values are distributed across the dataset.
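Since the original question was how many cars meet the condition, one minimal sketch is to keep a running counter inside the loop and add 1 whenever the condition is TRUE. The counter name n.five is hypothetical, and if() is used here to decide when to increment it.

```r
library(ggplot2)  # provides the mpg dataset

n.five <- 0  # hypothetical counter, starts at zero
for (i in mpg$cyl) {
  if (i == 5) {
    n.five <- n.five + 1  # add one for each car with exactly 5 cylinders
  }
}
print(n.five)
```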

However, printing TRUE or FALSE for each car has limited value, because as the researcher you can only read the output: it is not stored for further use. It is therefore useful to extend the loop so that it saves these results into a new object.

So, let us extend the previous example so that the results are saved in a separate vector.

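library(ggplot2)  # provides the mpg dataset

# Pre-allocate a vector of NA values, one slot per car (row) in mpg
cyl.out <- rep(NA, nrow(mpg))

# Loop over the row positions 1, 2, ..., nrow(mpg), filling one slot per pass
for (i in seq_len(nrow(mpg))) {
  cyl.out[i] <- mpg$cyl[i] > 4  # TRUE if this car has more than 4 cylinders
}

Note that the syntax changes here. Rather than looping over the cylinder values themselves, the loop now runs over the row positions of mpg, which seq_len(nrow(mpg)) generates. On each pass, i is a row number, and the NA in position i of the empty vector cyl.out is replaced with the result of the comparison for that car: TRUE if it has more than 4 cylinders, FALSE otherwise. This is more involved to set up. Once written, however, the loop completes this repetitive task with ease and keeps the results for later use.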
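Once the results are stored, they can be used in further analysis, which printing alone did not allow. This minimal sketch repeats the loop so that it runs on its own, then counts the TRUE values with sum() and tabulates them with table(). It also compares the loop's output with R's vectorised comparison mpg$cyl > 4, one of the more straightforward alternatives to a loop for this particular task. The name cyl.vec is hypothetical.

```r
library(ggplot2)  # provides the mpg dataset

# Fill a pre-allocated vector with one TRUE/FALSE result per car
cyl.out <- rep(NA, nrow(mpg))
for (i in seq_len(nrow(mpg))) {
  cyl.out[i] <- mpg$cyl[i] > 4
}

# The stored results can now be summarised
print(sum(cyl.out))    # number of cars with more than 4 cylinders
print(table(cyl.out))  # counts of FALSE and TRUE

# Comparisons in R are vectorised, so a single line gives the same vector
cyl.vec <- mpg$cyl > 4
print(identical(cyl.out, cyl.vec))  # TRUE
```

For a simple comparison like this, the vectorised form is usually preferred. The loop becomes more useful when each step involves more complex work.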