Do you find yourself cutting and pasting R code a lot?

This usually will create problems for yourself later. One principle of good coding is to try and reduce repetition to the minimum possible. There are two approaches to both make your code organized and save you work. The first one is to use functions and the second one, covered here, is to use loops.

We often want to do repetitive tasks in the environmental sciences. For example, we may like to loop through a list of files and do the same thing over and over. There are many packages in R with functions that will do all of the hard work for you (e.g. check out dplyr, tidyr and reshape2 covered here. The dplyr approach works well if your data is “tidy” and in a data frame. If your data are in many different files, then a loop may be a quicker solution.

Basic syntax of loops

The syntax of loops is relatively simple – the essential components are for(){} with the the for() part dicating how often operations within the {} part are done.

Consider the loop below. The first time we run through the loop, the value i will be equal to 1, and this value will be displayed with the print function. It will then repeat with i = 2, all the way up to i = 10, doing whatever task is within the {} each time.

for (i in 1:10) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

We can change the range of numbers (1:10) to anything we like, they don’t have to be a sequence or integers, or even numbers. You can also change i to anything you like.

nums <- c(3.2, 890, 0.0001, 400)

for (bat in nums) {
  print(bat)
}
## [1] 3.2
## [1] 890
## [1] 1e-04
## [1] 400
chars <- c("a", "o", "u", "z")

for (bat in chars) {
  print(bat)
}
## [1] "a"
## [1] "o"
## [1] "u"
## [1] "z"

Of the most interest to us is changing what is within the {} or the operation we are performing on our data. We can insert anything we like in here. Here is a loop that will print the square and the square root of the numbers 1 to 10.

for (i in 1:10) {
  print(i^2)
  print(sqrt(i))
}

Often we will want to keep the results that we get back from our loop. The first option is to make a blank vector or data frame and append the results to it. This takes longer to run, but doesn’t really matter with simple loops, but can increase your wait times for longer and more complicated loop structures.

Here is code that will store the square of the numbers 1 to 10 in a new vector called x

x <- vector()         # makes a blank vector

for (i in 1:10) {
  y <- i^2            # performs an operation 
  x <- append(x,y)    # overwrites 'x' with y appended to it
}

Here is code that will store both the square and the square root of the numbers 1 to 10 in two columns of a new data frame called x2

x2 <- data.frame(col1=vector(), col2=vector())   # makes a blank data frame with two columns

for (i in 1:10) {
  col1 <- i^2                # performs first operation
  col2 <- sqrt(i)            # performs second operation 
  x2 <- rbind(x2, cbind(col1, col2))  # overwrites 'x2' values including the new row
}

The second option is to make a blank vector or dataframe of known dimensions and then place the results into it directly. For example, if we had a loop with 10 elements, we could store the results of each operation in a vector with a length of 10

x <- vector(length=10)  # makes a blank vector with a length of 10

for (i in 1:10) {
  y <- i^2             
  x[i] <- y             # places the output in position i in the vector x
}

Alternatively, store the results of multiple operations in a new data frame.

x2 <- data.frame(col1=vector(length=10),col2=vector(length=10)) # makes a blank data frame with two columns and 10 rows

for (i in 1:10) {
  col1 <- i^2                   # performs first operation
  col2 <- sqrt(i)               # performs second operation 
  x2[i,1] <- col1               # places the first result into row i, column 1
  x2[i,2] <- col2               # places the second result into row i, column 2
}

 

An ecological example

 

Now we can use your new loop skills in an ecological context. As with the Subsetting data tutorial, we will use a dataset where bats were sampled across regrowth forest in south-eastern Australia which has been thinned to reduce the density of trees.

Bats <- read.csv(file="Bats_data.csv", header=T, stringsAsFactors=F)
str(Bats)
## 'data.frame':    173 obs. of  10 variables:
##  $ Site                 : chr  "CC02A1" "CC02A1" "CC02A1" "CC02A2" ...
##  $ Activity             : int  299 276 530 356 571 631 144 124 220 468 ...
##  $ Foraging             : int  0 6 14 5 3 17 3 0 7 8 ...
##  $ Date                 : chr  "9/01/2013" "8/01/2013" "7/01/2013" "8/01/2013" ...
##  $ Treatment.thinned    : chr  "medium-term" "medium-term" "medium-term" "medium-term" ...
##  $ Area.thinned         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Time.since.thinned   : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ Exclusion.thinned    : num  25.1 25.1 25.1 25.1 25.1 ...
##  $ Distance.murray.water: num  190 190 190 216 216 ...
##  $ Distance.creek.water : num  444 444 444 885 885 ...

Having a look at the structure of this data, we have two response variables: activity (no. of bat calls recorded in a night) and foraging (no. of bat feeding calls recorded in a night). These data were collected over a total of 173 survey nights and at 47 different sites. There are eight potential predictor variables in the dataframe, one of which is a factor (Treatment.thinned), and seven of which are continuous variables (Area.thinned, Time.since.thinned, Exclusion.thinned, Distance.murray.water, Distance.creek.water, Mean.T, Mean.H).

Let’s say we are exploring our data and we would like to know how well bat activity correlates with our continuous covariates. We’d like to calculate Pearson’s correlation coefficient for activity and each of the covariates separately. Pearson’s correlation coefficient, calculated with the function cor, ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation) with 0 being no correlation. We will store all our correlations in a new data frame called Correlations.

First, use select from dplyr to make a subset of the data with the response variable (activity) and the 5 predictor variables.

library(dplyr)

Bats_subset <- select(Bats, Activity, Area.thinned:Distance.creek.water)

Next, make an empty data frame with two columns (the name of the variable and the correlation) and the number of rows needed to store all the correlations.

rows <- ncol(Bats_subset)-1                    # the number of rows needed in our output dataframe

Correlations <- data.frame(variable=character(length=rows), 
                 correlation=numeric(length=rows), 
                 stringsAsFactors=F)       

Finally, we can use a loop to calculate each of the correlations and store the output in our new dataframe.

for (i in 1:rows) {
  temp1 <- colnames(Bats_subset[i+1])       # retrieves the name of predictor variable
  temp2 <- cor(Bats_subset[,1], Bats_subset[,i+1], method="pearson")
                                           # calculates the correlation between activity and predictor variable
  Correlations[i,1] <- temp1               # places the variable name into row i, column 1
  Correlations[i,2] <- temp2               # places the correlation into row i, column 2
}
##                variable correlation
## 1          Area.thinned -0.40890389
## 2    Time.since.thinned -0.02135752
## 3     Exclusion.thinned  0.17562438
## 4 Distance.murray.water -0.18071570
## 5  Distance.creek.water -0.09130258

Now we can see at a glance that activity is most strongly (negatively) correlated to area thinned and that it is not at all correlated to time since thinned or mean temperature. We might then like to further investigate some of these relationships with appropriate statistical models and tests.

Further help

DataCamp’s tutorial on loops

You can find some more good examples of loops, lists and if/else statements on the BEES R User group GitHub site loops and lists by Mitch.

Author: Rachel V. Blakey

Last updated:

## [1] "Tue Sep 06 11:13:36 2016"