A function is a self contained piece of code which has its own local variables and performs a specified task. Think of them as “mini-scripts” that can be written separately from the main script.

Writing functions are a good way of organising your analytical methods into self contained packages and can be useful simply as a way of dividing a big problem into smaller problems. If you haven’t already started writing functions this is a good place to start.

There are two types of functions:

  1. Functions that perform a task: examples include things like print() and read.csv().
  2. Functions that calculate and return a value: examples include sqrt() and mean().

Here we will focus on this second type of function.

Why use functions?

Why is it so useful to divide your script into separate, but cooperating, functions? Why not write every script as one big “chunk” of statements?

To answer this, consider some of the functions that you have already used within R. For example,

mean()
sd()
sqrt()

These functions are already predefined within the R base package, meaning that you didn’t have to tell the computer how to compute the mean, standard deviation or square root, and because that programming job has already been done, you can simply use those functions in your own script. Imagine if every time you needed a mean you had to write the following:

 sum(x) / length(x)

Even this line of script uses two functions: the sum function and length function. If these weren’t available, you would need to use code like this every time you needed a mean.

 (x[1] + x[2] + x[3] + x[4] + x[5]) / 5 

Instead, we simply use mean without giving it two thoughts. Not only is using the mean function more informative (its easier to tell what your line of code is doing) it’s also reusable. Once a function is defined it can be used over and over again, not only within the same script but within other scripts too.

Functions are shorter and easier to read. This is surprisingly important, as a great deal of time is spent combing through code, looking for mistakes etc. They are also easier to fix and extend.

To further highlight this, we will go through an example of writing our own function to calcuate the standard error of a variable.

Calculating the standard error

We will use a sample data set containing a series of different measurements (height, weight etc) from replicated algal samples. First, download the data set, Algal_traits.csv, and load in R:

Algae <-  read.csv("Algal_traits.csv")

Let’s say we want the mean and standard error of height.

We can easily compute the mean with the mean function already in R.

mean(Algae$height)
## [1] 0.4590399

To calculate standard error,

\[SE_\bar{x}= \frac{s}{\sqrt n}\]

we need the variance and sample size, n. These are relatively easy to calculate using other base functions in R. var will calculate the variance and length gives the length of the vector and thus the sample size (n).

sqrt(var(Algae$height) / length(Algae$height))
## [1] 0.04067788

Imagine now that you wanted to calculate these same statistics on a different variable (e.g., dry weight). When faced with wanting to use this piece of code twice, we may be tempted to just copy-and-paste it to a new place, thus having two copies of the above snippet in our code. However, a much better approach is to make it into a function and call that function twice.

Basic syntax of a function

A function has the following form:

FunctionName <- function (arg1, arg2, ...){
  statements that do useful stuff 
  return (something)
}

FunctionName: Can be any valid variable name, but you should avoid using names that are used elsewhere in R. Check to see if your name is already used as a keyword by asking for the help page ?FunctionName (no 100% guarantee, but a good check).

arg1, arg2…: The arguments of the functions. You can write a function with any number of arguments of any R objects (numeric, strings, characters, data.frames, matrices, other functions).

Function body: The code between the {} is run every time the function is called. This is the code that is doing all the useful stuff.

Return value: The last line of code is the value/values to be returned.

Using this format, a function to calculate the standard error of the values in the object x would be:

StandardError <- function (x){
  SE <- sqrt(var(x) / length(x))
  return (SE)
}

Now when calculating standard error we simply use StandardError like we would any other function.

StandardError(x = Algae$height)
## [1] 0.04067788
StandardError(x = Algae$dryweight)
## [1] 0.02190001

Functions quickly increase the ease of which you can read and interpret the code. It is immediately obvious what the StandardError() line of code is doing. Less obvious would be to use the following code and have to edit it extensively everytime you wanted to calculate the standard error of a different variable.

sqrt(var(Algae$height)/length(Algae$height))

The second reason why using functions is better than simply copying code is the ability to easily fix and extend functions. Try running your StandardError() function on the strength variable.

StandardError(x = Algae$strength)
## [1] NA

Notice we get NA, this is because our function doesn’t know how to deal with missing values. Often such problems only get noticed after the code has been used for a while. If you had been copying chunks of code, this problem would be scattered through out your scripts. Because we have written a function we only have to solve this problem within the function’s body and each time it is used within your script it is automatically updated.

How you deal with missing values is highly dependent on what you are trying to calculate (see the help module on Importing data. In this case, it’s logical to remove NAs before calculating the Standard Error, but instead making this the default we will include another argument within our function, called na.rm.

Firstly the Var function has a na.rm argument already built within it (see help file ?var). If you specify na.rm=TRUE then missing values will be removed.

The length function, however, does not have this capacity. We can use the logical function, is.na, to calculate n.

n=sum(!is.na(x)) will test each value of the vector, x, to see if it is missing. If it not missing (the ! means NOT), then it returns a TRUE for that position, and by counting the values returned as TRUE with sum, we are effectively counting only values that are not missing.

StandardError <- function (x, na.rm){
  SE <- sqrt(var(x, na.rm = na.rm) / sum(!is.na(x)))
  return (SE)
}

Now, let’s try out our new function on the strength variable with missing data, alternating na.rm = T and na.rm = F.

StandardError(x = Algae$strength, na.rm = T)
## [1] 0.03846584
StandardError(x = Algae$strength, na.rm = F)
## [1] NA

 

Adding comments to your function

Before you are finished, there is one last thing to do. It is a good idea to add comments your function, as this will save you from a world of pain when you go back to fix something later on. Function comments should contain, a brief description of the function (one sentence), a list of function arguments with a description of each (including data type) and a description of the return value. Function comments should be written immediately below the function definition line.

StandardError <- function (x, na.rm){
  # Computes the sample standard error
  #
  # Args:
  #  x: Vector whose standard error is to be calculated. x must have length greater than one.
  #  y: na.rm can either be T or F. T removes missing values before calculating standard error. 
  #
  # Return:
  #  The standard error of x
  SE <- sqrt(var(x, na.rm = na.rm) / sum(!is.na(x)))
  return (SE)
}

 

Storing function scripts

Once you get into the habit of writing functions it’s a good idea to keep a separate script containing your functions together. You can do this by project (i.e., keep all the functions for each project together) or by task. However you organise your function scripts is up to you so long as you remember where they are.

I include a R_scripts folder within my project directory where I keep all my function scripts associated with that project. You can the use source to tell R where all the function scripts can be found on your computer.

source("R_scripts/myfunctions.R")

Further help

DataCamp’s tutorial on functions

Author: Keryn F Bain
Last updated:

## [1] "Thu May 12 11:57:00 2016"