Visualising the relationship between two continuous variables is one of the most commonly used graphical techniques in the sciences. This page details how to produce simple scatterplots to display how one continuous variable is related to another.
For this worked example, download a data set on plant heights around the world, Plant_height.csv, and import it into R.
Plant_height <- read.csv(file = "Plant_height.csv", header = TRUE)
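Before plotting, it can help to confirm that the file was read as expected. The following is a minimal sketch rather than part of the original workflow: so that it runs without the real download, it writes a tiny made-up file to a temporary location. The object names demo_file and demo_data and all the values are invented for illustration; only the column names height and temp follow the ones used on this page.

```r
# Made-up stand-in for Plant_height.csv (values are invented for illustration)
demo_file <- tempfile(fileext = ".csv")
write.csv(data.frame(height = c(0.5, 12.3, 30.1),
                     temp = c(-5.2, 14.8, 25.6)),
          file = demo_file, row.names = FALSE)

# Import in the same way as above, then inspect the result
demo_data <- read.csv(file = demo_file, header = TRUE)
str(demo_data)      # column names and types (both should be numeric)
head(demo_data)     # first few rows
summary(demo_data)  # ranges are useful later when choosing axis limits
```

With the real data, the same three inspection calls can be applied directly to Plant_height.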
To visualise how plant height varies with temperature, we can use a simple scatterplot, produced with the plot function. Place the Y variable on the left of the tilde (~) and the X variable on the right. The data = Plant_height argument tells R to look in the data frame Plant_height for those two variables.
plot(height ~ temp, data = Plant_height)
If the two variables you would like to plot are in different objects in R, you would simply use:
plot(y ~ x)
where y and x are two vectors of equal length.
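As a quick illustration of plotting from separate objects, the sketch below builds two simulated vectors (they are not the plant height data) and plots them with the formula interface. It also shows the alternative plot(x, y) call, which takes the X variable first and should give the same scatterplot.

```r
# Simulated vectors of equal length (invented for illustration)
set.seed(1)
x <- runif(50, min = -20, max = 30)
y <- 2 + 0.5 * x + rnorm(50, sd = 3)

plot(y ~ x)  # formula interface: Y on the left of the tilde, X on the right
plot(x, y)   # alternative call: X first, then Y
```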
Scatterplots can be formatted using the basic R formatting in the graphics package. The code below details some of the more commonly used formatting commands for simple scatterplots. These commands can be used for any plotting function in the graphics package.
Add axis labels or titles
Axis labels are produced with the xlab and ylab arguments. Titles are provided with the main argument. Note that figures in scientific publications rarely have a title; instead, information about the plot is given in a figure legend presented below the plot.
plot(height ~ temp, data = Plant_height, xlab = "Temperature (°C)", ylab = "Plant height (m)", main = "Plant height vs temperature")
Edit axis limits
Axis limits are set by the xlim and ylim arguments, each of which requires a vector of the minimum and maximum limits. For example, to set the Y axis to have a minimum of zero and a maximum of 80 m, and the X axis to range between -20 and 30, use:
plot(height ~ temp, data = Plant_height, xlab = "Temperature (°C)", ylab = "Plant height (m)", ylim = c(0, 80), xlim = c(-20, 30))
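Rather than typing limits by hand, they can also be derived from the data. The sketch below is one possible approach, not the method used on this page: it uses a simulated data frame (sim, with invented values) and the hypothetical helper objects x_limits and y_limits.

```r
# Simulated data with the same column names as the worked example
set.seed(2)
sim <- data.frame(temp = runif(40, min = -20, max = 30))
sim$height <- exp(0.5 + 0.05 * sim$temp + rnorm(40, sd = 0.8))

# X limits span the observed data; Y limits start at zero
x_limits <- range(sim$temp, na.rm = TRUE)
y_limits <- c(0, max(sim$height, na.rm = TRUE))

plot(height ~ temp, data = sim,
     xlab = "Simulated temperature", ylab = "Simulated height",
     xlim = x_limits, ylim = y_limits)
```

Using na.rm = TRUE keeps missing values from turning the limits into NA.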
The choice of plotting symbols is extensive, and the options are accessed using the pch argument in the graphical parameters. Type ?pch to see all the choices.
Solid circles (pch = 16) are often the neatest way to display data on a scatterplot.
plot(height ~ temp, data = Plant_height, xlab = "Temperature (°C)", ylab = "Plant height (m)", ylim = c(0, 80), pch = 16)
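To compare the available symbols side by side, a small reference chart can be drawn. This is an illustrative sketch only: the grid layout and the object names pch_values, grid_x and grid_y are arbitrary choices, and it covers the numbered symbols 0 to 25.

```r
# Lay out symbols 0 to 25 on a 9-column grid, top row first
pch_values <- 0:25
grid_x <- (pch_values %% 9) + 1
grid_y <- 3 - (pch_values %/% 9)

# Draw each symbol; bg sets the fill colour for symbols 21 to 25
plot(grid_x, grid_y, pch = pch_values, cex = 2, bg = "grey",
     xlim = c(0.5, 9.5), ylim = c(0.5, 3.5),
     axes = FALSE, xlab = "", ylab = "")

# Label each symbol with its pch number, placed just below it
text(grid_x, grid_y, labels = pch_values, pos = 1)
```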
Colour can be added to any part of the plot (axes, fonts etc.) using the col argument. There are over 600 named colours to choose from. Type colours() to see the whole range.
Here we will simply change the colour of the symbols to blue.
plot(height ~ temp, data = Plant_height, xlab = "Temperature (°C)", ylab = "Plant height (m)", pch = 16, col = "blue")
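Because the list returned by colours() is long, it can be convenient to search it by name. The sketch below filters for names containing "blue" and uses one of them on simulated data. The data frame sim and the object blue_names are invented for illustration.

```r
# Count the named colours and find those with "blue" in the name
length(colours())
blue_names <- grep("blue", colours(), value = TRUE)
head(blue_names)

# Simulated data, plotted with one of the named blues
set.seed(3)
sim <- data.frame(temp = runif(30, min = -20, max = 30))
sim$height <- exp(0.5 + 0.05 * sim$temp + rnorm(30, sd = 0.8))

plot(height ~ temp, data = sim,
     xlab = "Simulated temperature", ylab = "Simulated height",
     pch = 16, col = "steelblue")
```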
Adding a line of best fit
To further explore the relationship between two variables, you can add a line of best fit. For example, to add the line of best fit from a simple linear regression, we would use the linear modelling function, lm, to obtain the slope and intercept, and add this line to the scatterplot via the abline function.
See the page on linear regression for the analysis of tree height versus temperature for this data set. The dependent variable analysed was the log transformed data for tree height (loght). To plot this against temperature with the line of best fit from the linear model, we would use:
plot(loght ~ temp, data = Plant_height, xlab = "Temperature (°C)", ylab = "log(Plant height)", pch = 16)
abline(lm(loght ~ temp, data = Plant_height))
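To make the role of the slope and intercept explicit, the model can be stored first and its coefficients passed to abline by name. This is a sketch on simulated data (the data frame sim and its values are invented, and fit is a hypothetical object name). It should draw the same line as passing the fitted model to abline directly.

```r
# Simulated data with the same column names as the worked example
set.seed(4)
sim <- data.frame(temp = runif(60, min = -20, max = 30))
sim$loght <- 0.2 + 0.04 * sim$temp + rnorm(60, sd = 0.5)

# Fit the linear model and look at the estimated intercept and slope
fit <- lm(loght ~ temp, data = sim)
coef(fit)

plot(loght ~ temp, data = sim,
     xlab = "Simulated temperature", ylab = "Simulated log(height)", pch = 16)

# a is the intercept and b is the slope of the line to draw
abline(a = coef(fit)[["(Intercept)"]], b = coef(fit)[["temp"]], col = "blue")
```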
Type ?plot and ?abline to get the R help for these functions.
Authors: Stephanie Brodie & Alistair Poore
Last updated: Feb 2022