Visualising the relationship between two continuous variables is one of the most commonly used graphical techniques in the sciences. This page details how to produce simple scatterplots to display how one continuous variable is related to another.
For this worked example, download a data set on plant heights around the world, Plant_height.csv, and import into R.
Plant_height <- read.csv(file = "Plant_height.csv", header = TRUE)
###Scatterplots###
To visualise how plant height varies with temperature, we can use a simple scatterplot, using the plot
function. Place the Y variable on the left of the tilde (~) and the X variable on the right. The data=Plant_height
argument tells R to look in the data frame Plant_height for those two variables.
plot(height ~ temp, data = Plant_height)
If the two variables you would like to plot are in different objects in R, you would simply use:
plot(y ~ x)
where y and X are two vectors of equal length.
Scatterplots can be formatted using the basic R formatting in the graphics package. The code below details some of the more commonly used formatting commands for simple scatterplots. These commands can be used for any plotting function in the graphics package.
Add axis labels or titles
Axis labels are produced with the xlab
and ylab
arguments. Titles are provided with the main
argument. Note that figures in scientific publications rarely have a title, but include information about the plot in a figure legend presented below the plot.
plot(height ~ temp, data = Plant_height, xlab = "Temperature (°C)", ylab = "Plant height (m)", main = "Plant height vs temperature")
Edit axis limits
Axis limits are set by the xlim
and ylim
arguments, where a vector of the minimum and maximum limits is required. For example to set the Y axis to have a minimum of zero and a maximum of 80 m, and the x axis to range between -20 and 30, use:
plot(height ~ temp, data = Plant_height, xlab = "Temperature (°C)", ylab = "Plant height (m)", ylim = c(0, 80), xlim = c(-20, 30))
Symbol style
The choice of symbols to use in plotting is extensive and choices are accessed using the pch
argument in the graphical parameters. Type ?pch
to see all the choices.
A solid circles pch=16
or pch=19
are often the neatest way to display data on a scatterplot.
plot(height ~ temp, data = Plant_height, xlab = "Temperature (°C)", ylab = "Plant height (m)", ylim = c(0, 80), pch = 16)
Adding colour
Colour can be added to any part of the plots (axis, fonts etc.) using col
argument. There are over 600 colours that can be plotted, type colours()
for the whole range.
Here we will simply change the colour of the symbols to blue.
plot(height ~ temp, data = Plant_height, xlab = "Temperature (°C)", ylab = "Plant height (m)", pch = 16, col = "blue")
Adding a line of best fit
To further explore the relationship between two variables you can add a line of best fit. For example, to add the line of best fit from a simple linear regression, we would use the linear modelling function, lm
, to obtain the slope and intercept, and add this line to the scatterplot via the graphical parameter abline
.
See the page on linear regression for the analysis of tree height versus temperature for this data set. The dependent variable analysed was the log transformed data for tree height (loght). To plot this against temperature with the line of best fit from the linear model, we would use:
plot(loght ~ temp, data = Plant_height, xlab = "Temperature (°C)", ylab = "log(Plant height)", pch = 16)
abline(lm(loght ~ temp, data = Plant_height))
Type ?plot
and ?abline
to get the R help for these functions.
Authors: Stephanie Brodie & Alistair Poore
Year: 2016
Last updated: Feb 2022