This workshop is intended for self-study after completing Lectures 1-10 and Workshops 1-6.
The goal is to use the techniques we’ve seen so far on some unseen data sets, and to introduce some more advanced visualisation techniques, such as animation.
New techniques:

* Use plot_ly to produce interactive plots
* Use the manipulate package to create interactive plots
You will need to install the following packages for today’s workshop:

* corrplot for drawing correlation plots
* gplots for drawing heatmaps (note this is different to ggplot2)
* aplpack for drawing Chernoff faces (this has a ridiculous number of dependencies, so you may want to skip it)
* manipulate and plotly for creating interactive plots

install.packages(c("corrplot","gplots","aplpack","manipulate","plotly"))
Download data: gapminder
The data contain values for population, life expectancy, GDP per capita, fertility and infant mortality for every year from 1900 to 2020. The variables are:
* country - the country, factor with 142 levels
* continent - the continent, factor with 5 levels
* year - year of observation, ranging from 1900 to 2020
* lifeExp - life expectancy at birth, in years
* pop - population
* gdpPercap - GDP per capita (US$, inflation-adjusted)
* infantMort - deaths of children under 5 years of age per 1000 live births
* fertility - the number of children that would be born to each woman with prevailing age-specific fertility rates

This data set is quite complex, as we have five time series observed for many countries (i.e. a categorical variable with many levels). To get a feeling for the data, let’s focus on a snapshot of the state of the world in a single year - let’s take 2015.
… year of 2015, and save this as a new variable.

… continents? Try a variety of different techniques to visualise your results.
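One way to start on the 2015 exercise is sketched below on a small made-up data frame (toy, toy2015 and every value in them are hypothetical stand-ins, not the gapminder data): subset the rows by year, then compare continents numerically with tapply and graphically with a boxplot. With the real data, gapminder would take the place of toy.

```r
# Hypothetical toy stand-in for gapminder (made-up values, illustration only)
toy <- data.frame(
  country   = factor(rep(c("A", "B", "C", "D"), times = 2)),
  continent = factor(rep(c("Africa", "Africa", "Europe", "Europe"), times = 2)),
  year      = rep(c(2010, 2015), each = 4),
  lifeExp   = c(58, 62, 79, 81, 60, 64, 80, 82)
)

# Keep only the rows observed in 2015 and save them as a new variable
toy2015 <- toy[toy$year == 2015, ]

# Numerical summary: mean life expectancy in each continent
means2015 <- tapply(toy2015$lifeExp, toy2015$continent, mean)
print(means2015)

# Graphical summary: one box per continent
boxplot(lifeExp ~ continent, data = toy2015,
        xlab = "Continent", ylab = "Life expectancy (years)")
```

A boxplot is only one option here; stripchart or a barplot of means2015 would show the same comparison in a different way.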
If we had more time, we would explore all the variables like this. However, let’s start scrutinising the behaviour in different countries:
… United Kingdom change over time? How about the GDP? Draw both plots side-by-side.

What we really want to do is get a complete picture of all the time series in all the countries! For this we’ll need to use a loop to draw the time series for each country.
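library(scales)  # provides alpha() for semi-transparent colours
# one colour per continent, in the same order as levels(gapminder$continent)
cols <- alpha(c("#F44336", "#2196F3", "#5fd53f", "#9C27B0", "#FF9800"), 0.75)
# empty axes spanning the full range of the data
plot(x = range(gapminder$year), y = range(gapminder$lifeExp, na.rm = TRUE),
     xlab = 'Year', ylab = 'Life Expectancy', type = 'n')
# add one line per country, coloured by its continent
for (co in levels(gapminder$country)) {
  idx <- gapminder$country == co
  lines(x = gapminder$year[idx], y = gapminder$lifeExp[idx],
        col = cols[gapminder$continent[idx][1]])
}
legend(x = 'topleft', legend = levels(gapminder$continent), col = cols, lty = 1)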
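The colour lookup in the loop, cols[gapminder$continent[...]], relies on how R indexes a vector by a factor: it uses the factor's integer codes, so the k-th colour goes to the k-th level. That is also the order in which levels() hands the names to legend, which is why the legend colours line up with the lines. A small self-contained check, with a made-up palette and made-up labels:

```r
# Hypothetical colour palette and continent labels, for illustration only
cols <- c("red", "blue", "green")
continent <- factor(c("Europe", "Africa", "Asia", "Africa"))

# Levels are sorted alphabetically by default: Africa = 1, Asia = 2, Europe = 3
print(levels(continent))
print(as.integer(continent))  # internal codes: 3 1 2 1

# Indexing by a factor uses these integer codes, not the printed labels
matched <- cols[continent]
print(matched)                # "green" "red" "blue" "red"
```

If you index by as.character(continent) instead, R looks the labels up in names(cols), which this unnamed palette does not have, so you would get NA colours.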
Add log='y' to the plot command and redraw the plot - does this help?
Given we have multiple associated time series evolving at the same time, a scatterplot of one against the other would give a useful snapshot of a slice through all the time series.
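Base R's symbols() function is one way to turn such a scatterplot into a bubble plot, where a third variable sets each point's size, and its inches argument controls how large the biggest symbol is drawn. A minimal sketch on a made-up one-year snapshot (snap and its values are hypothetical):

```r
# Hypothetical toy snapshot for a single year (made-up values, illustration only)
snap <- data.frame(
  gdpPercap = c(800, 2500, 12000, 40000),
  lifeExp   = c(55, 65, 74, 81),
  pop       = c(5e7, 2e8, 1e7, 6e7)
)

# Radius proportional to sqrt(pop), so circle *area* is proportional to population
radius <- sqrt(snap$pop)

# inches sets the radius (in inches) of the largest circle; the others are scaled to match
symbols(x = snap$gdpPercap, y = snap$lifeExp, circles = radius,
        inches = 0.3, bg = "#2196F380", fg = "grey30",
        xlab = "GDP per capita", ylab = "Life expectancy")
```

Using the square root matters: passing pop directly as the radius would make area grow with the square of population and visually exaggerate the largest countries.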
… inches argument.

The Gapminder video used animation to show how the bubbleplot evolved over time. There are various options for animating our graphics. However, the techniques we have seen so far are designed only for drawing single static images. So, we’ll need to use a different method - thankfully the plotly package provides great support for animation.
To make this work, we’ll need to use plotly’s plotting function, plot_ly. It’s a fairly sensible function; however, it will take longer to produce a plot than usual, so you may need to wait a few seconds.
library(plotly)
p <- plot_ly(x = gapminder$gdpPercap,     # x value
             y = gapminder$lifeExp,       # y value
             size = sqrt(gapminder$pop),  # bubble sizes
             color = gapminder$continent, # colours
             frame = gapminder$year,      # each year is a separate animation frame
             text = gapminder$country,    # text labels will show when hovering
             type = 'scatter',            # type of plot
             mode = 'markers',
             marker = list(sizemode = 'diameter'), # size values set the bubble diameter
             fill = ~'',                  # workaround often used to suppress a spurious line.width warning
             hoverinfo = "text")
p
The call to plot_ly creates the plot ‘object’, here called p. We then draw the plot by evaluating p at the console.
The Play button will animate the plot, moving smoothly from one year’s data to the next.

To put the x-axis on a log scale, either use the xaxis argument to plot_ly, or modify the existing plot object p with the layout function like so:
layout(p, xaxis = list(type = "log"))

Axis titles can also be set with the layout function:
layout(p, xaxis = list(title='GDP',type = "log"), yaxis=list(title='Life expectancy'))

… p.

Unfortunately, modifying the individual frames of the animation (e.g. to add a smoothed trend) is not so easy. However, plotly can be very effective at animating a relatively simple graphic such as this.
An alternative, albeit rather less fancy, approach is to use the manipulate package. While this doesn’t animate the plots, it does offer slider control similar to plotly’s, and it lets us customise what is drawn much more freely.
… plotly.

Write a function called bubble which takes a single argument year and runs your code to draw the bubbleplot for that year.

… bubble(1945) and bubble(2010).
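A sketch of what such a bubble function might look like, run on a made-up long-format data frame (toy and its values are hypothetical; only the name bubble and its single year argument come from the exercise). Two details are worth noting: the axis limits are fixed over all years so that successive plots are comparable, and the rows are selected with explicit indexing because of a name clash with subset().

```r
# Hypothetical long-format toy data: one row per country per year (invented values)
toy <- data.frame(
  country   = rep(c("A", "B", "C"), times = 2),
  year      = rep(c(1945, 2010), each = 3),
  gdpPercap = c(900, 3000, 7000, 2500, 15000, 40000),
  lifeExp   = c(40, 55, 65, 62, 74, 81),
  pop       = c(2e7, 5e7, 8e6, 6e7, 1.2e8, 1.5e7)
)

# Draw the bubble plot for a single year
bubble <- function(year) {
  # toy$year is the column and 'year' is the argument; subset(toy, year == year)
  # would compare the column with itself and keep every row, so index explicitly
  d <- toy[toy$year == year, ]
  # Axis limits fixed over all years, so plots for different years line up
  symbols(d$gdpPercap, d$lifeExp, circles = sqrt(d$pop), inches = 0.3,
          bg = "#2196F380",
          xlim = range(toy$gdpPercap), ylim = range(toy$lifeExp),
          xlab = "GDP per capita", ylab = "Life expectancy",
          main = paste("Year", year))
  invisible(d)  # return the plotted rows, handy for checking
}

bubble(1945)
bubble(2010)
```

One caveat of this design: inches rescales the circles within each call, so bubble sizes are comparable within a year but not between years.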
Given a pre-defined plotting function, we can use the manipulate package to add a slider to our plot:
library(manipulate)
manipulate(bubble(year), year = slider(1800, 2020, step = 1))
This will draw the default plot at 1800 and add a small cog icon in the top left of the plot. Click this to reveal the slider. Now, when we move the slider, it changes the value of year, passes this into the bubble function and redraws the plot. You can also use the left/right arrow keys to step through the years. Note that manipulate only works inside RStudio.
Unfortunately, manipulate doesn’t support all of the features that plotly did - there is no animation, no hover labels and no zooming. It is comparatively basic, but it does give us more control over what is drawn.
… text labels on the points for the UK, USA, and China.

… loess smoothed trend for each continent.
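A sketch of one way to add a loess trend per continent in base R, on simulated data (toy, its columns and the simulated relationship are all hypothetical). The assumption here is that the smoother is fitted against log GDP, matching the log x-axis, and that each curve is drawn by predicting on a grid of values inside that continent's data range.

```r
set.seed(42)

# Hypothetical toy data: 15 simulated 'countries' in each of two continents
n <- 15
toy <- data.frame(
  continent = factor(rep(c("Africa", "Europe"), each = n)),
  gdpPercap = rep(exp(seq(log(500), log(50000), length.out = n)), times = 2)
)
toy$lifeExp <- 20 + 6 * log(toy$gdpPercap) +
  3 * (toy$continent == "Europe") + rnorm(2 * n, sd = 2)
toy$logGDP <- log(toy$gdpPercap)

cols <- c("#F44336", "#2196F3")
plot(lifeExp ~ gdpPercap, data = toy, log = "x", pch = 19,
     col = cols[toy$continent], xlab = "GDP per capita", ylab = "Life expectancy")

# Fit a separate loess smoother for each continent and overlay it
fits <- list()
for (k in seq_along(levels(toy$continent))) {
  lev <- levels(toy$continent)[k]
  sub <- toy[toy$continent == lev, ]
  fits[[lev]] <- loess(lifeExp ~ logGDP, data = sub)
  # Grid kept within the observed range: loess returns NA when extrapolating
  grid <- seq(min(sub$logGDP), max(sub$logGDP), length.out = 50)
  pred <- predict(fits[[lev]], newdata = data.frame(logGDP = grid))
  lines(exp(grid), pred, col = cols[k], lwd = 2)
}
legend("topleft", legend = levels(toy$continent), col = cols, lty = 1, lwd = 2)
```

The span argument of loess (0.75 by default) controls how smooth each curve is; lowess or scatter.smooth are simpler alternatives if you don't need the fitted model objects.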
Download data: weather
The data set above comprises the Met Office’s historical weather station data for Durham, containing monthly observations from 1880 onwards:
* maximum temperature (tmax)
* minimum temperature (tmin)
* average temperature (tave) - defined as the mean of tmax and tmin
* rainfall (rain)

Like our previous data set, we have some long time series with a potentially interesting structure. An obvious question to investigate is whether there have been any long-term changes in the weather, as possible signs of the effects of climate change.
A graphical exploratory analysis might proceed as follows:

… The aggregate function can be used to combine our data in this way. To take the yearly mean of the average monthly temperatures, run the code
aggregate(weather$tave, by=list(weather$year), FUN=mean)
Save this to a new variable.

Use manipulate to make an interactive plot with a slider for the year of the data, which draws the time series of average temperature for that year.

Use the picker function from the manipulate package to change which variable you’re plotting.
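The steps above can be sketched on a tiny made-up stand-in for the weather data (weather_toy, its values and the helper plot_year are hypothetical). The sketch shows what aggregate returns, a formula version with friendlier column names, and a plotting function whose year and variable arguments a slider and a picker could drive. The manipulate call is left commented out because manipulate only runs inside RStudio.

```r
# Hypothetical toy stand-in for the weather data (invented values, illustration only)
weather_toy <- data.frame(
  year  = rep(c(2000, 2001), each = 12),
  month = rep(1:12, times = 2),
  tave  = c(1:12, 2:13),
  rain  = rep(c(80, 60), each = 12)
)

# Yearly mean of the monthly average temperatures: columns Group.1 (year) and x (mean)
annual <- aggregate(weather_toy$tave, by = list(weather_toy$year), FUN = mean)
print(annual)

# The formula interface gives more readable column names (year, tave)
annual2 <- aggregate(tave ~ year, data = weather_toy, FUN = mean)

# Hypothetical helper: plot the monthly values of one variable for one year
plot_year <- function(yr, variable = "tave") {
  d <- weather_toy[weather_toy$year == yr, ]
  plot(d$month, d[[variable]], type = "b", xlab = "Month", ylab = variable,
       main = paste(variable, "in", yr))
  invisible(d)
}
plot_year(2001, "rain")

# In RStudio only, this would add a year slider and a variable picker:
# library(manipulate)
# manipulate(plot_year(yr, variable),
#            yr = slider(2000, 2001, step = 1),
#            variable = picker("tave", "rain"))
```

Passing the variable name as a string and looking it up with d[[variable]] is what lets a single function serve every option in the picker.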