loess, stl) to extract time series components where appropriate
You will need to install the following packages for today’s workshop:
lattice for drawing trellis plots
install.packages("lattice")
A graph can be a powerful vehicle for displaying change over time. A time series is a set of quantitative values obtained at successive time points, and time series are most commonly graphed as a simple line graph with time horizontally and the variable being measured shown vertically.
Download data: economics
The economics data set above contains monthly economic data for the USA collected from January 1967 to January 2015. The variables are:
date - month of data collection
pce - personal consumption expenditures, in billions of dollars
pop - total population, in thousands
psavert - personal savings rate
uempmed - median duration of unemployment, in weeks
unemploy - number of unemployed, in thousands
The date column is stored in R’s Date format so we can plot it directly. However, this isn’t guaranteed of all data sets and sometimes you will need to convert it (see the as.Date function for how to do so). Thankfully, everything is all set up, so let’s plot the personal savings rate over time:
plot(x=economics$date,y=economics$psavert, xlab='Date',ylab='Personal Savings Rate', ty='l')
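If the dates had been read in as plain text, they would need converting before a plot like this would work. Here is a minimal sketch of as.Date, using made-up date strings (not taken from the economics file) and assuming two common layouts:

```r
## hypothetical date strings, for illustration only
iso_dates <- c("1967-01-01", "1967-02-01", "1967-03-01")
uk_dates  <- c("01/01/1967", "01/02/1967", "01/03/1967")

d1 <- as.Date(iso_dates)                       # year-month-day is parsed by default
d2 <- as.Date(uk_dates, format = "%d/%m/%Y")   # other layouts need a format string

class(d1)           # "Date"
identical(d1, d2)   # both routes give the same dates
```

Once a column has class Date, it can be passed straight to plot as the x variable.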
Time series of financial variables like these are very often quite noisy and variable. While there is obviously a general global trend, where the line between ‘trend’ and ‘noise’ is drawn is debatable and there is no single clear answer for these data. Fitting a smoother would help us identify a clear smooth trend, but using different levels of smoothing will give us quite different results.
## fit the model, note we have to convert our 'date' to numbers here
lfit <- loess(psavert~as.numeric(date), data=economics)
xs <- seq(min(as.numeric(economics$date)), max(as.numeric(economics$date)), length=200)
lpred <- predict(lfit, data.frame(date=xs), se = TRUE)
## redraw the plot
plot(x=economics$date,y=economics$psavert, xlab='Date',ylab='Personal Savings Rate', ty='l')
lines(x=xs, y=lpred$fit, col='red',lwd=4)
This extracts the overall trend quite cleanly! If we shrink the span, however, we will fit more closely to the lesser peaks in the data - but is this signal or just noise?
## reduce span to get a more localised fit
lfit2 <- loess(psavert~as.numeric(date), data=economics,span=0.1)
lpred2 <- predict(lfit2, data.frame(date=xs), se = TRUE)
## redraw the plot
plot(x=economics$date,y=economics$psavert, xlab='Date',ylab='Personal Savings Rate', ty='l')
lines(x=xs, y=lpred$fit, col='red',lwd=4)
lines(x=xs, y=lpred2$fit, col='green',lwd=4)
The variation around the trend is quite irregular here, and without other information these features are difficult to explain. We’ll return to this idea of breaking down a time series into different scales of detail later on.
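One way to see what the span is doing is to compare how closely each fit follows the data. The sketch below uses a simulated series (not the economics data; the variable names are made up), fits loess with the default span and with span=0.1, and also uses the standard errors returned by se = TRUE to draw a rough pointwise band of plus or minus two standard errors.

```r
## simulated series: smooth trend plus noise (illustration only)
set.seed(1)
sim <- data.frame(month = 1:300)
sim$y <- 10 - 0.02 * sim$month + sin(sim$month / 40) + rnorm(300, sd = 0.8)

fit_wide   <- loess(y ~ month, data = sim)              # default span = 0.75
fit_narrow <- loess(y ~ month, data = sim, span = 0.1)  # more localised fit

## residual spread: the narrow span sits closer to the data
sd_wide   <- sd(residuals(fit_wide))
sd_narrow <- sd(residuals(fit_narrow))
c(wide = sd_wide, narrow = sd_narrow)

## rough pointwise band from the standard errors of the wide fit
pred  <- predict(fit_wide, data.frame(month = sim$month), se = TRUE)
upper <- pred$fit + 2 * pred$se.fit
lower <- pred$fit - 2 * pred$se.fit

plot(sim$month, sim$y, type = 'l', col = 'grey', xlab = 'Month', ylab = 'y')
lines(sim$month, pred$fit, col = 'red', lwd = 2)
lines(sim$month, upper, col = 'red', lty = 2)
lines(sim$month, lower, col = 'red', lty = 2)
```

A smaller residual spread does not mean a better trend: the narrow fit is closer to the data partly because it is chasing the noise.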
Download data: income
The income data set contains the mean annual income in the USA from 1913 to 2014. The values are given for two separate groups: the top 1% of earners (U01), and the lower 90% of earners (L90). This gives us two time series to explore. The main question of interest is how these time series have changed relative to each other, and who has done better over the past 100 years (though we can probably guess the answer).
ylim argument to span both series. Use a different colour for each series.
income).
L90 and U01, and draw a scatterplot with L90 vertically against U01 horizontally. Use plot type ty='b' (b for ‘both’) to connect your points with lines.
text (type ?text) for details.
text(x=income$U01,y=income$L90, labels=income$year, cex=0.5, pos=4)
cut function here to split a numerical variable up into a categorical factor.
In theory, it should be possible for incomes to rise for everyone at the same time, with the gains of economic growth broadly distributed year after year. But the takeaway from these graphs is that since World War II, that’s never really happened in the U.S.
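The ylim and cut steps mentioned in the exercises can be tried on a toy example first. The numbers below are made up and only borrow the column names U01 and L90; the sketch shows one way to choose a ylim that covers two series, and what cut returns.

```r
## made-up incomes for illustration only; not the real income data
toy <- data.frame(year = 2001:2010,
                  U01  = seq(800, 1250, length.out = 10),
                  L90  = seq(30, 34.5, length.out = 10))

## ylim has to cover both series, so take the range over both columns
yl <- range(toy$U01, toy$L90)
plot(toy$year, toy$U01, type = 'l', col = 'red', ylim = yl,
     xlab = 'Year', ylab = 'Mean income')
lines(toy$year, toy$L90, col = 'blue')

## cut() turns a numeric variable into a factor of intervals
toy$period <- cut(toy$year, breaks = c(2000, 2005, 2010))
table(toy$period)
```

With series on such different scales the lower one is squashed flat against the axis, which is one reason to look at the two series against each other as well as against time.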
Download data: co2
The co2 data contains the average monthly concentrations of CO2 from 1959 recorded at the Mauna Loa observatory in Hawaii. Environmental time series like this one often have a very strong and regular pattern to them.
co2) over time.
For time series with a strong seasonal component it can be useful to look at a Seasonal Decomposition of Time Series by Loess, or STL. Essentially, we describe a regular time series such as this in terms of three components: a trend, a seasonal component, and a remainder (the irregular variation left over).
To do a time series decomposition, we’ll need to turn the data into a ts (time series) object so it is recognised by R.
co2ts <- ts(data = co2$co2,
start = co2[1, 1:2],
end = co2[nrow(co2), 1:2],
frequency = 12)
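A quick way to check that start and frequency have been interpreted as intended is to query the resulting object. Here is a small sketch on made-up values (the numbers 1 to 24 stand in for real measurements, and the start date is arbitrary):

```r
## twenty-four made-up monthly values starting in March 1959
x <- ts(1:24, start = c(1959, 3), frequency = 12)

start(x)                    # 1959 3
end(x)                      # 1961 2
month_index <- as.numeric(cycle(x))
month_index[1:6]            # 3 4 5 6 7 8
```

Here start and end are given as a (year, month) pair, and cycle reports the position of each observation within the year.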
Then we can apply the time series decomposition
decomp <- stl(co2ts, s.window = "periodic")
and plot the results:
plot(decomp, col = 'blue')
decomp object in a time series object called time.series, with a column for each component.
decomp.
De-trending seasonal time series like this can be very effective. However, it can be difficult to find such ideally-suited time series with such regular behaviour outside of highly-structured situations.
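To see how the pieces of a decomposition fit together, the sketch below runs the same ts and stl steps on simulated monthly data (the year/month layout and all values are made up). It then checks that the columns of time.series add back up to the original series.

```r
## simulated monthly data with year and month columns (hypothetical layout)
set.seed(42)
toy <- data.frame(year  = rep(2000:2009, each = 12),
                  month = rep(1:12, times = 10))
k <- seq_len(nrow(toy))
toy$value <- 300 + 0.1 * k + 3 * sin(2 * pi * toy$month / 12) +
  rnorm(nrow(toy), sd = 0.3)

toy_ts     <- ts(toy$value, start = c(2000, 1), frequency = 12)
toy_decomp <- stl(toy_ts, s.window = "periodic")

comps <- toy_decomp$time.series
colnames(comps)     # "seasonal" "trend" "remainder"

## the three components sum back to the original series
recon_error <- max(abs(rowSums(comps) - toy$value))
recon_error
```

The remainder column is what is left after removing the seasonal and trend parts, so the reconstruction error should be zero up to rounding.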
The nycflights13 package contains a huge amount of data for all flights departing New York City in 2013. We’ll just focus on a simple summary - the number of departures per day.
Download data: flights
n) over time (date).
loess, and add your smoothed trend to the plot. You may want to use the decimalised date (decimal_date) variable as x in your model here.
This is one example of a well-structured and regular time series where we can apply the decomposition technique. We have regular variation according to the day of the week, and the pattern repeats every week. First, we set up the data as a time series object using that information:
flts <- ts(flights$n, frequency=7,start=0)
We can then use the stl function to decompose the series into its components:
decomp <- stl(flts, s.window='periodic')
decomp object to view the decomposition. What features do you see?
decomp$time.series[,2] against the date variable in the original data set.
abline(v=ymd('2013-12-25'),col='red') will draw a vertical line on Christmas Day.
Time series of human activity are often very cyclical like this, with many predictable features. Whilst perhaps not hugely surprising, part of the job of an exploratory analysis is to check whether what we might think should be obvious actually is obvious in the data!
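The weekly decomposition can be rehearsed on simulated daily counts before trying it on the real data. In the sketch below the counts, the weekday offsets and the variable names are all invented, and as.Date from base R is used in place of ymd to mark a day on the plot.

```r
## simulated daily counts for 2013 with a weekly pattern (not the flights data)
set.seed(7)
days <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")
weekday_effect <- c(20, 25, 25, 25, 30, -60, -10)   # hypothetical offsets
n_sim <- 900 + weekday_effect[(seq_along(days) - 1) %% 7 + 1] +
  rnorm(length(days), sd = 10)

sim_ts     <- ts(n_sim, frequency = 7, start = 0)
sim_decomp <- stl(sim_ts, s.window = "periodic")

## trend (column 2) against the calendar date, with one day marked
trend <- as.numeric(sim_decomp$time.series[, 2])
plot(days, trend, type = 'l', xlab = 'Date', ylab = 'Trend')
abline(v = as.Date("2013-12-25"), col = 'red')

## with s.window = "periodic" the seasonal part repeats every 7 days
seas <- as.numeric(sim_decomp$time.series[, "seasonal"])
all.equal(seas[1:7], seas[8:14])
```

Because the simulated series has no holiday effects built in, its trend should be fairly flat. Any dips around public holidays in the real data are therefore features of the data rather than of the method.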