Features to look for:
The key thing to remember is that the observations of a time series are not independent! So, looking at one-variable summaries can be misleading; the most important feature is how things behave over time. For instance, consider these data on the population of lynxes over time. If we ignore the time aspect, the data look like this. But by explicitly including time in the visualisation, we radically change what we can learn. We can see there are cyclical changes in the population, with regular peaks and troughs in the numbers. All of this is lost if we forget about the dependence on time.
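A minimal sketch of the contrast, assuming the lynx data are (or closely resemble) R's built-in `lynx` series from the datasets package, which records annual Canadian lynx trappings from 1821 to 1934 rather than a direct population count. The left panel ignores time; the right panel keeps it.

```r
## Assumption: R's built-in `lynx` series (annual trappings, 1821-1934)
## stands in for the lynx population data discussed in the text.
data(lynx)
op <- par(mfrow = c(1, 2))
## Ignoring time: only the spread of the counts is visible
hist(lynx, main = "Ignoring time", xlab = "Lynx trapped")
## Including time: the regular peaks and troughs appear
plot(lynx, main = "Including time", xlab = "Year", ylab = "Lynx trapped")
par(op)
```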
To plot data using connected lines, we can use the same `plot` function as for scatterplots, but add the `ty` argument (a partial match for `type`) with value `'l'` for line. Alternatively, if the time series is relatively short, we can use `'b'` instead to show both lines and points. Another option is `'s'` for steps.
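The three `type` values can be compared side by side. This sketch uses the last 35 years of R's built-in `lynx` series purely for illustration, and spells out the full argument name `type`, which the `ty` in the text partially matches.

```r
## Illustration only: a short window of the built-in `lynx` series
short <- window(lynx, start = 1900)
op <- par(mfrow = c(1, 3))
plot(short, type = "l", main = 'type = "l"', ylab = "Lynx trapped")  # lines only
plot(short, type = "b", main = 'type = "b"', ylab = "Lynx trapped")  # both lines and points
plot(short, type = "s", main = 'type = "s"', ylab = "Lynx trapped")  # steps
par(op)
```

For a series this short, `'b'` keeps the individual observations visible, while `'l'` scales better to long series where the points would overlap.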
Note also that when drawing time series we prefer to join our points by lines. As these quantities evolve over time, it is natural to connect the points to indicate the transition from time point to time point. Usually, we default to connecting points with straight lines, but smoothers or other curves can also be used.
Time series of financial variables are very often noisy and highly variable. The plot below shows the US Personal Savings Rate over time.
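## load the `economics` data frame (ggplot2::economics also provides `date` and `psavert`)
load('economics.Rda')
## ty='l' (partial match for type='l') draws the series as a connected line
plot(x=economics$date, y=economics$psavert, ty='l', xlab='Date', ylab='Personal Savings Rate')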
There are signs of a global trend in these data, which evolves slowly over time. However, separating the ‘trend’ from the ‘noise’ is difficult. Additionally, there are no clear regular patterns to observe, unlike the lynx data. One approach to unpacking the behaviour of time series data is to apply smoothing techniques with different bandwidths.
A smoother with a wide bandwidth could identify a trend such as the red line below. A smoother with a narrower bandwidth could detect some of the smaller variations around that trend, such as the green line. However, there is plenty of residual variation that is still unexplained!
## create some times to predict the curve at
xs <- seq(min(as.numeric(economics$date)), max(as.numeric(economics$date)), length=200)
## fit the model, note we have to convert our 'date' to numbers here
lfit1 <- loess(psavert~as.numeric(date), data=economics)
## predict
lpred1 <- predict(lfit1, data.frame(date=xs), se = TRUE)
## same again, with smaller span
lfit2 <- loess(psavert~as.numeric(date), data=economics, span=0.1)
lpred2 <- predict(lfit2, data.frame(date=xs), se = TRUE)
## draw the plot
plot(x=economics$date, y=economics$psavert, xlab='Date',ylab='Personal Savings Rate', ty='l')
lines(x=xs, y=lpred1$fit, col='red',lwd=4)
lines(x=xs, y=lpred2$fit, col='green',lwd=4)
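It is often useful to think of a time series as being made up of multiple components: a trend \(T_t\) describing the long-term change in level, a seasonal component \(S_t\) that repeats at a regular period, and residuals \(\epsilon_t\) capturing the remaining noise.
The time series is then thought of as a sum of these terms: \[Y_t=T_t+S_t+\epsilon_t.\]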
Sometimes, time series also incorporate an ‘irregular’ component to represent non-seasonal departures from the trend.
Our example above could have a trend described by the red line. The green line does not show regular periodic behaviour (a pattern that repeats at a fixed interval), so its departures from the red trend would probably be the irregular component. The remaining noise in the data, indicated by the deviations of the black line from the green line, would then constitute the residuals.
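The split described above can be sketched with two `loess` fits. To keep the example self-contained, R's built-in `Nile` series (annual Nile flow, 1871 to 1970, no seasonal pattern) stands in for the savings-rate data, and the narrow span of 0.2 is an illustrative assumption. The wide smooth plays the role of the red trend, the gap between the narrow and wide smooths is the irregular component, and what is left after the narrow smooth is the residual.

```r
## Stand-in data: built-in `Nile` series; span = 0.2 is an assumption
yr   <- as.numeric(time(Nile))
flow <- as.numeric(Nile)

trend  <- as.numeric(fitted(loess(flow ~ yr)))              # wide: default span 0.75
narrow <- as.numeric(fitted(loess(flow ~ yr, span = 0.2)))  # narrow bandwidth

irregular <- narrow - trend   # non-periodic wiggles around the trend
residual  <- flow - narrow    # what neither smoother explains

## The three pieces add back up to the data (prints TRUE)
all.equal(trend + irregular + residual, flow)

plot(yr, flow, type = "l", xlab = "Year", ylab = "Annual flow")
lines(yr, trend,  col = "red",   lwd = 3)
lines(yr, narrow, col = "green", lwd = 3)
```

Because each piece is defined as a difference of the previous ones, the sum reproduces the data exactly; the modelling choice lives entirely in the two spans.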
The figure below shows the unadjusted quarterly (West) German unemployment rate from 1962 to just after German reunification.
library(AER)
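data(GermanUnemployment)
## quarterly time points, 1962 Q1 to 1991 Q4 (this name shadows ts(), but R still finds the function when ts(...) is called)
ts <- seq(1962,1991.75,by=0.25)
## unadjusted (not seasonally adjusted) unemployment rate
geu <- data.frame(GermanUnemployment)$unadjusted
plot(geu ~ts,ty='l',xlab='Time',ylab='Unadjusted Unemployment')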
Unemployment was very low in the 1960s and early 1970s, apart from a short spell in 1967. There were distinct jumps in unemployment in the mid-1970s and early 1980s due to the oil crises of 1973 and 1979. Unemployment was declining at the end of the series from the high levels of the 1980s. The series also shows strong, regular variation around the trend, with higher levels in winter than in summer.
The smoothing methods we’ve seen are particularly effective at extracting a smooth trend from the data. We can then subtract this trend from the time series; what is left over is the seasonal component plus the residuals!
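## loess trend at the observed times (assumed default span; the original smoothing code is not shown)
lpred <- predict(loess(geu ~ ts), se = TRUE)
res <- geu - lpred$fit   # seasonal component + residuals
plot(x=ts, y=res,col=3,lwd=2,ty='l',xlab='Time',ylab='Residual')
abline(h=0)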
The seasonal pattern here is clear, with peaks and troughs occurring at regular intervals.
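Since the detrended series is seasonal component plus residuals, one simple way to pull out the seasonal part is to average the detrended values within each quarter. The sketch below uses hypothetical simulated quarterly data so that the true seasonal pattern is known and can be compared with the estimate; the trend, pattern and noise level are all assumptions.

```r
## Hypothetical simulated data: 30 years of quarterly observations with a
## linear trend, a known seasonal pattern, and Gaussian noise.
set.seed(1)
n   <- 120
qtr <- rep(1:4, length.out = n)               # quarter of each observation
true_seasonal <- c(1.0, -0.5, -1.0, 0.5)      # high in Q1 (winter); sums to zero
tt  <- seq(1962, by = 0.25, length.out = n)   # quarterly time index
y   <- seq(2, 8, length.out = n) + true_seasonal[qtr] + rnorm(n, sd = 0.2)

## Step 1: a wide-bandwidth loess smooth estimates the trend
trend <- as.numeric(fitted(loess(y ~ tt)))
## Step 2: subtracting the trend leaves seasonal component + residuals
detrended <- y - trend
## Step 3: average within each quarter, then centre so the effects sum to zero
seasonal_est <- tapply(detrended, qtr, mean)
seasonal_est <- seasonal_est - mean(seasonal_est)
## Step 4: whatever the trend and seasonal parts leave over is the residual
residual <- detrended - seasonal_est[qtr]
round(seasonal_est, 2)
```

For the German series, the same steps could be applied to `geu` and `ts`, with `qtr <- rep(1:4, length.out = length(geu))` since the series starts in the first quarter. R's `decompose()` function automates a similar approach that uses a moving average for the trend.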
We can extract the time series components by hand by computing `loess` smooths of the data with different bandwidths.
However, R can do most of this for us:
geu <- ts(GermanUnemployment[, "unadjusted"],  ## turn the data into a time series object
          start = c(1962, 1),                  ## first observation: 1962 Q1
          frequency = 4)  ## four observations per year: the period of the seasonal pattern
decomp <- stl(geu, s.window = "periodic") ## "periodic": the seasonal component is the same every year
plot(decomp)
The results of the decomposition are shown in the panels of the plot. The top panel gives the original data, the second panel shows the periodic seasonal component, the third panel shows the general trend, and the bottom panel (labelled ‘remainder’) shows the residuals left over at the end.
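The components returned by `stl` can be checked against the additive model \(Y_t=T_t+S_t+\epsilon_t\). So that it runs without the AER package, this sketch uses R's built-in monthly `co2` series (Mauna Loa, 1959 to 1997) in place of the German data; the components live in the `time.series` element of the result.

```r
## Stand-in data: built-in monthly `co2` series (frequency 12)
decomp_co2 <- stl(co2, s.window = "periodic")
comps <- decomp_co2$time.series        # columns: seasonal, trend, remainder
head(comps)

## Seasonal + trend + remainder reproduces the data (prints TRUE)
all.equal(as.numeric(rowSums(comps)), as.numeric(co2))

## With s.window = "periodic", the seasonal part repeats exactly each year (prints TRUE)
all.equal(as.numeric(comps[1:12, "seasonal"]), as.numeric(comps[13:24, "seasonal"]))
```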
Depending on what time series we’re showing, there are a number of things to consider:
Florence Nightingale famously used data and visualisations to highlight the poor conditions of soldiers in field hospitals, and the causes of their mortality, during the Crimean War (1854-6). While she is better known as a nurse, she was also a statistician and the first female member of the Royal Statistical Society.
While this is not a plot we would recognise today, it is a time series represented in polar form like a pie chart. The colours represent the different sources of mortality, and the segments represent sequential observations. Unwrapping this as a more conventional plot gives the more readable form:
Plotting the annualised monthly death rates from disease, wounds, and other causes makes her case clearly. The death rate from disease (black), driven by poor conditions, far outstrips the death rate from wounds sustained in combat (red). The most dangerous fighting occurred at the end of 1854, but even then the number of lives lost was dwarfed by deaths from disease.
The simplest way to compare time series is to show them simultaneously on the same plot.
As an example, William Playfair graphed England’s trade to the East Indies in the 18th century. Let’s revisit this plot:
A more modern version would look something like this:
library(GDAdata)
data(EastIndiesTrade)
plot(x=EastIndiesTrade$Year,y=EastIndiesTrade$Imports, col=2, xlab='Year',
ylab='Exports (blue) and Imports (red)', lwd=4,ty='l',ylim=c(0,2000))
lines(x=EastIndiesTrade$Year,y=EastIndiesTrade$Exports,col=4,lwd=4)
library(scales)
## shade the gap between the import and export curves in translucent green
polygon(x=c(EastIndiesTrade$Year,rev(EastIndiesTrade$Year)),
y=c(EastIndiesTrade$Imports,rev(EastIndiesTrade$Exports)),
col=scales::alpha('green',0.25),border=NA)
Because both time series were recorded at the same time points, we can compute their difference directly and show the trade balance. The line \(y=0\) is important because it separates deficit from surplus, so we add it for reference.
plot(x=EastIndiesTrade$Year,y=(EastIndiesTrade$Exports-EastIndiesTrade$Imports), col='green2', xlab='Year',
ylab='Exports - Imports', lwd=4,ty='l')
abline(h=0,col=2)
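The direct subtraction above relies on the two series sharing time points. If they did not, one option (an assumption here, not the method used for the Playfair data) is to linearly interpolate both series onto a common grid of years with base R's `approx()` before differencing. The years and values below are hypothetical.

```r
## Hypothetical series observed in different years
exp_years <- c(1700, 1710, 1720, 1730, 1740)
exports   <- c(100, 150, 170, 240, 300)
imp_years <- c(1705, 1715, 1725, 1735)
imports   <- c(180, 190, 260, 280)

## Linearly interpolate both onto years covered by both series
common  <- seq(1705, 1735, by = 5)
exp_int <- approx(exp_years, exports, xout = common)$y
imp_int <- approx(imp_years, imports, xout = common)$y
balance <- exp_int - imp_int   # > 0 surplus, < 0 deficit

plot(common, balance, type = "l", xlab = "Year", ylab = "Exports - Imports")
abline(h = 0, col = 2)
```

Interpolation invents values between observations, so the grid should stay within the range both series actually cover.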
Many of the features can be associated with major events of the time: the War of the Spanish Succession, 1701-14; the South Sea Bubble, 1720; the Seven Years’ War, 1756-63; and the American Revolutionary War, 1775-83. However, these would be much harder to identify from the original plot.
Exploratory Data Analysis should always be our first look at the data, with the goals to
We have focused on using graphical methods to answer questions with the data, that is, graphical data analysis (GDA). When performing GDA:
“The simple graph has brought more information to the data analyst’s mind than any other device.” – John Tukey