Speaker: Nicholas J. Cox, Durham University
Ado-files flagged in green have not been published in the STB. Some have been previously posted on Statalist.
Seasonality refers to more or less systematic variation with time of year. It typically accompanies other kinds of variation in time, including trend, periodic variation on longer or shorter time scales, and irregular or stochastic variation. It may be of major direct interest, or a nuisance to be set on one side. The underlying mechanisms may be well understood or highly mysterious. On average, seasonal effects may have a relatively simple structure, easily approximated by a smooth curve, or they may be much more complicated: there may, for example, be spikes associated with special days or festivals. Examples come from climatology, economics, medicine, and many other sciences. Many seasonal patterns are driven at least partly by weather or climate (ice cream sales, for example, are affected by temperature), but many are not: consider the effects of Christmas, Easter, Independence Day(s), Ramadan, Thanksgiving, and other cultural, political and religious events, some fixed and some variable in timing.
This presentation focuses on graphical methods for showing seasonal data, or the seasonal components of such data. In several cases, the ideas could be adapted, either directly or with minimal change, to variation with (e.g.) time of day, but most concrete details and examples refer to time of year. More generally, some of the ados have wider application.
An essential feature of seasonal data is that time of year is a circular scale. Clearly, January follows December just as February follows January. Cutting the scale according to any conventional (e.g. Western) calendar may make it difficult to appreciate the complete seasonal cycle. With other kinds of circular data, such as geographical features that have a compass bearing (East, SW), a circular graph format is often used for this reason. However, this does not seem so useful for seasonal data. This may be because (a) seasonality is usually just one aspect of behavior over time, and workers on time series use graphs with time on a linear (horizontal) scale, and (b) people are not familiar with circular calendars (or 24-hour clocks), in contrast to (say) map or compass representations, which are highly familiar to many scientists. In any case, a linear format has advantages: with response on the y axis and time of year on the x axis, response is easily decoded and any horizontal line is easily interpreted as a reference constant.
Most basically,
. graph response timeofyr , c(L) sy( )
plots values such that each year is represented by a single connected line. c(L) specifies that values are to be connected only if timeofyr is increasing (strictly, not decreasing).
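The connect rule behind c(L) can be sketched outside Stata. This minimal Python illustration (the function name `split_nondecreasing` is hypothetical, not part of any ado) breaks a series into runs within which time of year never decreases, so each run corresponds to one connected line per year.

```python
def split_nondecreasing(times, values):
    """Split (time, value) pairs into runs within which time never decreases.

    A new run starts whenever time of year drops (e.g. December back to
    January), so each run can be drawn as one connected line.
    """
    segments = []
    current = []
    previous = None
    for t, v in zip(times, values):
        if previous is not None and t < previous:
            segments.append(current)  # time went backwards: close this line
            current = []
        current.append((t, v))
        previous = t
    if current:
        segments.append(current)
    return segments


# Two years of monthly data: month numbers 1..12, then 1..12 again.
months = list(range(1, 13)) * 2
response = [float(i) for i in range(24)]
lines = split_nondecreasing(months, response)
print(len(lines))  # 2
```

Equal successive times stay in the same run, matching the "strictly, not decreasing" reading of c(L).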
It is often useful to copy the early part of each year after its end and the later part of each year before its beginning. pextend is a utility that creates the extra observations, after which graph can be used. In practice, showing about 1.5 cycles seems to work well.
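A minimal sketch of the padding idea, assuming a single cycle of monthly values (for example, monthly means) and written in Python rather than as pextend's own code; `extend_cycle` and its `extra` parameter are hypothetical. With `period=12` and `extra=3` the axis spans 12 + 2 × 3 = 18 months, i.e. 1.5 cycles. The numbers below are placeholders, not data.

```python
def extend_cycle(positions, values, period=12, extra=3):
    """Pad one seasonal cycle with copies taken from its other end.

    Observations in the last `extra` units of the cycle are copied before the
    start (shifted by -period); those in the first `extra` units are copied
    after the end (shifted by +period).
    """
    pairs = list(zip(positions, values))
    before = [(p - period, v) for p, v in pairs if p > period - extra]
    after = [(p + period, v) for p, v in pairs if p <= extra]
    return sorted(before + pairs + after, key=lambda pv: pv[0])


placeholder = [10.0 * m for m in range(1, 13)]  # illustrative values only
extended = extend_cycle(list(range(1, 13)), placeholder)
print(extended[0], extended[-1], len(extended))  # (-2, 100.0) (15, 30.0) 18
```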
lamonmon and ladaymons are utilities that produce conventional month labels (such as J or Jan for January) for months or days of the year.
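For daily data, month labels need day-of-year positions. A minimal Python sketch, assuming labels go at the first day of each month (mid-month placement is an equally reasonable convention, and what lamonmon and ladaymons actually do is not reproduced here); `month_label_positions` is a hypothetical name.

```python
import datetime

MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_label_positions(year, letters=False):
    """Return (day_of_year, label) for the first day of each month of `year`."""
    out = []
    for month, name in enumerate(MONTH_ABBR, start=1):
        day_of_year = datetime.date(year, month, 1).timetuple().tm_yday
        out.append((day_of_year, name[0] if letters else name))
    return out


print(month_label_positions(2001)[:3])       # [(1, 'Jan'), (32, 'Feb'), (60, 'Mar')]
print(month_label_positions(2004, True)[2])  # (61, 'M')
```

Leap years shift every label from March onward by one day, which is why the year is an argument.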
January is not always the best starting point for the time axis. For example, when looking at rainfall in a Northern Hemisphere climate with dry summers and wet winters, starting in July avoids cutting the wet season awkwardly. grotate supplies a function for egen. Type
. egen newvar = rotate(oldvar), start(#)
to rotate a month variable to start at #.
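The arithmetic behind rotation is modular. This Python sketch assumes one plausible convention, in which the start month becomes 1 and the month before it becomes 12. The exact values grotate produces are not documented here, and `rotate` is a hypothetical name.

```python
def rotate(month, start, period=12):
    """Renumber `month` so that `start` becomes 1 and the month before it becomes `period`."""
    return (month - start) % period + 1


print([rotate(m, 7) for m in range(1, 13)])
# [7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6]
```

In official Stata the same convention could be written as `generate newvar = mod(oldvar - 7, 12) + 1`, because Stata's mod() returns a non-negative result for a positive divisor. This is an alternative, not a description of grotate.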
For time series generally, movsumm generates summaries for overlapping windows. These summaries can be anything produced by summarize, such as the mean, major quantiles, variance, skewness or kurtosis. The calculation may be weighted within the window and, of special importance for seasonal problems, may be wrapped around from end to beginning, treating the series as circular.
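Wrapping a window around the ends amounts to indexing modulo the series length. A minimal Python sketch for one summary, the weighted mean, assuming an odd-length window centred on each observation. `circular_moving_mean` is hypothetical, and movsumm's own options are not reproduced here.

```python
def circular_moving_mean(values, weights):
    """Weighted moving mean over a centred window that wraps around the ends.

    `weights` must have odd length 2k + 1; positions beyond either end of
    `values` wrap to the other end, so the series is treated as circular.
    """
    if len(weights) % 2 != 1:
        raise ValueError("weights must have odd length")
    n = len(values)
    k = len(weights) // 2
    total_weight = sum(weights)
    result = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(weights):
            acc += w * values[(i + j - k) % n]  # wrap past either end
        result.append(acc / total_weight)
    return result


print(circular_moving_mean([1.0, 2.0, 3.0, 4.0], [1, 1, 1]))
# approximately [2.33, 2.0, 3.0, 2.67]
```

Without wrapping, the first and last windows would be truncated. With monthly data that would distort the December/January junction.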
sssplot produces a seasonal subseries plot, which plots all values for each `month' together. Other terminology for this plot: cycle plot, cycle-subseries plot, month plot, seasonal-by-month plot.
`Month' is suggestive here, not mandatory: the ado applies whenever certain periods (e.g. years) are divided into a fixed number of shorter periods (e.g. months, quarters).
Using the start( ) option, the plot can be started at any `month': that is, the `year' can be rotated.
Using the sf( ) option, a summary for each `month', such as the mean or median, can be plotted.
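The layout of a seasonal subseries plot can be sketched as follows: group values by (possibly rotated) month, spread each month's values in time order across a slot of the x axis, and compute one summary per slot. This is a Python illustration under those assumptions. `subseries_layout`, `width`, and the slot spacing are hypothetical choices, not sssplot's internals.

```python
from collections import defaultdict
from statistics import mean, median


def subseries_layout(months, values, start=1, period=12, summary=mean, width=0.8):
    """Group values (assumed in time order) by rotated month.

    Returns {slot: {"x": [...], "y": [...], "summary": s}}. Slot 1 is month
    `start`; within a slot, successive years are spread evenly over `width`.
    """
    groups = defaultdict(list)
    for m, v in zip(months, values):
        groups[(m - start) % period + 1].append(v)
    layout = {}
    for slot in sorted(groups):
        ys = groups[slot]
        n = len(ys)
        if n == 1:
            xs = [float(slot)]
        else:
            xs = [slot - width / 2 + width * i / (n - 1) for i in range(n)]
        layout[slot] = {"x": xs, "y": ys, "summary": summary(ys)}
    return layout


months = list(range(1, 13)) * 2
values = [float(i) for i in range(24)]
layout = subseries_layout(months, values, start=7, summary=median)
print(layout[1])  # July: its two values, their x positions, and their median
```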
The cyclical character of seasonal behavior can be investigated by folding the time axis around some midpoint and plotting the response as usual. This is implemented in foldplot.
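One reading of folding is reflection: times after the midpoint m are mapped to 2m − t, so the two halves of the year overlay and any asymmetry shows as a gap between them. This minimal Python sketch makes that assumption. `fold` is hypothetical, and foldplot's own syntax and conventions are not reproduced.

```python
def fold(times, midpoint):
    """Reflect times after `midpoint` back onto the first half of the axis.

    Returns (folded_time, is_second_half) pairs so the two halves can be
    drawn with different line styles and compared directly.
    """
    return [(t, False) if t <= midpoint else (2 * midpoint - t, True) for t in times]


# Monthly data folded between June and July.
print(fold([1, 5, 6, 7, 12], 6.5))
# [(1, False), (5, False), (6, False), (6.0, True), (1.0, True)]
```

With this midpoint, July lands on June and December lands on January.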
Alternatively, if there are two variables varying with time of year, their joint trajectory can be shown on an ordinary scatter plot. loopplot adds a stylistic flourish, namely that each loop is closed; that is, the end and beginning values are joined.
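Closing a loop just means repeating the first point at the end of the path before connecting. A minimal Python sketch; `closed_loop` is a hypothetical name, not loopplot's code.

```python
def closed_loop(x, y):
    """Return the (x, y) path with its first point appended, closing the loop."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    points = list(zip(x, y))
    if points and points[0] != points[-1]:
        points.append(points[0])  # join end back to beginning
    return points


print(closed_loop([1, 2, 3], [4, 5, 6]))
# [(1, 4), (2, 5), (3, 6), (1, 4)]
```

For several years, each year's points would be closed separately, giving one loop per year.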
For variables that are strictly totals for shorter periods, some purists prefer touching bars, not point symbols. This can be done with barplot.
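Bars touch when each bar's right edge equals the next bar's left edge. This Python sketch assumes one choice: bar widths equal to month lengths in days, so the bars tile the year exactly. Equal-width bars are just as defensible. `touching_month_bars` is hypothetical, not barplot's code.

```python
import calendar


def touching_month_bars(year):
    """Return (left_edge, width) in days for one bar per month, with no gaps."""
    bars = []
    left = 0
    for month in range(1, 13):
        days = calendar.monthrange(year, month)[1]  # number of days in month
        bars.append((left, days))
        left += days
    return bars


print(touching_month_bars(2001)[:2])  # [(0, 31), (31, 28)]
```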