This post documents my experience in the Tidy Time Series and Forecasting in R workshop at rstudio::conf 2020 that took place in San Francisco, CA during January 27-30, 2020.
Following what was presented at the two-day workshop, this third post (part two of day two) will be “everything that is forecasting”. The second day focused on how to use the tidyverts
packages in forecasting. Due to the length of the material covered on the second day, it has been split into two posts. The first part covered modeling algorithms. This post finishes the modeling algorithms and covers model diagnostics and forecast accuracy.
Setup
The following packages are required:
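# Attached packages (reconstructed from the session information at the end of this post)
library(dplyr)
library(tidyr)
library(purrr)
library(lubridate)
library(stringr)
library(tsibble)
library(tsibbledata)
library(feasts)
library(fable)
library(ggplot2)
library(kableExtra)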
The following data sets are used in this post:
- Monthly retail turnover for department stores in Victoria, Australia
- Yearly economic indicator values for Australia
victoria_dept_stores_tsbl <- aus_retail %>%
filter(
Industry == "Department stores",
State == "Victoria",
year(Month) >= 2000
) %>%
select(Month, Turnover)
victoria_dept_stores_tsbl
# A tsibble: 228 x 2 [1M]
Month Turnover
<mth> <dbl>
1 2000 Jan 228
2 2000 Feb 196.
3 2000 Mar 224.
4 2000 Apr 261.
5 2000 May 265.
6 2000 Jun 299.
7 2000 Jul 206.
8 2000 Aug 240.
9 2000 Sep 251.
10 2000 Oct 270.
# … with 218 more rows
aus_economy_tsbl <- global_economy %>%
filter(Country == "Australia") %>%
select(Year, Growth, CPI, Imports, Exports) %>%
drop_na() %>%
as_tsibble(index = Year)
aus_economy_tsbl
# A tsibble: 57 x 5 [1Y]
Year Growth CPI Imports Exports
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1961 2.49 8.14 15.0 12.4
2 1962 1.30 8.12 12.6 13.9
3 1963 6.21 8.17 13.8 13.0
4 1964 6.98 8.40 13.8 14.9
5 1965 5.98 8.69 15.3 13.2
6 1966 2.38 8.98 15.1 12.9
7 1967 6.30 9.29 13.9 12.9
8 1968 5.10 9.52 14.5 12.3
9 1969 7.04 9.83 13.3 12.0
10 1970 7.17 10.2 13.2 13.0
# … with 47 more rows
Complex Methods (continued)
Picking up from the previous post, the final complex method covered on day two is detailed below.
Dynamic Regression (RegARIMA)
The last algorithm introduced at the workshop was Dynamic Regression, or a linear regression with ARIMA errors. In ordinary linear regression, we do not want the residuals to be correlated. In Dynamic Regression, we allow the regression errors to be autocorrelated and model them with an ARIMA process. In fable, this is an extension of the ARIMA()
function where we simply add explanatory variables to the formula.
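In equation form (standard notation for a regression with ARIMA errors; this is from the general literature, not the workshop slides):

$$
y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \eta_t,
\qquad \eta_t \sim \text{ARIMA}(p, d, q)
$$

Unlike ordinary regression, where the error term is assumed to be white noise, here the error series is allowed to carry the remaining autocorrelation.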
For these examples, I will switch to the Australian economy data set. This data set has the annual percentage growth in GDP for Australia (the variable we will try to predict), along with other indicators, such as CPI, and Imports and Exports as percentages of GDP.
aus_economy_tsbl %>%
head(5) %>%
mutate(across(where(is.numeric), round, digits = 2)) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Year | Growth | CPI | Imports | Exports |
---|---|---|---|---|
1961 | 2.49 | 8.14 | 15.03 | 12.40 |
1962 | 1.30 | 8.12 | 12.63 | 13.94 |
1963 | 6.21 | 8.17 | 13.83 | 13.01 |
1964 | 6.98 | 8.40 | 13.76 | 14.94 |
1965 | 5.98 | 8.69 | 15.27 | 13.22 |
The percentage growth in GDP for Australia from 1961 to 2017 looks like this:
aus_economy_tsbl %>%
autoplot(Growth) +
geom_hline(aes(yintercept = 0), linetype = "dashed") +
theme_minimal() +
scale_y_continuous(
labels = scales::label_percent(scale = 1, accuracy = 0.1)
) +
expand_limits(y = c(-8, 8)) +
labs(
title = "Annual Percentage Growth in GDP For Australia",
subtitle = "Sourced from tsibbledata package",
caption = "Original Source: The World Bank",
x = "",
y = "Growth in GDP"
)
As with basic ARIMA models, you can either specify the p, d, and q parameters or allow the function to select them for you.
aus_economy_dyn_reg_mbl <- aus_economy_tsbl %>%
filter(Year <= 2010) %>%
model(
regarima_110 = ARIMA(Growth ~ CPI + Imports + Exports + pdq(1,1,0)),
regarima_010 = ARIMA(Growth ~ CPI + Imports + Exports + pdq(0,1,0)),
regarima_002 = ARIMA(Growth ~ CPI + Imports + Exports + pdq(0,0,2)),
regarima_auto = ARIMA(Growth ~ CPI + Imports + Exports)
)
glimpse(aus_economy_dyn_reg_mbl)
Rows: 1
Columns: 4
$ regarima_110 <model> [LM w/ ARIMA(1,1,0) errors]
$ regarima_010 <model> [LM w/ ARIMA(0,1,0) errors]
$ regarima_002 <model> [LM w/ ARIMA(0,0,2) errors]
$ regarima_auto <model> [LM w/ ARIMA(0,0,0) errors]
Note that I have held out 2011 to 2017. The held-out years will be used for the forecast because the exogenous regressors (CPI, Imports, and Exports) need future values: they are inputs to the model, not the variable being forecasted. We can see the fitted model by using the report()
function.
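aus_economy_dyn_reg_mbl %>%
select(regarima_auto) %>%
report()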
Series: Growth
Model: LM w/ ARIMA(0,0,0) errors
Coefficients:
CPI Imports Exports
-0.0465 0.3810 -0.0500
s.e. 0.0109 0.1914 0.1998
sigma^2 estimated as 3.11: log likelihood=-97.76
AIC=203.53 AICc=204.42 BIC=211.18
The tidy()
function again works here.
aus_economy_dyn_reg_mbl %>%
select(regarima_auto) %>%
tidy() %>%
mutate(across(where(is.numeric), round, digits = 2)) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
.model | term | estimate | std.error | statistic | p.value |
---|---|---|---|---|---|
regarima_auto | CPI | -0.05 | 0.01 | -4.28 | 0.00 |
regarima_auto | Imports | 0.38 | 0.19 | 1.99 | 0.05 |
regarima_auto | Exports | -0.05 | 0.20 | -0.25 | 0.80 |
We can now use the held-out data to forecast the Growth
variable from 2011 to 2017. Note that including the full data set in the autoplot()
function will plot the actual Growth
over the forecast horizon. This is useful for visually assessing how well the forecast performed before we get to the formal performance metrics.
aus_economy_tsbl %>%
filter(Year > 2010) %>%
forecast(object = aus_economy_dyn_reg_mbl) %>%
autoplot(aus_economy_tsbl) +
geom_hline(aes(yintercept = 0), linetype = "dashed") +
facet_wrap(~ .model, ncol = 1, scales = "free_y") +
theme_bw() +
scale_y_continuous(
labels = scales::label_percent(scale = 1, accuracy = 0.1)
) +
labs(
title = str_glue(
"Seven-Year Forecast of Annual Percentage Growth in GDP For Australia"
),
subtitle = str_glue(
"Models: four combinations of Growth = CPI + Imports + Exports with
ARIMA(1,1,0), ARIMA(0,1,0), ARIMA(0,0,2) and ARIMA(0,0,0) errors"
),
x = "",
y = "Growth in GDP"
)
Model Ensemble
After seven time-series model types (four baseline and three more complex) had been introduced, the workshop turned to ensembling: train multiple models, then take some weighted average of their outputs as the forecast. The fable
package lets us combine fitted models using basic arithmetic, and the forecast()
function is smart enough to interpret the ensemble; it will forecast with each component model, combine those forecasts, and adjust the final forecast appropriately.
victoria_dept_stores_ensemble_mbl <- victoria_dept_stores_tsbl %>%
model(
snaive = SNAIVE(Turnover),
ets = ETS(Turnover),
arima = ARIMA(Turnover)
) %>%
mutate(mixed = (snaive + ets + arima) / 3)
victoria_dept_stores_ensemble_mbl
# A mable: 1 x 4
snaive ets arima mixed
<model> <model> <model> <model>
1 <SNAIVE> <ETS(M,Ad,M)> <ARIMA(2,1,1)(1,1,2)[12]> <COMBINATION>
The mixed
model is of type <COMBINATION>
, indicating that it combines the three models. In this case each model is given equal weight, but you can choose a different weighting scheme (see the sketch after the forecast plots below). By using the report()
function, we can see the individual models that make up the ensemble and how they are added together. Note that, as of the publication date of this post, the tidy()
function cannot be used on an object of class "model_combination"
.
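victoria_dept_stores_ensemble_mbl %>%
select(mixed) %>%
report()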
Series: Turnover
Model: COMBINATION
Combination: (Turnover + Turnover + Turnover) * 0.333333333333333
=================================================================
Series: Turnover + Turnover + Turnover
Model: COMBINATION
Combination: Turnover + Turnover + Turnover
===========================================
Series: Turnover + Turnover
Model: COMBINATION
Combination: Turnover + Turnover
================================
Series: Turnover
Model: SNAIVE
sigma^2: 292.2446
Series: Turnover
Model: ETS(M,Ad,M)
Smoothing parameters:
alpha = 0.167336
beta = 0.0015252
gamma = 0.000111099
phi = 0.9799949
Initial states:
l[0] b[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5]
261.8962 2.122653 1.856338 1.102949 0.9556113 0.8697593 0.8442324 0.9481133
s[-6] s[-7] s[-8] s[-9] s[-10] s[-11]
0.9952447 0.9489134 0.9525782 0.9034134 0.7260604 0.896787
sigma^2: 0.0017
AIC AICc BIC
2459.671 2462.943 2521.399
Series: Turnover
Model: ARIMA(2,1,1)(1,1,2)[12]
Coefficients:
ar1 ar2 ma1 sar1 sma1 sma2
-0.2336 -0.1432 -0.7237 -0.8059 0.1900 -0.5586
s.e. 0.0939 0.0852 0.0712 0.2862 0.2785 0.1707
sigma^2 estimated as 193.8: log likelihood=-872.18
AIC=1758.35 AICc=1758.89 BIC=1781.94
By themselves, the three underlying models produce the following forecasts:
victoria_dept_stores_ensemble_mbl %>%
select(-mixed) %>%
forecast(h = "3 years") %>%
autoplot(victoria_dept_stores_tsbl) +
facet_wrap(~ .model, ncol = 1) +
expand_limits(y = 0) +
theme_bw() +
scale_y_continuous(
labels = scales::label_dollar(accuracy = 0.1, scale = 1e-3, suffix = "B")
) +
labs(
title = "Three-Year Forecast of Yearly Retail Turnover for Victoria",
subtitle = "Models: Seasonal Naive, ETS(M,Ad,M), ARIMA(2,1,1)(1,1,2)[12]",
x = "",
y = "Australian Dollars"
)
Taking a simple average of the three models, the ensemble produces the following forecast:
victoria_dept_stores_ensemble_mbl %>%
select(mixed) %>%
forecast(h = "3 years") %>%
autoplot(victoria_dept_stores_tsbl) +
theme_minimal() +
expand_limits(y = 0) +
scale_y_continuous(
labels = scales::label_dollar(accuracy = 0.1, scale = 1e-3, suffix = "B")
) +
labs(
title = "Three-Year Forecast of Yearly Retail Turnover for Victoria",
subtitle = str_glue("Ensemble model made of equal weights of:
Seasonal Naive, ETS(M,Ad,M), ARIMA(2,1,1)(1,1,2)[12]"),
x = "",
y = "Australian Dollars"
)
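Equal weights are just the default arithmetic used above. Because a mable column is an ordinary model object, a different weighting scheme only requires changing the arithmetic. A minimal sketch, where the 0.2/0.3/0.5 weights are hypothetical and chosen purely for illustration:

victoria_dept_stores_tsbl %>%
model(
snaive = SNAIVE(Turnover),
ets = ETS(Turnover),
arima = ARIMA(Turnover)
) %>%
# hypothetical unequal weights, for illustration only
mutate(weighted = 0.2 * snaive + 0.3 * ets + 0.5 * arima)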
Model Diagnostics
Building a model that produces a reasonable-looking forecast is good, but there are steps that must be taken to validate that model. In my case, all the models I build in my current role must be validated by our risk and compliance teams. It is important that I not only show that my model is statistically sound, but also justify the process I used to select it.
Information Criteria
A straightforward way to justify why a certain model was chosen over another is to use various information criteria, such as AIC, BIC, or AICc. While these measures are certainly convenient, there are some important restrictions to be aware of.
- Comparisons can only be done on models of the same type (e.g. you cannot compare the AIC of an ETS model to that of an ARIMA model)
- Comparisons for ARIMA models can only be done between those models with the same number of differences (e.g. you cannot compare the AIC of an ARIMA model with d=1 to that of an ARIMA model with d=0)
If the models in your mable do not violate the above restrictions, their information criteria can be calculated using the glance()
function.
victoria_dept_stores_tsbl %>%
model(
arima_111 = ARIMA(Turnover ~ pdq(1,1,1) + PDQ(0,0,0)),
arima_212 = ARIMA(Turnover ~ pdq(2,1,2) + PDQ(0,0,0)),
arima_012 = ARIMA(Turnover ~ pdq(0,1,2) + PDQ(0,0,0)),
arima_210 = ARIMA(Turnover ~ pdq(2,1,0) + PDQ(0,0,0))
) %>%
glance() %>%
select(.model, AIC, BIC, AICc) %>%
mutate(across(where(is.numeric), round, digits = 0)) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
.model | AIC | BIC | AICc |
---|---|---|---|
arima_111 | 2736 | 2746 | 2736 |
arima_212 | 2685 | 2706 | 2686 |
arima_012 | 2732 | 2743 | 2732 |
arima_210 | 2800 | 2810 | 2800 |
Residual Diagnostics
We know how to compare models from the same modeling algorithm, but what about across different algorithms? We will discuss checking the accuracy of forecasts on out-of-sample data in the next section, but a good first step is to check the residuals. For a time-series model, there are two key assumptions that must be validated:
- The residuals are uncorrelated
- The residuals have mean zero
While not strictly necessary, the following properties are also very useful:
- The residuals have constant variance
- The residuals are normally distributed
Essentially, meeting all four of these conditions means your residuals are Gaussian white noise.
The first thing we can do to check these assumptions is simply plot the distribution of the residuals. The residuals from a fitted model can be extracted using the augment()
function. I will use the ARIMA models for these examples.
victoria_dept_stores_arima_mbl <- victoria_dept_stores_tsbl %>%
model(
arima_211_000 = ARIMA(Turnover ~ pdq(2,1,1) + PDQ(0,0,0)),
arima_012_002 = ARIMA(Turnover ~ pdq(0,1,2) + PDQ(0,0,2)),
arima_200_010 = ARIMA(Turnover ~ pdq(2,0,0) + PDQ(0,1,0)),
arima_auto = ARIMA(Turnover)
)
victoria_dept_stores_arima_mbl %>%
select(arima_auto) %>%
augment() %>%
ggplot() +
geom_histogram(aes(x = .resid), color = "white") +
theme_minimal() +
labs(
title = str_glue("Yearly Retail Turnover for Victoria:
Distribution of Residuals"),
subtitle = "Model: ARIMA(2,1,1)(1,1,2)[12]",
x = "Residual",
y = "Count"
)
A basic time-series plot of the residuals is also helpful for spotting any dramatic outliers and the observations at which they occurred.
victoria_dept_stores_arima_mbl %>%
select(arima_auto) %>%
augment() %>%
autoplot(.resid) +
theme_minimal() +
labs(
title = str_glue("Yearly Retail Turnover for Victoria:
Time-Series Residuals"),
subtitle = "Model: ARIMA(2,1,1)(1,1,2)[12]",
x = "Month",
y = "Residual"
)
We can get a bit more quantitative, rather than purely visual, by plotting the ACF of the residuals, which shows how much autocorrelation, if any, remains.
victoria_dept_stores_arima_mbl %>%
select(arima_auto) %>%
augment() %>%
ACF(.resid) %>%
autoplot() +
theme_minimal() +
labs(
title = "Yearly Retail Turnover for Victoria: ACF Plot",
subtitle = "Model: ARIMA(2,1,1)(1,1,2)[12]",
x = "Lag (Month)",
y = "ACF"
)
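As a shortcut, the feasts package also provides gg_tsresiduals(), which combines the time plot, ACF plot, and histogram of the residuals into a single figure. A minimal sketch, assuming a mable containing a single model column:

victoria_dept_stores_arima_mbl %>%
select(arima_auto) %>%
gg_tsresiduals()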
Based on the plots we have seen so far, the residuals look roughly normal, and the ACF shows no significant autocorrelation. To formalize these observations, we can run a Ljung-Box test. The null hypothesis is that the residual series is not autocorrelated; we want to fail to reject the null (i.e. have a p-value greater than 0.05). We can pass the ljung_box()
function into the features()
function to run this test. There are two parameters of the test we must define:
- Degrees of Freedom (dof): the number of parameters in the model (e.g. zero for Naive, p + q + P + Q for a seasonal ARIMA)
- Number of Lags Tested (lag): Dr. Hyndman suggests a value of 10 for non-seasonal data and two times the seasonal period for seasonal data (e.g. 2 * 12 = 24 for monthly data)
victoria_dept_stores_arima_mbl %>%
select(arima_auto) %>%
augment() %>%
# dof = p + q + P + Q = 6 for the ARIMA(2,1,1)(1,1,2)[12] model; lag = 2 * 12 for monthly data
features(.resid, ljung_box, dof = 2+1+1+2, lag = 2*12) %>%
mutate(across(where(is.numeric), round, digits = 2)) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
.model | lb_stat | lb_pvalue |
---|---|---|
arima_auto | 11.53 | 0.87 |
Forecast Accuracy
The final step we can take is to forecast on out-of-sample data and check how accurate that forecast is. In general, what we want to do is:
- Separate some portion of the most recent data before fitting a model
- Train the model on the remaining data
- Calculate some accuracy metric using the training data
- Calculate some accuracy metric using the held-out data
- Compare the two accuracy metrics (we want them to be similar, showing that the model generalizes well to data it has not seen before)
Whereas the information criteria can only be used to compare models of the same type, comparing accuracy can be done across model types. Commonly used accuracy measures include:
- Mean Absolute Percentage Error (MAPE)
- Root Mean Square Error (RMSE)
- Mean Absolute Scaled Error (MASE)
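For reference, the standard definitions (with $y_t$ the actual value, $\hat{y}_t$ the forecast, $n$ forecasts being evaluated, $T$ training observations, and $m$ the seasonal period; these formulas are from the general literature, not the workshop slides):

$$
\text{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|
\qquad
\text{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}
\qquad
\text{MASE} = \frac{\frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|}{\frac{1}{T-m}\sum_{t=m+1}^{T}\left|y_t - y_{t-m}\right|}
$$

A MASE below one means the forecast errors are smaller, on average, than the in-sample errors of the (seasonal) naive method used in the denominator.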
The accuracy on the training data can be calculated by passing the model straight into the accuracy()
function.
victoria_dept_stores_train_arima_mbl <- victoria_dept_stores_tsbl %>%
filter(year(Month) <= 2015) %>%
model(arima = ARIMA(Turnover))
victoria_dept_stores_train_arima_mbl %>%
accuracy() %>%
select(.model, .type, MAPE, RMSE, MASE) %>%
mutate(across(where(is.numeric), round, digits = 2)) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
.model | .type | MAPE | RMSE | MASE |
---|---|---|---|---|
arima | Training | 2.98 | 13.44 | 0.68 |
The accuracy on the held-out data takes an additional step. First, use the forecast()
function to forecast over the held-out period, and then pass the resulting fable
into the accuracy()
function, along with the full data set (which supplies the actual values for comparison).
victoria_dept_stores_tsbl %>%
filter(year(Month) > 2015) %>%
forecast(object = victoria_dept_stores_train_arima_mbl) %>%
accuracy(victoria_dept_stores_tsbl) %>%
select(.model, .type, MAPE, RMSE, MASE) %>%
mutate(across(where(is.numeric), round, digits = 2)) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
.model | .type | MAPE | RMSE | MASE |
---|---|---|---|---|
arima | Test | 3.93 | 16.93 | 0.94 |

Comparing the two tables, the test-set metrics are somewhat higher than the training metrics (e.g. MAPE of 3.93 versus 2.98) but remain in the same neighborhood, suggesting the model generalizes reasonably well to data it has not seen before.
Recap
The second day of the workshop focused on forecasting methods, model algorithms, validating those models, and producing and checking the accuracy of forecasts. This second post on the second day covered:
- Building complex models
- Validating model fit
- Determining forecast accuracy
More Information
A big thanks to Dr. Hyndman and the tidyverts
team for building these packages and putting together a fantastic workshop!
The best source of additional information on the tidyverts
ecosystem is the official website.
For more examples and explanations of the theoretical background of time-series forecasting, please see the 3rd edition of Forecasting: Principles and Practice, co-authored by Dr. Rob Hyndman. This book goes into depth on most of the topics covered in the workshop. During the workshop, Dr. Hyndman mentioned that a new edition of the book would be coming out with updated tidyverts
integration, but did not have a specific date available.
Any issues with the packages in the tidyverts
can be opened and tracked via their respective GitHub repositories.
Finally, the RStudio Community is a great place to ask for and receive help. There are several tidyverts
questions already posted, and at least one of the package authors is known to be an active member.
Notes
This post is based on a presentation that was given on the date listed. It may be updated from time to time to fix errors, detail new functions, and/or remove deprecated functions, so the packages and R version will likely be newer than what was available at the time.
The R session information used for this post:
R version 4.2.1 (2022-06-23)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 14.1.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] kableExtra_1.3.4 ggplot2_3.4.0 fable_0.3.2 feasts_0.3.0
[5] fabletools_0.3.2 tsibbledata_0.4.1 tsibble_1.1.3 stringr_1.4.1
[9] lubridate_1.9.0 timechange_0.1.1 purrr_0.3.5 tidyr_1.2.1
[13] dplyr_1.0.10
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 xfun_0.40 urca_1.3-3
[4] lattice_0.20-45 colorspace_2.0-3 vctrs_0.6.3
[7] generics_0.1.3 viridisLite_0.4.1 htmltools_0.5.3
[10] yaml_2.3.5 utf8_1.2.2 rlang_1.1.1
[13] pillar_1.8.1 glue_1.6.2 withr_2.5.0
[16] rappdirs_0.3.3 distributional_0.3.1 lifecycle_1.0.3
[19] progressr_0.11.0 munsell_0.5.0 anytime_0.3.9
[22] gtable_0.3.1 rvest_1.0.3 evaluate_0.16
[25] labeling_0.4.2 knitr_1.40 fastmap_1.1.0
[28] fansi_1.0.3 highr_0.9 Rcpp_1.0.9
[31] renv_0.16.0 scales_1.2.1 webshot_0.5.4
[34] jsonlite_1.8.0 systemfonts_1.0.4 farver_2.1.1
[37] digest_0.6.29 stringi_1.7.12 grid_4.2.1
[40] cli_3.6.1 tools_4.2.1 magrittr_2.0.3
[43] tibble_3.1.8 pkgconfig_2.0.3 ellipsis_0.3.2
[46] xml2_1.3.3 rmarkdown_2.16 svglite_2.1.0
[49] httr_1.4.4 rstudioapi_0.14 R6_2.5.1
[52] nlme_3.1-157 compiler_4.2.1