Causal Impact

5 minute read

Causal Impact Analysis

What is it?

CausalImpact is an R package for causal inference using Bayesian structural time-series models. It implements an approach to estimate the causal effect of a designed intervention on a time series. For instance, in the following example, we calculate the impact that the VolksWagen Emissions Scandal had on their stock price.

The package was developed and open sourced by Google, for more information:

How does it work:

Given a response time series (e.g., clicks) and a set of control time series (e.g., clicks in non-affected markets or clicks on other sites), the package constructs a Bayesian structural time-series model. This model is then used to try and predict the counterfactual, i.e., how the response metric would have evolved after the intervention if the intervention had never occurred.

There are two ways of running an analysis with the CausalImpact R package. You can either let the package construct a suitable model automatically or you can specify a custom model. In the former case, the kind of model constructed by the package depends on your input data:

  • If your data contains no predictor time series (i.e., the data argument is a univariate time series), then the model contains a local level component and, if specified in model.args, a seasonal component. It’s generally not recommended to do this as the counterfactuals predicted by your model will be overly simplistic. They are not using any information from the post-period. Causal inference then becomes as hard as forecasting. Having said that, the model still provides you with prediction intervals, which you can use to assess whether the deviation of the time series in the post-period from its baseline is significant.
  • If your data contains one or more predictor time series (i.e., the data argument has at least two columns), then, on top of the above, the model contains a regression component. In all practical cases I’ve seen it really is the predictor time series that make the model powerful as they allow you to compute much more plausible counterfactuals. I’d generally recommend adding at least a handful of predictor time series.

Underlying assumptions:

The main assumption is that there is a set control time series that were themselves not affected by the intervention. If they were, we might falsely under- or overestimate the true effect. Or we might falsely conclude that there was an effect even though in reality there wasn’t. The model also assumes that the relationship between covariates and treated time series, as established during the pre-period, remains stable throughout the post-period.

Data Collection

We will use the get.hist.quote() function of the tseries package to retrieve all the relevant stock prices, ggplot2 to create some of the charts and of course CausalImpact to perform the analysis. Let’s start by installing and loading all the necessary libraries:

options(warn = -1)
#install.packages("tseries")
library(tseries)
#install.packages("ggplot2")
library(ggplot2)
#devtools::install_github("google/CausalImpact")
library(CausalImpact)

We first extract the Adjusted Close price for all required stocks and I specifically chose the zoo format as it is the recommended object type to be used with CausalImpact. I’m including VolksWagen’s stock as well as BMW and Allianz Insurance; the last two will be used as regressors of the VW series in the second part of the analysis. The Emissions Scandal broke on Friday the 18th of September 2015, so I’m going to collect weekly data from the beginning of 2011 up to current date.

start = '2011-01-03'
  end = '2017-03-20'
quote = 'AdjClose'
VolksWagen <- get.hist.quote(instrument = "VOW.DE", start, end, quote, compression = "w")
BMW <- get.hist.quote(instrument = "BMW.DE", start, end, quote, compression = "w")
Allianz <- get.hist.quote(instrument = "ALV.DE", start, end, quote, compression = "w")
series <- cbind(VolksWagen, BMW, Allianz)

We then plot the three time series.

colnames(series) <- c("VolksWagen", "BMW", "Allianz")
autoplot(series, facet = NULL) + xlab("") + ylab("Adjusted Close Price")

We need to define the pre- and post-intervention periods (the emission scandal started on the 18th of September 2015)

pre.period <- as.Date(c(start, "2015-09-14"))
post.period <- as.Date(c("2015-09-21", end))

A Simple Model

The Causal Impact function needs at least three arguments: data, pre.period and post.period. The easiest way to perform a causal analysis is to provide only the series where the intervention took place as the data input and specify the seasonality frequency in the model.args parameter. This is equivalent as specifying a local level model with a seasonality component:

impact_vw <- CausalImpact(series[, 1], pre.period, post.period, model.args = list(niter = 1000, nseasons = 52))
plot(impact_vw)

summary(impact_vw)

## Posterior inference {CausalImpact}
## 
##                          Average      Cumulative    
## Actual                   130          10252         
## Prediction (s.d.)        168 (24)     13285 (1901)  
## 95% CI                   [123, 217]   [9722, 17124] 
##                                                     
## Absolute effect (s.d.)   -38 (24)     -3033 (1901)  
## 95% CI                   [-87, 6.7]   [-6873, 529.7]
##                                                     
## Relative effect (s.d.)   -23% (14%)   -23% (14%)    
## 95% CI                   [-52%, 4%]   [-52%, 4%]    
## 
## Posterior tail-area probability p:   0.04928
## Posterior prob. of a causal effect:  95.072%
## 
## For more details, type: summary(impact, "report")

A quick look at the output should convince you that this method is probably not the best, at least for this data, as the confidence intervals of the estimates increases drastically with time.

Including Regressors

We can try to improve our model by supplying one or more covariates so that we’re basically performing a regression on our response variable. We will use the BMW and Allianz stock prices to explain our target series (you may argue that those series - especially BMW - may have been influenced by the scandal as well and that may be true, but certainly at a lower magnitude):

impact_vw_reg <- CausalImpact(series, pre.period, post.period, model.args = list(niter = 1000, nseasons = 52))
plot(impact_vw_reg)

summary(impact_vw_reg)

## Posterior inference {CausalImpact}
## 
##                          Average        Cumulative    
## Actual                   130            10252         
## Prediction (s.d.)        176 (5.9)      13874 (463.3) 
## 95% CI                   [163, 187]     [12905, 14765]
##                                                       
## Absolute effect (s.d.)   -46 (5.9)      -3622 (463.3) 
## 95% CI                   [-57, -34]     [-4514, -2653]
##                                                       
## Relative effect (s.d.)   -26% (3.3%)    -26% (3.3%)   
## 95% CI                   [-33%, -19%]   [-33%, -19%]  
## 
## Posterior tail-area probability p:   0.001
## Posterior prob. of a causal effect:  99.8997%
## 
## For more details, type: summary(impact, "report")

The output of this second analysis looks much better: the confidence intervals of the estimate are fairly stable over time. Since we’re looking at stock prices, we shouldn’t look at the cumulative effect, but focus on the Average section.

The console output shows you the actual vs predicted effect (Average) as well as the absolute and relative effect. The output of the second analysis is saying that the Emissions Scandal brought down VolksWagen stocks by 26% - from a predicted $176 to an actual $130.

Another hint in favor of the latter model is given by the Standard Deviation of the estimates, which was 24 in the first model and is now down to 5.9.

Updated: