VARMAX models

This is a brief introduction notebook to VARMAX models in Statsmodels. The VARMAX model is generically specified as: $$ y_t = \nu + A_1 y_{t-1} + \dots + A_p y_{t-p} + B x_t + \epsilon_t + M_1 \epsilon_{t-1} + \dots M_q \epsilon_{t-q} $$

where $y_t$ is a $\text{k_endog} \times 1$ vector.

In [1]:
%matplotlib inline
In [2]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
In [3]:
dta = sm.datasets.webuse('lutkepohl2', 'https://www.stata-press.com/data/r12/')
dta.index = dta.qtr
endog = dta.loc['1960-04-01':'1978-10-01', ['dln_inv', 'dln_inc', 'dln_consump']]

Model specification

The VARMAX class in Statsmodels allows estimation of VAR, VMA, and VARMA models (through the order argument), optionally with a constant term (via the trend argument). Exogenous regressors may also be included (as usual in Statsmodels, by the exog argument), and in this way a time trend may be added. Finally, the class allows measurement error (via the measurement_error argument) and allows specifying either a diagonal or unstructured innovation covariance matrix (via the error_cov_type argument).

Example 1: VAR

Below is a simple VARX(2) model in two endogenous variables and an exogenous series, but no constant term. Notice that we needed to allow for more iterations than the default (which is maxiter=50) in order for the likelihood estimation to converge. This is not unusual in VAR models which have to estimate a large number of parameters, often on a relatively small number of time series: this model, for example, estimates 27 parameters off of 75 observations of 3 variables.

In [4]:
exog = endog['dln_consump']
mod = sm.tsa.VARMAX(endog[['dln_inv', 'dln_inc']], order=(2,0), trend='nc', exog=exog)
res = mod.fit(maxiter=1000, disp=False)
print(res.summary())
/usr/lib/python3/dist-packages/statsmodels/tsa/base/tsa_model.py:171: ValueWarning: No frequency information was provided, so inferred frequency QS-OCT will be used.
  % freq, ValueWarning)
/usr/lib/python3/dist-packages/statsmodels/tsa/statespace/representation.py:381: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return matrix[[slice(None)]*(matrix.ndim-1) + [0]]
                             Statespace Model Results                             
==================================================================================
Dep. Variable:     ['dln_inv', 'dln_inc']   No. Observations:                   75
Model:                            VARX(2)   Log Likelihood                 348.264
Date:                    Fri, 20 Sep 2019   AIC                           -670.527
Time:                            18:23:49   BIC                           -640.400
Sample:                        04-01-1960   HQIC                          -658.498
                             - 10-01-1978                                         
Covariance Type:                      opg                                         
===================================================================================
Ljung-Box (Q):                58.20, 41.03   Jarque-Bera (JB):         17.43, 10.09
Prob(Q):                        0.03, 0.43   Prob(JB):                   0.00, 0.01
Heteroskedasticity (H):         0.45, 1.14   Skew:                      0.07, -0.55
Prob(H) (two-sided):            0.05, 0.75   Kurtosis:                   5.36, 4.42
                            Results for equation dln_inv                            
====================================================================================
                       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------
L1.dln_inv          -0.2758      0.088     -3.130      0.002      -0.448      -0.103
L1.dln_inc           0.3199      0.622      0.515      0.607      -0.898       1.538
L2.dln_inv          -0.1157      0.157     -0.736      0.462      -0.424       0.192
L2.dln_inc           0.3986      0.385      1.035      0.301      -0.356       1.154
beta.dln_consump     0.5625      0.750      0.750      0.453      -0.908       2.033
                            Results for equation dln_inc                            
====================================================================================
                       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------
L1.dln_inv           0.0331      0.042      0.788      0.431      -0.049       0.116
L1.dln_inc           0.0813      0.132      0.614      0.539      -0.178       0.341
L2.dln_inv           0.0522      0.051      1.024      0.306      -0.048       0.152
L2.dln_inc           0.2661      0.168      1.583      0.113      -0.063       0.596
beta.dln_consump     0.5012      0.201      2.495      0.013       0.107       0.895
                                  Error covariance matrix                                   
============================================================================================
                               coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------
sqrt.var.dln_inv             0.0441      0.003     14.180      0.000       0.038       0.050
sqrt.cov.dln_inv.dln_inc     0.0013      0.002      0.555      0.579      -0.003       0.006
sqrt.var.dln_inc            -0.0127      0.001    -12.393      0.000      -0.015      -0.011
============================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

From the estimated VAR model, we can plot the impulse response functions of the endogenous variables.

In [5]:
ax = res.impulse_responses(10, orthogonalized=True).plot(figsize=(13,3))
ax.set(xlabel='t', title='Responses to a shock to `dln_inv`');
/usr/lib/python3/dist-packages/statsmodels/tsa/statespace/representation.py:381: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return matrix[[slice(None)]*(matrix.ndim-1) + [0]]

Example 2: VMA

A vector moving average model can also be formulated. Below we show a VMA(2) on the same data, but where the innovations to the process are uncorrelated. In this example we leave out the exogenous regressor but now include the constant term.

In [6]:
mod = sm.tsa.VARMAX(endog[['dln_inv', 'dln_inc']], order=(0,2), error_cov_type='diagonal')
res = mod.fit(maxiter=1000, disp=False)
print(res.summary())
/usr/lib/python3/dist-packages/statsmodels/tsa/base/tsa_model.py:171: ValueWarning: No frequency information was provided, so inferred frequency QS-OCT will be used.
  % freq, ValueWarning)
                             Statespace Model Results                             
==================================================================================
Dep. Variable:     ['dln_inv', 'dln_inc']   No. Observations:                   75
Model:                             VMA(2)   Log Likelihood                 353.888
                              + intercept   AIC                           -683.776
Date:                    Fri, 20 Sep 2019   BIC                           -655.966
Time:                            18:23:49   HQIC                          -672.671
Sample:                        04-01-1960                                         
                             - 10-01-1978                                         
Covariance Type:                      opg                                         
===================================================================================
Ljung-Box (Q):                68.54, 39.32   Jarque-Bera (JB):         12.77, 13.53
Prob(Q):                        0.00, 0.50   Prob(JB):                   0.00, 0.00
Heteroskedasticity (H):         0.44, 0.81   Skew:                      0.06, -0.48
Prob(H) (two-sided):            0.04, 0.60   Kurtosis:                   5.02, 4.84
                           Results for equation dln_inv                          
=================================================================================
                    coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------
const             0.0182      0.005      3.789      0.000       0.009       0.028
L1.e(dln_inv)    -0.2571      0.106     -2.430      0.015      -0.465      -0.050
L1.e(dln_inc)     0.5093      0.630      0.808      0.419      -0.726       1.745
L2.e(dln_inv)     0.0309      0.150      0.206      0.836      -0.262       0.324
L2.e(dln_inc)     0.1863      0.476      0.392      0.695      -0.746       1.119
                           Results for equation dln_inc                          
=================================================================================
                    coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------
const             0.0207      0.002     13.118      0.000       0.018       0.024
L1.e(dln_inv)     0.0482      0.042      1.159      0.247      -0.033       0.130
L1.e(dln_inc)    -0.0751      0.140     -0.536      0.592      -0.350       0.199
L2.e(dln_inv)     0.0180      0.043      0.423      0.672      -0.065       0.101
L2.e(dln_inc)     0.1213      0.153      0.793      0.428      -0.179       0.421
                             Error covariance matrix                              
==================================================================================
                     coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
sigma2.dln_inv     0.0020      0.000      7.345      0.000       0.001       0.003
sigma2.dln_inc     0.0001   2.33e-05      5.833      0.000    9.01e-05       0.000
==================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

Caution: VARMA(p,q) specifications

Although the model allows estimating VARMA(p,q) specifications, these models are not identified without additional restrictions on the representation matrices, which are not built-in. For this reason, it is recommended that the user proceed with error (and indeed a warning is issued when these models are specified). Nonetheless, they may in some circumstances provide useful information.

In [7]:
mod = sm.tsa.VARMAX(endog[['dln_inv', 'dln_inc']], order=(1,1))
res = mod.fit(maxiter=1000, disp=False)
print(res.summary())
/usr/lib/python3/dist-packages/statsmodels/tsa/statespace/varmax.py:152: EstimationWarning: Estimation of VARMA(p,q) models is not generically robust, due especially to identification issues.
  EstimationWarning)
/usr/lib/python3/dist-packages/statsmodels/tsa/base/tsa_model.py:171: ValueWarning: No frequency information was provided, so inferred frequency QS-OCT will be used.
  % freq, ValueWarning)
/usr/lib/python3/dist-packages/statsmodels/tsa/statespace/representation.py:381: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return matrix[[slice(None)]*(matrix.ndim-1) + [0]]
                             Statespace Model Results                             
==================================================================================
Dep. Variable:     ['dln_inv', 'dln_inc']   No. Observations:                   75
Model:                         VARMA(1,1)   Log Likelihood                 354.288
                              + intercept   AIC                           -682.576
Date:                    Fri, 20 Sep 2019   BIC                           -652.449
Time:                            18:23:49   HQIC                          -670.547
Sample:                        04-01-1960                                         
                             - 10-01-1978                                         
Covariance Type:                      opg                                         
===================================================================================
Ljung-Box (Q):                68.29, 39.20   Jarque-Bera (JB):         11.18, 14.56
Prob(Q):                        0.00, 0.51   Prob(JB):                   0.00, 0.00
Heteroskedasticity (H):         0.43, 0.91   Skew:                      0.01, -0.46
Prob(H) (two-sided):            0.04, 0.81   Kurtosis:                   4.89, 4.95
                           Results for equation dln_inv                          
=================================================================================
                    coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------
const             0.0104      0.064      0.163      0.871      -0.114       0.135
L1.dln_inv        0.0031      0.691      0.004      0.996      -1.351       1.357
L1.dln_inc        0.3789      2.681      0.141      0.888      -4.876       5.634
L1.e(dln_inv)    -0.2559      0.701     -0.365      0.715      -1.630       1.118
L1.e(dln_inc)     0.1248      2.931      0.043      0.966      -5.621       5.870
                           Results for equation dln_inc                          
=================================================================================
                    coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------
const             0.0163      0.027      0.609      0.542      -0.036       0.069
L1.dln_inv       -0.0360      0.276     -0.131      0.896      -0.577       0.505
L1.dln_inc        0.2451      1.090      0.225      0.822      -1.891       2.381
L1.e(dln_inv)     0.0908      0.283      0.321      0.748      -0.463       0.645
L1.e(dln_inc)    -0.2446      1.125     -0.217      0.828      -2.449       1.960
                                  Error covariance matrix                                   
============================================================================================
                               coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------
sqrt.var.dln_inv             0.0449      0.003     14.511      0.000       0.039       0.051
sqrt.cov.dln_inv.dln_inc     0.0016      0.003      0.641      0.521      -0.003       0.007
sqrt.var.dln_inc             0.0116      0.001     11.724      0.000       0.010       0.013
============================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).