Kalman Filter and Pairs Trading

Introduction

In a previous post we saw the Kalman Filter and its ability to train a linear regression model online. In the last post we also covered the idea of cointegration and pairs trading. As pointed out at the end of that post, one way to avoid look-ahead bias and obtain a walk-forward analysis is a Bayesian online training mechanism such as the Kalman Filter. Today we’ll apply this idea to pairs trading.

As usual, the backtest code for this post is located in the strategy folder. All other code can be found in the research folder.

Pairs Trading via Kalman Filter

The idea is simple. The pairs trading hedge coefficient can be obtained through linear regression, and linear regression can be solved by the Kalman Filter as in this post, so we can link the pair through the Kalman Filter. In this post we are going to use the PyKalman package, so the only thing you need to do is understand the concept and then express the problem in Bayesian form. Let’s inherit the notation from the previous post (referred to as Prev).

The state variables are still the intercept $a_k$ and the slope $b_k$, as in Prev$(2.1)$. But this time let’s observe one $y_t$ at a time, so Prev$(2.2)$ is rewritten as

$$
y_t=[1,x_t]\begin{bmatrix} a_t \\ b_t \end{bmatrix}+v_t
\tag{2.1}
$$

If we take EWA as the independent variable $x$ and EWC as the dependent variable $y$, then the setting corresponds to

$$
\begin{array}{lcl}
G_t &=&I \\
\theta_t&=&\begin{bmatrix} a_t \\ b_t \end{bmatrix} \\
F_t&=&[1,x_t]
\end{array}
\tag{2.2}
$$

in Prev$(A.1)$ and Prev$(A.2)$. Notice that the observation matrix $F_t$ changes at every step with the new EWA price $x_t$.
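To make the time-varying observation matrix concrete, here is a minimal sketch on synthetic numbers. The arrays `x`, `a` and `b` are made up for illustration and are not EWA/EWC data. It stacks one $F_t=[1,x_t]$ per day, in the column order of $(2.2)$, and evaluates the noise-free part of $(2.1)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
x = 20.0 + rng.normal(size=n).cumsum()   # synthetic stand-in for the EWA price x_t
a = np.full(n, 2.0)                       # hypothetical intercept path a_t
b = np.linspace(0.8, 0.9, n)              # hypothetical slowly drifting slope b_t

# One 1x2 observation matrix per day: F_t = [1, x_t], overall shape (n, 1, 2)
F = np.stack([np.ones(n), x], axis=1).reshape(n, 1, 2)

# One 2x1 state vector per day: theta_t = [a_t, b_t]^T, overall shape (n, 2, 1)
theta = np.stack([a, b], axis=1).reshape(n, 2, 1)

# Batched matrix product gives F_t @ theta_t = a_t + b_t * x_t for every t
y_clean = (F @ theta).reshape(n)
print(np.allclose(y_clean, a + b * x))
```

The `(n, 1, 2)` shape is the point of the exercise: one observation matrix per time step rather than a single constant matrix. Note that the pykalman listing in this post stacks the columns the other way round (price first, then the constant), which only changes the order of the state vector.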

The following code explores the relationship between EWA and EWC as a scatterplot colored by time.

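import matplotlib.pyplot as plt
import numpy as np

# data: DataFrame of prices indexed by date; sym_a / sym_b are the EWA / EWC column names
cm = plt.get_cmap('jet')
colors = np.linspace(0.1, 1, data.shape[0])  # color encodes time, from early to late
sc = plt.scatter(data[sym_a], data[sym_b], s=10, c=colors, cmap=cm, edgecolors='k', alpha=0.7)
cb = plt.colorbar(sc)
# label the colorbar with dates; pass a list rather than a generator
cb.ax.set_yticklabels([str(p.date()) for p in data[::data.shape[0]//9].index])
plt.xlabel('EWA')
plt.ylabel('EWC')
plt.show()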

From the scatterplot we can tell that their relationship changed between 2010 and 2018. The hedge ratio therefore changes over time and our strategy needs to adapt to it; a static hedge ratio from linear regression would leave the position over- or under-hedged.
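A small synthetic experiment illustrates why a single regression is not enough. Everything here is assumed for illustration: the series are simulated, the slope jumps from 0.8 to 1.2 halfway through, and `ols_slope` is a hypothetical helper built on `np.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500                                               # observations per regime
x = 20.0 + 2.0 * np.sin(np.linspace(0.0, 8.0 * np.pi, 2 * n))   # synthetic "EWA"
true_slope = np.r_[np.full(n, 0.8), np.full(n, 1.2)]  # hedge ratio changes halfway
y = 3.0 + true_slope * x + 0.1 * rng.normal(size=2 * n)         # synthetic "EWC"

def ols_slope(x, y):
    """Slope of the least-squares line y = intercept + slope * x."""
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

slope_first = ols_slope(x[:n], y[:n])    # fitted on the first regime only
slope_second = ols_slope(x[n:], y[n:])   # fitted on the second regime only
slope_full = ols_slope(x, y)             # one static fit over the whole sample

print(round(slope_first, 2), round(slope_second, 2), round(slope_full, 2))
```

The full-sample slope lands between the two regime slopes, so it over-hedges in one regime and under-hedges in the other. It also uses data from the whole sample, which a live strategy would not have.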

That is where the Kalman Filter comes in. This time, instead of doing it manually, let’s model the Kalman Filter with the help of the pykalman package.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pykalman import KalmanFilter

# observation matrix F is 2-dimensional, containing the sym_a price and 1 (in that order),
# so the state vector is ordered (slope, intercept)
# there are data.shape[0] observations, hence the shape (n, 1, 2)
obs_mat_F = np.transpose(np.vstack([data[sym_a].values, np.ones(data.shape[0])])).reshape(-1, 1, 2)

kf = KalmanFilter(n_dim_obs=1,                               # y is 1-dimensional
                  n_dim_state=2,                             # states (slope, intercept) are 2-dimensional
                  initial_state_mean=np.ones(2),             # initial value of slope and intercept, theta0|0
                  initial_state_covariance=np.ones((2, 2)),  # initial cov matrix between slope and intercept, P0|0
                  transition_matrices=np.eye(2),             # G, constant
                  observation_matrices=obs_mat_F,            # F, depends on x
                  observation_covariance=1,                  # variance of v_t, constant
                  transition_covariance=np.eye(2))           # covariance of w_t, constant

state_means, state_covs = kf.filter(data[sym_b].values)     # observes sym_b price
# column 0 is the slope and column 1 the intercept, matching the column order of obs_mat_F
beta_kf = pd.DataFrame({'Slope': state_means[:, 0], 'Intercept': state_means[:, 1]}, index=data.index)
beta_kf.plot(subplots=True)
plt.show()

The code above is well commented. First fill in every input parameter based on your model design, then call kf.filter to run the Kalman Filter. Note that the results are sensitive to your inputs, such as your beliefs about observation_covariance and transition_covariance, because these are fixed inputs that the filter does not adjust over time.
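The sensitivity to the transition covariance can be seen in a small experiment. The sketch below is a hand-written NumPy filter on synthetic data, not pykalman and not the code from the repository; `filter_slope`, `q` and `r` are hypothetical names, and the state is ordered (intercept, slope). The true slope is held constant, so any movement in the estimated slope comes from the filter settings.

```python
import numpy as np

def filter_slope(x, y, q, r=1.0):
    """Kalman filter for y_t = a_t + b_t * x_t + v_t with G = I.

    q scales the transition covariance Q = q * I, r is the observation
    variance. Returns the filtered slope estimate for every t.
    """
    theta = np.zeros(2)                  # state mean: [intercept, slope]
    P = np.eye(2)                        # state covariance
    Q = q * np.eye(2)
    slopes = np.empty(len(y))
    for t in range(len(y)):
        F = np.array([1.0, x[t]])        # observation row [1, x_t]
        P_pred = P + Q                   # predict step (state mean unchanged)
        S = F @ P_pred @ F + r           # variance of the forecast error
        K = P_pred @ F / S               # Kalman gain
        theta = theta + K * (y[t] - F @ theta)
        P = P_pred - np.outer(K, F) @ P_pred
        slopes[t] = theta[1]
    return slopes

rng = np.random.default_rng(2)
n = 1000
x = 20.0 + 2.0 * np.sin(np.linspace(0.0, 8.0 * np.pi, n))
y = 3.0 + 0.85 * x + 0.5 * rng.normal(size=n)   # constant true slope

rough = filter_slope(x, y, q=1e-2)    # large Q: states are believed to drift a lot
smooth = filter_slope(x, y, q=1e-6)   # small Q: states are believed to be nearly fixed

# Day-to-day variability of the slope estimate over the second half of the sample
rough_var = np.diff(rough[n // 2:]).std()
smooth_var = np.diff(smooth[n // 2:]).std()
print(rough_var > smooth_var)
```

With a large transition covariance the filter keeps chasing noise in the observations, while a small one gives a slope that barely moves. Neither value is learned from the data, which is why the choice matters.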

From the figure above it seems the slope between EWA and EWC is pretty stable around $0.85$.

We can also see how the regression line evolves over time relative to the line in black, which is the OLS line fitted to the whole dataset.

Last but not least, the code above works correctly, but to use the Kalman Filter in practice we have to take a different approach: we want it to be updated step by step. Fortunately PyKalman also provides a function called kf.filter_update that serves this purpose. The code is a bit too long to fit here, so I relegate it to GitHub. You are encouraged to run it to see that the two are equivalent; the latter allows the model to be updated every day as new EWA and EWC prices arrive, which will in turn be used in the next section.
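The step-by-step idea can be sketched without any library. The function below is a hand-written NumPy stand-in for a single daily update; it is not PyKalman's implementation and not the repository code. The names `kalman_step`, `Q` and `r` are assumptions, the state is ordered (intercept, slope), and the prices are synthetic.

```python
import numpy as np

def kalman_step(theta, P, x_t, y_t, Q, r):
    """One predict/update cycle for y_t = a_t + b_t * x_t + v_t with G = I.

    Returns the updated state mean and covariance, plus the one-step-ahead
    forecast error e_t and its variance S_t.
    """
    F = np.array([1.0, x_t])             # observation row for today's x
    P_pred = P + Q                       # predicted state covariance
    e = y_t - F @ theta                  # forecast error before seeing y_t
    S = F @ P_pred @ F + r               # variance of that forecast error
    K = P_pred @ F / S                   # Kalman gain
    theta_new = theta + K * e
    P_new = P_pred - np.outer(K, F) @ P_pred
    return theta_new, P_new, e, S

rng = np.random.default_rng(3)
n = 750
x = 20.0 + 2.0 * np.sin(np.linspace(0.0, 6.0 * np.pi, n))   # synthetic "EWA"
y = 3.0 + 0.85 * x + 0.3 * rng.normal(size=n)                # synthetic "EWC"

theta, P = np.zeros(2), np.eye(2)        # prior beliefs before the first day
Q, r = 1e-5 * np.eye(2), 1.0             # assumed noise settings
history = []
for x_t, y_t in zip(x, y):               # one new pair of prices per day
    theta, P, e, S = kalman_step(theta, P, x_t, y_t, Q, r)
    history.append(theta)
history = np.array(history)              # shape (n, 2): intercept and slope paths
```

Only `theta` and `P` are carried from one day to the next, so each update uses information available up to that day and nothing later.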

Pairs Trading Backtest

It is time to backtest the EWA-EWC pair with the Bollinger-bands strategy driven by Kalman Filter updates. The strategy is the same as in the last post, which allows us to compare the results with simple linear regression.

One thing needs to be pointed out in order to better understand the code: the measurement error given in Prev$(A.5)$ is actually the spread at time $k$. To see this, notice that the first term on the right-hand side, $y_k$, is the EWC price, and the second term,

$$
F_k\hat{\theta}_{k|k-1}=a_k+b_k x_k
\tag{3.1}
$$

is the EWA hedge leg.

By the same logic, Prev$(A.6)$ is the variance of the spread. In addition, Prev$(A.2)$ assumes the spread/error term is normally distributed with zero mean and variance given by Prev$(A.6)$. Therefore Prev$(A.5)$ and Prev$(A.6)$ provide the moving average and moving standard deviation (as the square root of the variance) for the Bollinger bands.
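For reference, and assuming Prev$(A.5)$ and Prev$(A.6)$ are the standard Kalman forecast error and its variance (the previous post is not reproduced here, so treat this as an assumption), the two quantities read as follows in the notation of $(2.2)$. The symbols $e_k$, $P_{k|k-1}$ (predicted state covariance) and $R$ (variance of $v_k$) are names chosen here for illustration.

$$
\begin{array}{lcl}
e_k &=& y_k-F_k\hat{\theta}_{k|k-1} \\
S_k &=& F_kP_{k|k-1}F_k^{T}+R
\end{array}
$$

Under the normality assumption, $e_k/\sqrt{S_k}$ is a standard normal variable, so a band of $\pm\delta\sqrt{S_k}$ around the forecast plays the role of a $\delta$-standard-deviation Bollinger band.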

Unfortunately the PyKalman package doesn’t return them directly, so we have to calculate them manually in the strategy. Here is the trading logic (also see [1]). The full code can be found here on GitHub.

  1. On each day, observe EWA price $x_t$ and EWC price $y_t$
  2. Calculate Prev$(A.5)$ and Prev$(A.6)$ as the current spread and its variance
  3. Let the Bollinger band be $\hat{y}_{k}\pm\delta\sqrt{S_k}$
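The steps above can be turned into positions once the forecast errors and their variances are available. The sketch below is only an illustration: the entry and exit rule (enter when the error leaves the band, exit when it crosses back through zero) is an assumption and is not taken from the strategy code, and `bollinger_positions` and its inputs are hypothetical.

```python
import numpy as np

def bollinger_positions(e, S, delta=1.0):
    """Position in the spread for each day: +1 long, -1 short, 0 flat.

    e : forecast errors y_k - y_hat_k (the spread)
    S : variances of those errors
    ASSUMED rule, for illustration only: enter when e leaves the band
    +/- delta * sqrt(S), exit when e crosses back through zero.
    """
    e = np.asarray(e, dtype=float)
    band = delta * np.sqrt(np.asarray(S, dtype=float))
    pos = np.zeros(len(e), dtype=int)
    current = 0
    for k in range(len(e)):
        if current == 0:
            if e[k] < -band[k]:
                current = 1          # y below the lower band: long the spread
            elif e[k] > band[k]:
                current = -1         # y above the upper band: short the spread
        elif current == 1 and e[k] >= 0:
            current = 0              # spread reverted to the forecast: close long
        elif current == -1 and e[k] <= 0:
            current = 0              # spread reverted to the forecast: close short
        pos[k] = current
    return pos

e = [0.1, -1.5, -0.4, 0.2, 1.8, 0.5, -0.1]   # made-up forecast errors
S = [1.0] * 7                                 # made-up variances
positions = bollinger_positions(e, S, delta=1.0)
print(positions.tolist())   # [0, 1, 1, 0, -1, -1, 0]
```

A long spread position here would mean buying EWC and selling the EWA hedge leg in proportion to the current slope estimate; position sizing and transaction costs are left out of this sketch.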

The backtest result is shown below. The strategy appears to perform well until 2017.

Reference