## Introduction

In last post we examined the Bayesian approach for linear regression. It relies on the conjugate prior assumption, which nicely sets posterior to Gaussian distribution. In reality, most times we don’t have this luxury, so we rely instead on a technique called Markov Chain Monte Carlo (MCMC). One popular algorithm in this family is Metropolis–Hastings and this is what we are looking at today. Before we proceed, I want to point out that this post is inpsired by this article in R.

MCMC answers to this question: if we don’t know what the posterior distribution looks like, and we don’t have the closed form solution as given in equation (2.5) of last post for $\beta_1$ and $\Sigma_{\beta,1}$, how do we obtain the posterior distrubtion of $\beta$? Can we at least approximate it? Metropolis–Hastings provides a numerical Monte Carlo simulation method to magically draw a sample out of the posterior distribution. The magic is to construct a Markov Chain that converges to the given distribution as its stationary equilibrium distribution. Hence the name Markov Chain Monte Carlo (MCMC).

## MCMC Simple Linear Regression

Back to our simple linear regression example, firstly let’s repeat the Bayesian Theorem from last post.

$$f(\beta|y,X)=\frac{f(y|\beta,X)f(\beta|X)}{f(y|X)} \tag{2.1}$$

where $\beta$ is regression parameter; y and X each have 500 observations.

The distribution of interest is our posterior distribution of $\beta$ denoted as $f(\beta|y,X)$. This time we technically only do one Bayesian iteration (remember we updated our belief 250 times in the last post?) because the purpose is to show how to draw a MCMC sample distribution from the first posterior iteration. We include all the 500 points in the Bayesian so the result is expected to reflect full information dataset and converge to the true value.

The Metropolis-Hastings algorithm works as follows.

1. Start with a random parameter value $\beta_0$
2. Choose a new $\beta’$ based on some proposal function $g(\beta’|\beta_0)$. As pointed out here, $g$ must be symmetric, usually just Gaussian.
3. Calculate the acceptance ratio

$$\alpha=\frac{f(y|\beta’)f(\beta’)}{f(y|\beta_0)f(\beta_0)} \tag{2.2}$$

1. Draw a random number $u \in [0,1]$. Jump from $\beta_0$ to $\beta’$ if $u <= \alpha$, and denote it as $\beta_1$; otherwise stay with the old point.
2. Repeat step 2, 3, 4, and collect $\beta_0$, $\beta_1$,…,etc.

In the end, throw away some initial $\beta$ values in the beginning based on burn-in hyper-parameter, and Metropolis-Hastings claims that the remaining $\beta$ series comes from the distribution of posterior $f(\beta|y,X)$.

Now let’s interpret this process by code. There are Python packages such as PyMC3 to use out-of-box but sometimes it helps to look into the black box.

Here is the posterior distribution of $\beta$. The intercept converges to 0.75 (linear regress gives 0.6565181) and the slope converges to 2 (linear regression gives 2.0086851). MCMC does the job.

Reference