TensorFlow Linear Regression

Introduction

After four posts on linear regression, we are finally at the door of deep learning. Today we will build a simple (though not deep) feed-forward neural network with the help of TensorFlow to solve the linear regression problem. TensorFlow is a popular open-source deep learning library, especially since the retirement of Theano. To learn more about installing and using TensorFlow, the official website offers plenty of useful material.

A TensorFlow program consists of two phases: building a graph and running it in a session. Usually you start by building a graph and then let the data (in the form of multidimensional arrays, or tensors) flow through it, hence the name TensorFlow.

The building blocks of a graph are tf.Operation (nodes) and tf.Tensor (edges). There are three main kinds of tensors, namely tf.constant, tf.Variable, and tf.placeholder. In our linear regression example, the slope and intercept are referred to as weights and biases in the context of TensorFlow; they are examples of tf.Variable. The datasets x and y are examples of tf.placeholder, or more exactly, things fed into tf.placeholder nodes during Session.run.

TensorFlow operations take any number of tensors as input and produce any number of tensors as output. The gradient descent minimizer we will use shortly is an example of a tf.Operation. Finally, after the graph is constructed, a session executes it via Session.run() to perform the desired operations on the input data.
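
As a quick illustration (not part of the regression example yet), here is a minimal sketch that wires a constant, a variable, and a placeholder into a tiny graph and evaluates it in a session; the tensor names are made up for demonstration:

import tensorflow as tf

a = tf.constant(2.0, name='a')            # tf.constant: fixed value baked into the graph
w = tf.Variable(1.5, name='w')            # tf.Variable: trainable state, needs initialization
x = tf.placeholder(tf.float32, name='x')  # tf.placeholder: value supplied at run time
y = a * w * x                             # tf.Operation nodes created by the * overloads

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: 3.0}))  # 2.0 * 1.5 * 3.0 = 9.0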

The rest of the post solves the linear regression problem used in the last post. It follows closely the steps taken in this book.

Linear Regression via Normal Equation

As in the classical linear regression post, the easiest way to solve the linear regression problem is the closed-form Normal equation, beta = (XᵀX)⁻¹Xᵀy. The same is true in TensorFlow, where we only need tf.constant tensors and a few tf.Operations.

import numpy as np
import tensorflow as tf

tf.reset_default_graph()  # reset the default graph
# x and y are the 500-point dataset from the previous post
x_plus_i = np.c_[np.ones((500, 1)), x]  # add a column of ones for the intercept
X = tf.constant(x_plus_i, dtype=tf.float64, name='X')
# keep y as our original dataset; make a copy instead
y_copy = tf.constant(y.reshape(-1, 1), dtype=tf.float64, name='y')
X_T = tf.transpose(X)
# Normal equation: beta = (X^T X)^(-1) X^T y
beta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(X_T, X)), X_T), y_copy)

with tf.Session() as sess:
    beta_value = sess.run(beta)
    print('parameters: %.7f, %.7f' % (beta_value[0], beta_value[1]))

The results are, as expected, 0.6565181 and 2.0086851.
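
If you want to double-check the TensorFlow result outside the graph, a quick sanity check with NumPy's least-squares solver (assuming the same x_plus_i and y arrays defined above) should give the same coefficients:

import numpy as np

# Solve the same least-squares problem directly in NumPy
beta_np, _, _, _ = np.linalg.lstsq(x_plus_i, y.reshape(-1, 1), rcond=None)
print(beta_np.ravel())  # should match the TensorFlow output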

Linear Regression via OLS

It is time to construct a full example: define the model with tf.Variable and tf.placeholder, feed mini-batches forward through a feed dictionary, and propagate the errors back through a gradient descent optimizer.

import tensorflow as tf

tf.reset_default_graph()
batch_size = 50
n_batches = int(500 / batch_size)
n_epochs = 1000
learning_rate = 0.01

# define the graph
w = tf.Variable(tf.truncated_normal([1], mean=0.0, stddev=1.0, dtype=tf.float64), name='slope')
b = tf.Variable(tf.zeros(1, dtype=tf.float64), name='intercept')
x_ph = tf.placeholder(tf.float64, shape=(None, 1), name='x')
y_ph = tf.placeholder(tf.float64, shape=(None, 1), name='y')
y_pred = tf.add(b, tf.multiply(w, x_ph), name='prediction')
error = y_pred - y_ph
mse = tf.reduce_mean(tf.square(error), name='mse')
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

# record to TensorBoard
from datetime import datetime
now = datetime.now().strftime('%Y%m%d%H%M%S')
logdir = 'tf_logs/run-{}'.format(now)
mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            # Mini-batch gradient descent
            x_batch = x[batch_index * batch_size:(batch_index + 1) * batch_size].reshape(-1, 1)
            y_batch = y[batch_index * batch_size:(batch_index + 1) * batch_size].reshape(-1, 1)
            sess.run(training_op, feed_dict={x_ph: x_batch, y_ph: y_batch})

            # log the training MSE for this step
            summary_str = mse_summary.eval(feed_dict={x_ph: x_batch, y_ph: y_batch})
            step = epoch * n_batches + batch_index
            file_writer.add_summary(summary_str, step)

        w_val, b_val = sess.run([w, b])
        print('epoch {}: slope {}, intercept {}'.format(epoch, w_val[0], b_val[0]))

file_writer.close()
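
After the run completes, the logged graph and the MSE curve can be inspected by launching TensorBoard with the command tensorboard --logdir tf_logs from the working directory and opening the address it prints (typically http://localhost:6006).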

The graph below shows how x and y combine with the intercept and slope to form the prediction, which then passes through the sub and square operations to reach mse, and eventually how everything feeds the gradient descent optimizer. This code snippet only scratches the surface of TensorFlow; deep down, it is still a traditional numerical minimization. In the next post we will investigate a real deep neural network, the Long Short-Term Memory (LSTM), which is well suited to financial time series analysis such as stock market forecasting.

Reference

  • Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media, Inc., 2017.