Jupyter at Bryn Mawr College
What is Theano? Theano is a Python library for describing mathematical operations symbolically. The resulting expressions can be evaluated from Python and compiled to low-level C code, which can optionally run on a Graphics Processing Unit (GPU).
To get started, let's explore a very simple example. Consider the function:
$ f(x, y) = x + y $
In mathematics, we write the function $f$ that takes two parameters, $x$ and $y$. The body of the function is simply $x + y$.
In Python, we would write that as:
def f(x, y):
    return x + y
f(5, 6)
In Theano, we use the Python language as a method of writing symbolic expressions.
First, we import the needed components, T and function:
import theano.tensor as T
from theano import function
Next, we define two symbols, x and y. These are both Python variables and Theano symbols. The quoted 'x' is the Theano symbol, which is assigned to a Python variable of the same name (to make it easier to understand and help during debugging).
x = T.scalar('x')
The term scalar simply means that it has a single value. In Theano, this is sometimes called a zero-dimensional array.
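As a rough analogy in NumPy (not Theano itself, and only meant as an illustration), a zero-dimensional array holds exactly one value and has an empty shape. The names `s` and `v` are introduced here for the example:

```python
import numpy as np

s = np.array(3.0)        # a zero-dimensional array holding one value
print(s.ndim, s.shape)   # number of dimensions and shape of the 0-d array

v = np.array([3.0])      # by contrast, a one-dimensional array of length 1
print(v.ndim, v.shape)
```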
We do the same with y:
y = T.scalar('y')
# or both together:
x, y = T.scalars('x', 'y')
As you can see, the Python variable x is of type TensorVariable.
type(x)
Next, we define a Theano function. We do this in two steps. First, we describe the body of the function:
func = x + y
This looks like we are adding x and y together. However, we note that x and y are symbols... they do not yet have values.
We can see that func is something very strange:
func
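To make the idea of an unevaluated expression more concrete, here is a minimal pure-Python sketch of a symbolic expression tree. The classes `Sym` and `Add` are hypothetical and written only for this illustration; they are not how Theano is implemented, but they show how `+` can build a description of a computation instead of a value:

```python
class Sym:
    """A hypothetical symbol: a name with no value attached."""
    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        return Add(self, other)      # record the operation, do not compute it

    def evaluate(self, env):
        return env[self.name]        # look the value up only when asked

    def __repr__(self):
        return self.name


class Add:
    """A hypothetical node recording 'left + right'."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, env):
        return self.left.evaluate(env) + self.right.evaluate(env)

    def __repr__(self):
        return "(%r + %r)" % (self.left, self.right)


p, q = Sym('p'), Sym('q')
expr = p + q                              # builds a tree; nothing is added yet
print(expr)                               # (p + q)
print(expr.evaluate({'p': 5, 'q': 6}))    # 11
```

Supplying values only at evaluation time is loosely what happens when a Theano expression is compiled into a callable and then called with numbers.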
The second step in creating a function is to make a Python function. To do this, we use the Theano function named function. It takes a list of parameters as symbols, followed by the Theano function body:
pyfunc = function(inputs=[x, y], outputs=func)
# or:
# pyfunc = function([x, y], func)
pyfunc
pyfunc(5, 6)
You can take a Theano function body and show it in symbolic form, perhaps close to the original Python source:
from theano import pp
pp(func)
This example will use matrices, one of the main uses of Theano.
In this example, we will use T.dmatrix to define a matrix of doubles (floating point values).
a = T.dmatrix('a')
b = T.dmatrix('b')
Again, we define the Theano function, and the Python function:
func = a + b
pyfunc = function([a, b], func)
pyfunc([[1, 2],
        [3, 4]],
       [[10, 20],
        [100, 200]])
The result is given as a numpy ndarray:
result = pyfunc([[1, 2],
                 [3, 4]],
                [[10, 20],
                 [100, 200]])
type(result)
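For comparison, the same element-wise addition can be sketched directly in NumPy, without any symbolic step. The names `a_val`, `b_val`, and `total` are introduced here for illustration:

```python
import numpy as np

a_val = np.array([[1, 2], [3, 4]], dtype='float64')
b_val = np.array([[10, 20], [100, 200]], dtype='float64')

total = a_val + b_val    # element-wise addition of two 2 x 2 matrices
print(total)
print(type(total))       # a NumPy ndarray
```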
You can create matrices composed of:
x = T.dscalar('x')
fx = T.exp(T.sin(x**2))
pyfunc = function([x], fx)
pyfunc(10)
Does that agree with what we would compute directly in Python?
import math
def eq(x):
    return math.exp(math.sin(x ** 2))
eq(10)
Yes, they do agree.
Let's plot the function using matplotlib.
%matplotlib inline
from matplotlib.pyplot import plot
import numpy
plot(numpy.arange(0, 15, .01),
     [pyfunc(x) for x in numpy.arange(0, 15, .01)])
Now we will compute the derivative of our function. In regular Python, we would have to work out the derivative by hand (or approximate it numerically). Because we have described the function symbolically, Theano can derive it for us from the symbols.
To do this, we will use T.grad() with respect to (wrt) x:
fp = T.grad(fx, wrt=x)
fprime = function([x], fp)
fprime(10)
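As a sanity check that does not depend on Theano, the derivative can be worked out by hand with the chain rule and compared against a numerical estimate. This is a sketch in plain Python; the names `g`, `g_prime`, and `central_difference` are introduced here and are not part of the notebook:

```python
import math

def g(x):
    return math.exp(math.sin(x ** 2))

def g_prime(x):
    # chain rule: d/dx exp(sin(x^2)) = exp(sin(x^2)) * cos(x^2) * 2x
    return math.exp(math.sin(x ** 2)) * math.cos(x ** 2) * 2 * x

def central_difference(func, x, h=1e-6):
    # numerical estimate of the slope at x
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 10.0
print(g_prime(x0), central_difference(g, x0))   # the two values should be close
```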
Let's plot the derivative of the function:
import numpy
plot(numpy.arange(0, 15, .01),
     [fprime(x) for x in numpy.arange(0, 15, .01)])
For this experiment, we will learn the function XOR using back-propagation of error.
First, we import the elements from theano and numpy that we will need:
import theano
import theano.tensor as T
from theano import function, pp
import theano.tensor.nnet as nnet
import numpy as np
import random
%matplotlib inline
from matplotlib.pyplot import plot
Recall XOR:
Input 1 | Input 2 | Target |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
That is, given 2 inputs the output is True if either of the inputs is True, but not if both are.
Three layers:
First we define two Theano symbols for representing the inputs and desired output (called the target).
th_inputs = T.dvector('inputs') # two inputs
th_target = T.dscalar('target') # one target/output
To compute the activation at a layer, we append a bias input of 1 to the inputs, take the dot product with the weights to get the net input, and then apply the sigmoid:
def compute_activation(inputs, weights):
    bias = np.array([1], dtype='float64')
    all_inputs = T.concatenate([inputs, bias])
    net_input = T.dot(weights.T, all_inputs)
    activation = nnet.sigmoid(net_input)
    return activation
To give you a sense of what compute_activation is, you can see a representation of the computation:
pp(compute_activation(th_inputs, th_target))
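The same computation can be sketched with concrete numbers in NumPy. This is an illustration only: `sigmoid_np`, `compute_activation_np`, and the all-zero weight matrix `w_demo` are names and values assumed for the example, not part of the notebook.

```python
import numpy as np

def sigmoid_np(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_activation_np(inputs, weights):
    bias = np.array([1.0])
    all_inputs = np.concatenate([inputs, bias])   # shape (ins + 1,)
    net_input = np.dot(weights.T, all_inputs)     # shape (outs,)
    return sigmoid_np(net_input)

w_demo = np.zeros((3, 2))    # hypothetical weights: 2 inputs + bias, 2 units
out = compute_activation_np(np.array([0.0, 1.0]), w_demo)
print(out)                   # zero weights give a net input of 0, so every unit outputs 0.5
```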
What does the sigmoid function return? Let's turn that into a Python function and plot it:
x = T.dscalar('x') # 64-bit float
fx = nnet.sigmoid(x) # Theano function
fx
pp(fx)
sigmoid = function([x], fx) # Python function
sigmoid(.5)
xs = range(-10, 10, 1)
ys = [sigmoid(x) for x in xs]
plot(xs, ys, "o-")
We create a shared variable named epsilon to control the learning rate. The learning rate will typically be in the range 0.01 to 0.9; the best value depends on the function being learned.
epsilon = theano.shared(0.1, name='epsilon') # learning rate
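Now we define the method to update the weights. We find the derivative of the error with respect to the weights, scale it by the learning rate epsilon, and subtract the result from the weights. Despite its name, compute_delta_weights returns the updated weights, not the change:
def compute_delta_weights(compute_error, weights):
    return weights - (epsilon * T.grad(compute_error, wrt=weights))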
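The update rule can be seen in isolation on a toy problem. The sketch below uses plain Python and a made-up one-parameter error function, `toy_error(w) = (w - 3)^2`, chosen only because its gradient is easy to write by hand; `learning_rate` plays the role of epsilon:

```python
def toy_error(w):
    return (w - 3.0) ** 2

def toy_error_gradient(w):
    return 2.0 * (w - 3.0)      # derivative of (w - 3)^2 with respect to w

learning_rate = 0.1             # plays the role of epsilon
w = 0.0
for step in range(100):
    # same form as the Theano update: weights - epsilon * gradient
    w = w - learning_rate * toy_error_gradient(w)

print(w, toy_error(w))          # w moves toward the minimum at 3
```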
We define the first set of weights to go between the inputs and the hidden layer.
NUM_INPUTS = 2
NUM_HIDDENS = 2
NUM_OUTPUTS = 1
The initial random weights are created to span from -1 to 1.
def make_weights(ins, outs):
    return np.array(2 * np.random.rand(ins + 1, outs) - 1,
                    dtype='float64')
weights1 = theano.shared(make_weights(NUM_INPUTS, NUM_HIDDENS),
                         name='weights1')
hidden_layer = compute_activation(th_inputs, weights1)
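To check the shape and range of the initial weights, the following sketch repeats make_weights on its own so that it runs without Theano. The variable `w_check` is introduced here for illustration:

```python
import numpy as np

def make_weights(ins, outs):
    return np.array(2 * np.random.rand(ins + 1, outs) - 1,
                    dtype='float64')

w_check = make_weights(2, 2)
print(w_check.shape)    # (3, 2): one row per input plus one row for the bias
# np.random.rand draws from [0, 1), so 2 * rand - 1 lies in [-1, 1)
print(w_check.min() >= -1, w_check.max() < 1)
```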
We can test the hidden_layer Theano function by turning it into a Python function, and calling it with [0, 0] bound to th_inputs:
function([th_inputs], hidden_layer)([0, 0])
weights2 = theano.shared(make_weights(NUM_HIDDENS, NUM_OUTPUTS),
                         name='weights2') # (NUM_HIDDENS + 1) x NUM_OUTPUTS, here 3 x 1
output_layer = T.sum(compute_activation(hidden_layer, weights2)) # Theano function
compute_error = (output_layer - th_target) ** 2 # Theano function
We can test the entire Theano equation now by compiling compute_error into a Python function and calling it with values for th_inputs and th_target:
function([th_inputs, th_target], compute_error)([0, 0], 0)
Finally, we can create a Python function (called train) that will call compute_error, and update the weights:
train = function(
    inputs=[th_inputs, th_target],
    outputs=compute_error,
    updates=[(weights1, compute_delta_weights(compute_error, weights1)),
             (weights2, compute_delta_weights(compute_error, weights2))])
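To show roughly what one call to train involves, here is a NumPy sketch of a single gradient-descent step for the same 2-2-1 network, with the gradients written out by hand using the chain rule. Theano derives these gradients automatically; the function `train_step`, the arrays `w1` and `w2`, and the fixed random seed are assumptions made for this illustration.

```python
import numpy as np

def sigmoid_np(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target, w1, w2, epsilon=0.1):
    """One gradient-descent step; returns the squared error before the update."""
    x_b = np.append(x, 1.0)                   # inputs plus bias
    hidden = sigmoid_np(w1.T @ x_b)           # hidden activations, shape (2,)
    h_b = np.append(hidden, 1.0)              # hidden activations plus bias
    output = sigmoid_np(w2.T @ h_b)           # output activation, shape (1,)
    error = float((output[0] - target) ** 2)

    # Backward pass: chain rule through the squared error and both sigmoids.
    delta2 = 2.0 * (output - target) * output * (1.0 - output)   # shape (1,)
    grad_w2 = np.outer(h_b, delta2)                               # shape (3, 1)
    delta1 = (w2 @ delta2)[:-1] * hidden * (1.0 - hidden)         # drop the bias row
    grad_w1 = np.outer(x_b, delta1)                               # shape (3, 2)

    # In-place updates, analogous to the updates list passed to function().
    w1 -= epsilon * grad_w1
    w2 -= epsilon * grad_w2
    return error

rng = np.random.default_rng(0)
w1 = rng.uniform(-1, 1, size=(3, 2))
w2 = rng.uniform(-1, 1, size=(3, 1))
errors = [train_step(np.array([0.0, 1.0]), 1.0, w1, w2) for _ in range(200)]
print(errors[0], errors[-1])   # the error on this single pattern shrinks
```

Training on one pattern only is not learning XOR; it simply shows that repeated updates of this form reduce the error they are computed from.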
Ok, now we are ready to train a neural network to perform the XOR function.
inputs = [[0, 1],
          [1, 0],
          [1, 1],
          [0, 0]]
def xor(a, b):
    return int((a or b) and not(a and b))
for pattern in inputs:
    print(pattern, xor(*pattern))
We simply call train() on each of the input/target pairs:
def train_all(epochs=5000):
    for e in range(epochs):
        random.shuffle(inputs)
        for i in range(len(inputs)):
            target = xor(*inputs[i])
            error = train(inputs[i], target)
        if (e + 1) % 500 == 0 or e == 0:
            print('Epoch:', e + 1, 'error:', error)
%%time
train_all()
test = function([th_inputs], output_layer)
test([0, 0])
test([0, 1])
test([1, 0])
test([1, 1])
To train another network, we need to reinitialize the weights:
NUM_HIDDENS = 5
weights1.set_value(make_weights(NUM_INPUTS, NUM_HIDDENS))
weights2.set_value(make_weights(NUM_HIDDENS, NUM_OUTPUTS))
We could also try a different learning rate:
epsilon.set_value(0.05)
%%time
train_all()
for pattern in inputs:
    print(pattern, test(pattern))
Although we only trained on the corners, we can see what the network outputs across the rest of the input space:
import matplotlib.pyplot as plt
res = 50 # resolution
z = np.zeros((res, res))
for x in range(res):
    for y in range(res):
        z[x][y] = test([x/res, y/res])
plt.imshow(z, cmap=plt.cm.gray, interpolation='nearest')
plt.xlabel("input 1")
plt.ylabel("input 2")
plt.title("Output Activation")
plt.show()
First, I'll use the metakernel's %download magic to get the MNIST handwritten digit data:
! pip install metakernel --user -U
import metakernel
metakernel.register_ipython_magics()
%download http://deeplearning.net/data/mnist/mnist.pkl.gz
Unzip the file:
!gunzip mnist.pkl.gz
From http://deeplearning.net/tutorial/gettingstarted.html we see that:
The pickled file represents a tuple of 3 lists: the training set, the validation set and the testing set. Each of the three lists is a pair formed from a list of images and a list of class labels for each of the images. An image is represented as numpy 1-dimensional array of 784 (28 x 28) float values between 0 and 1 (0 stands for black, 1 for white). The labels are numbers between 0 and 9 indicating which digit the image represents.
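The layout described above can be sketched with a small synthetic stand-in. This is not the real MNIST data: the helper `fake_split`, the all-zero images, and the assumption that each list of images is stored as a 2-D array with one row per image are all made up for illustration.

```python
import numpy as np

def fake_split(n):
    """Hypothetical stand-in for one (images, labels) pair."""
    images = np.zeros((n, 784), dtype='float32')   # n flattened 28 x 28 images
    labels = np.arange(n) % 10                     # digits 0..9
    return (images, labels)

# A tuple of three pairs: training, validation, and testing sets.
fake_data = (fake_split(5), fake_split(2), fake_split(3))
fake_train, fake_validation, fake_test = fake_data

fake_images, fake_labels = fake_train
print(len(fake_train), fake_images.shape, fake_labels)

first_image = fake_images[0].reshape(28, 28)       # back to a 28 x 28 grid
print(first_image.shape)
```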
We read the Python2 pickled data:
import pickle
import gzip
import numpy
with open('mnist.pkl', 'rb') as f:
    # encoding='latin1' lets Python 3 read the Python 2 pickle
    data = pickle.load(f, encoding='latin1')
train_set, validation_set, test_set = data
len(train_set)
len(train_set[0])
len(train_set[0][0])
epsilon.set_value(0.1)
weights1.set_value(make_weights(784, 100))
weights2.set_value(make_weights(100, 1))
inputs = [train_set[0][i] for i in range(len(train_set[0]))]
targets = [train_set[1][i]/9.0 for i in range(len(train_set[0]))]
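Dividing each label by 9.0 squeezes the digits 0 to 9 into the range 0 to 1, which matches the range of a sigmoid output. The sketch below shows the scaling and one way to undo it; `encode_digit` and `decode_target` are hypothetical helper names, and using round() when decoding is a precaution against floating-point error rather than something the notebook does.

```python
def encode_digit(d):
    return d / 9.0                 # maps 0..9 onto 0.0..1.0

def decode_target(t):
    return int(round(t * 9))       # round() guards against floating-point error

print([round(encode_digit(d), 3) for d in range(10)])
print([decode_target(encode_digit(d)) for d in range(10)])   # recovers 0..9
```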
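def display_digit(vector):
    for r in range(28):
        for c in range(28):
            # clamp so that a pixel value of exactly 1.0 stays within the string
            v = min(int(vector[r * 28 + c] * 10), 9)
            ch = " .23456789"[v]
            print(ch, end="")
        print()
for i in range(10):
    display_digit(inputs[i])
    # round() guards against floating-point error when undoing the /9.0 scaling
    print(int(round(targets[i] * 9)))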
def train_digits(max_epochs=1000):
    epoch = 0
    correct = 0
    total = 1
    while correct/total < 1.0 and epoch < max_epochs:
        total = 0
        correct = 0
        for i in range(100):
            target = targets[i]
            error = train(inputs[i], target)
            output = test(inputs[i])
            # round() guards against floating-point error when undoing the /9.0 scaling
            target = int(round(targets[i] * 9))
            total += 1
            if int(output * 10) == target:
                correct += 1
        if (epoch + 1) % 10 == 0 or epoch == 0:
            print('Epoch:', epoch + 1, 'error:', error, "% correct:", 100 * correct/total)
        epoch += 1
train_digits()
for i in range(100):
    output = test(inputs[i])
    target = int(round(targets[i] * 9))
    print("target:", target, "output:", output, "correct?", int(output * 10) == target)