Lab 03: Building Artificial Neural Networks

CS371: Cognitive Science
Bryn Mawr College, Fall 2016
Prof. Blank

In the cell below, add your by-line, and delete this cell:


1. Building Artificial Neural Networks

For background information on building and understanding artificial neural networks (sometimes called ANNs or connectionist networks) please read:

1.1 XOR: The Problem that Killed Connectionism for 15 Years

The first network we will explore in this lab is a network to solve the so-called XOR problem:

In a widely influential book called Perceptrons, simple neural networks with no hidden layer were shown to be unable to handle this simple problem. You can read about this story in a few places, including:

Well, let's explore XOR ourselves using the "Backpropagation of Error" algorithm (or just "backprop" for short).
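
Before turning to backprop, it may help to see the limitation concretely. The sketch below is not part of conx; it is a small stand-alone illustration that assumes a single threshold unit with two weights and a bias (the names threshold_unit and find_weights are made up for this example). It searches a coarse grid of weight settings and finds settings for AND and OR, but none for XOR:

```python
from itertools import product

def threshold_unit(w1, w2, bias, a, b):
    """A single unit with no hidden layer: fire (1) if the weighted sum is positive."""
    return 1 if w1 * a + w2 * b + bias > 0 else 0

def find_weights(truth_table, grid):
    """Try every (w1, w2, bias) on the grid; return the first that fits, else None."""
    for w1, w2, bias in product(grid, repeat=3):
        if all(threshold_unit(w1, w2, bias, a, b) == t
               for (a, b), t in truth_table.items()):
            return (w1, w2, bias)
    return None

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

grid = [x / 2 for x in range(-8, 9)]  # -4.0, -3.5, ..., 4.0

print("AND:", find_weights(AND, grid))
print("OR: ", find_weights(OR, grid))
print("XOR:", find_weights(XOR, grid))  # None: no setting on the grid works
```

A finer grid would not help: a single unit of this kind draws one straight line through the input space, and no single line separates the XOR cases.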

First, we need to import the Python conx module:

In [ ]:
from conx import Network

Next, we define a network that can handle the XOR problem. We create "layers" in the network by passing the layer sizes to the Network constructor. If we want a 2-input, 2-hidden, 1-output network, we would use:

In [ ]:
net = Network(2, 2, 1)

To see a network's layers, we can display the representation of the network:

In [ ]:

Layer 0 is the hidden layer, and Layer 1 is the output layer. The input is not really a "layer" but just a pattern that we provide to the network.
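
To make the two layers concrete, here is a minimal sketch of what propagating a pattern through a 2-2-1 network could look like. It is written in plain Python and is not conx's implementation: the sigmoid activation, the helper names, and the hand-picked weights are all assumptions chosen for illustration. A trained network would find its own weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: each unit squashes its weighted sum."""
    return [sigmoid(sum(w * i for w, i in zip(unit_w, inputs)) + b)
            for unit_w, b in zip(weights, biases)]

# Hand-picked (not learned) weights, for illustration only.
hidden_weights = [[20.0, 20.0],   # hidden unit 0 behaves like OR
                  [20.0, 20.0]]   # hidden unit 1 behaves like AND
hidden_biases = [-10.0, -30.0]
output_weights = [[20.0, -20.0]]  # "OR but not AND"
output_biases = [-10.0]

def propagate(pattern):
    hidden = layer(pattern, hidden_weights, hidden_biases)  # Layer 0
    return layer(hidden, output_weights, output_biases)     # Layer 1

for pattern in [[0, 0], [0, 1], [1, 0], [1, 1]]:
    print(pattern, "->", round(propagate(pattern)[0], 3))
```

The input pattern is only ever read, never computed, which is why it is not counted as a layer here.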

By the way, conx is written in Theano, which can take advantage of a computer's Graphics Processing Unit, or GPU. To see if the conx code is running on the "cpu" (slower) or the "gpu" you can do this:

In [ ]:

conx will still be fast on the "cpu", but it can be even faster on a "gpu".

Ok, now we are ready to train a neural network to perform the XOR function.

First, we set up the inputs. The inputs are a list of lists, one inner list per input pattern:

In [ ]:
inputs = [[0, 0],
          [0, 1],
          [1, 0],
          [1, 1]]

These are the four possible inputs for the XOR problem. Next, we need to identify what the "desired output" or "target" is for each of these. We could list those targets in the input patterns themselves, like:

inputs = [[[0, 0], [0]],
          [[0, 1], [1]],
          [[1, 0], [1]],
          [[1, 1], [0]]]

But we can also define a target function to compute the target on the fly. Let's try that:

In [ ]:
def xor(inputs):
    a = inputs[0]
    b = inputs[1]
    return [int((a or b) and not(a and b))]
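
As a quick check, the target function can be applied to each input pattern by hand. This stand-alone sketch repeats the definitions above so that it runs on its own, and shows that the function reproduces the paired list of inputs and targets written out earlier:

```python
def xor(inputs):
    a = inputs[0]
    b = inputs[1]
    return [int((a or b) and not (a and b))]

inputs = [[0, 0],
          [0, 1],
          [1, 0],
          [1, 1]]

# Pair each input pattern with the target the function computes for it.
pairs = [[pattern, xor(pattern)] for pattern in inputs]
for pattern, target in pairs:
    print(pattern, "->", target)
```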


Then we can set the inputs, and we are ready to go:

In [ ]:

First, let's see what an untrained network will do with the XOR patterns:

In [ ]:

This shows the input pattern and the actual output of the network. The correct percentage is for us humans, and is computed by comparing the target (desired output) with the actual output. If they are close enough, we call it correct. "Close enough" can be set by changing the "tolerance", as we will see.
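
One plausible way to compute "close enough" is sketched below. This is an assumption about how such a check could work, not conx's actual code; the function names and the example activations are made up, and the default of 0.1 matches the tolerance default listed later in this lab.

```python
def is_correct(target, output, tolerance=0.1):
    """True when every output unit is within `tolerance` of its target."""
    return all(abs(t - o) <= tolerance for t, o in zip(target, output))

def percent_correct(targets, outputs, tolerance=0.1):
    hits = sum(is_correct(t, o, tolerance) for t, o in zip(targets, outputs))
    return hits / len(targets)

example_targets = [[0], [1], [1], [0]]
example_outputs = [[0.04], [0.93], [0.62], [0.08]]  # made-up activations

# With the default tolerance, the third pattern (0.62 vs. 1) is too far off.
print(percent_correct(example_targets, example_outputs))
# A looser tolerance accepts all four.
print(percent_correct(example_targets, example_outputs, tolerance=0.4))
```

Note that a looser tolerance raises the percentage without changing anything about the network itself.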

Ok, now, let's change the weights of the network so that we are more likely to get what we desire:

In [ ]:

By default, the network will train for 5,000 "epochs". An epoch is a sweep through the input patterns. There are 4 input patterns in this case.

The network may, or may not, learn the pattern in 5,000 epochs.

To continue training, you just need to issue another net.train().
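
For the curious, the following is a rough sketch of what a training run might do under the hood. It is a plain-Python toy, not conx: it assumes sigmoid units, online (pattern-by-pattern) weight updates, no momentum, and a stopping rule based on tolerance. All names (make_network, forward, train, toy_net) are hypothetical. Like the real network, it may or may not learn XOR within the epoch limit.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_network(n_in, n_hidden, rng):
    """Random weights; the last entry of each weight list is the bias."""
    hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, output

def forward(net, pattern):
    hidden, output = net
    h = [sigmoid(sum(w * x for w, x in zip(ws, pattern + [1.0]))) for ws in hidden]
    o = sigmoid(sum(w * x for w, x in zip(output, h + [1.0])))
    return h, o

def train(net, patterns, targets, epochs=5000, epsilon=0.5, tolerance=0.1):
    """Return the number of epochs run; stop early once every pattern is correct."""
    hidden, output = net
    for epoch in range(1, epochs + 1):
        for pattern, target in zip(patterns, targets):  # one epoch = one sweep
            h, o = forward(net, pattern)
            delta_o = (target - o) * o * (1 - o)         # output error signal
            # Hidden error signals use the output weights from before the update.
            delta_h = [delta_o * output[j] * h[j] * (1 - h[j]) for j in range(len(h))]
            for j, x in enumerate(h + [1.0]):            # update output weights
                output[j] += epsilon * delta_o * x
            for j, ws in enumerate(hidden):              # update hidden weights
                for i, x in enumerate(pattern + [1.0]):
                    ws[i] += epsilon * delta_h[j] * x
        if all(abs(t - forward(net, p)[1]) <= tolerance
               for p, t in zip(patterns, targets)):
            return epoch
    return epochs

xor_patterns = [[0, 0], [0, 1], [1, 0], [1, 1]]
xor_targets = [0, 1, 1, 0]
toy_net = make_network(2, 2, random.Random(0))
epochs_run = train(toy_net, xor_patterns, xor_targets)
print("epochs run:", epochs_run)
for p in xor_patterns:
    print(p, "->", round(forward(toy_net, p)[1], 3))
```

Calling train(toy_net, ...) a second time picks up from the current weights, which mirrors the idea of issuing another net.train() to continue.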

Once you are done training, you can check what the network will do on a particular pattern by using the propagate method:

In [ ]:
net.propagate([1, 1])

This asks the network to put [1, 1] on the input layer and propagate it through to the output. The output should be close to zero, if the network is trained appropriately.

You can try all of the patterns manually:

In [ ]:
net.propagate([0, 0])
In [ ]:
net.propagate([0, 1])
In [ ]:
net.propagate([1, 0])
In [ ]:
net.propagate([1, 1])

Or you can just call the test method:

In [ ]:

To reset the network, use the reset method:

In [ ]:
In [ ]:
In [ ]:

1.2 Generalization

Although we only trained on the "corners" ((0, 0), (1, 0), (0, 1), and (1, 1)), we can see what the network does for points "in the middle":

In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

res = 50 # resolution
z = np.zeros((res, res))

for x in range(res):
    for y in range(res):
        z[x][y] = net.propagate([x/res, y/res])

plt.imshow(z, interpolation='nearest')
plt.xlabel("input 1")
plt.ylabel("input 2")
plt.title("Output Activation")
In [ ]:

The net.train() method can also take a set of keywords for adjusting learning, including:

  • max_training_epochs: default 5000
  • stop_percentage: default 1.0
  • tolerance: default 0.1
  • report_rate: default 500
  • epsilon: default 0.1 (the learning rate)
  • momentum: default 0.9

Use like:

net.train(momentum=0.6, epsilon=.3)
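
To get a feel for how epsilon and momentum interact, here is a sketch of the textbook weight-update rule with momentum. It assumes the common form "change = epsilon * gradient + momentum * previous change", where gradient stands for the error-reducing direction; conx's exact formula is not shown in this lab, so treat the function momentum_step as a hypothetical illustration.

```python
def momentum_step(weight, gradient, previous_change, epsilon=0.1, momentum=0.9):
    """One weight update: a learning-rate step plus a fraction of the last change."""
    change = epsilon * gradient + momentum * previous_change
    return weight + change, change

weight, change = 0.0, 0.0
for step in range(3):
    # A constant gradient of 1.0 keeps pushing the weight in the same direction.
    weight, change = momentum_step(weight, 1.0, change)
    print(step, round(change, 3), round(weight, 3))
```

With a steady gradient the changes grow from step to step (0.1, then 0.19, then 0.271), which is how momentum speeds up movement along a consistent direction. A smaller momentum value damps that build-up.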

Exercise 1: Explore training the XOR network. Things to try:

  1. try different hidden-layer sizes
  2. try different learning rates
  3. try different inputs


  1. Does the generalization always look the same?
  2. Does the network always learn the same?

Add as many cells below as you would like. Put your text answers in the following cell.


2. Handwriting Categorization

For our next network, we will explore handwritten character recognition.

First, we'll use the %download magic to get the MNIST handwritten digit data:

In [ ]:

Let's decompress the file using the shell command gunzip:

In [ ]:
!gunzip mnist.pkl.gz

The description that accompanies this dataset says:

The pickled file represents a tuple of 3 lists: the training set, the validation set and the testing set. Each of the three lists is a pair formed from a list of images and a list of class labels for each of the images. An image is represented as numpy 1-dimensional array of 784 (28 x 28) float values between 0 and 1 (0 stands for black, 1 for white). The labels are numbers between 0 and 9 indicating which digit the image represents.
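
The nesting described above can be easier to follow with a tiny stand-in. The sketch below does not load the real file; it builds a made-up one-image "training set" with the same shape (a pair of a list of images and a list of labels) and assumes the 784 values are stored row by row, so that pixel (r, c) sits at index r * 28 + c.

```python
# A tiny stand-in with the same nesting as the real data (values are made up).
fake_image = [0.0] * 784
fake_image[5 * 28 + 7] = 1.0          # light up the pixel at row 5, column 7
fake_train_set = ([fake_image], [3])  # (list of images, list of labels)

images, labels = fake_train_set
image, label = images[0], labels[0]

# Rebuild the 28 x 28 grid: pixel (r, c) lives at index r * 28 + c.
rows = [image[r * 28:(r + 1) * 28] for r in range(28)]

print(len(image), len(rows), len(rows[0]), label)
print(rows[5][7])
```

With the real data, the same indexing would apply: element 0 of a set holds the images and element 1 holds the matching labels.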

You don't really need to understand the next cell. It is a complication that arises because the MNIST data is stored in a serialized ("pickled") format from Python 2, and we are using Python 3.

So, this will allow us to read the Python 2 pickled data:

In [ ]:
import pickle
import gzip
import numpy

# The file was pickled under Python 2; encoding='latin1' lets Python 3 read it.
with open('mnist.pkl', 'rb') as f:
    data = pickle.load(f, encoding='latin1')
    train_set, validation_set, test_set = data
In [ ]:

There are 50,000 training examples:

In [ ]:

Each picture of a handwritten digit is this big:

In [ ]:

That will be our input. The output can be a single floating-point number representing the digit, scaled so that 0 maps to 0.0 and 9 maps to 1.0:

  • 1/9 (about 0.11) for 1
  • 2/9 (about 0.22) for 2

and so on.
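
The target-building cell in this section divides each label by 9.0. Under that assumption, encoding a digit and decoding a network output could be sketched as follows (encode_digit and decode_output are hypothetical helper names, not part of conx):

```python
def encode_digit(digit):
    """Map a digit 0..9 onto a single activation between 0.0 and 1.0."""
    return digit / 9.0

def decode_output(activation):
    """Map an activation back to the nearest digit."""
    return int(round(activation * 9))

# Every digit survives the round trip.
print([decode_output(encode_digit(d)) for d in range(10)])

# An imperfect output still decodes to the nearest digit.
print(round(encode_digit(4), 3), decode_output(0.47))
```

Rounding, rather than truncating, matters here: neighboring digits are only about 0.11 apart, so a small error in the output should not push the answer down to the next digit.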

In [ ]:
net = Network(784, 100, 1)

We build up the inputs and targets:

In [ ]:
inputs = [train_set[0][i] for i in range(len(train_set[0]))]
targets = [[train_set[1][i]/9.0] for i in range(len(train_set[0]))]

inputs = inputs[:100]
targets = targets[:100]

It would be nice to see what these look like:

In [ ]:
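def display_digit(vector):
    # Print a 28 x 28 image as text, one character per pixel.
    for r in range(28):
        for c in range(28):
            # Scale 0..1 to an index 0..9; clamp so a pixel of exactly 1.0 is safe.
            v = min(int(vector[r * 28 + c] * 10), 9)
            ch = " .23456789"[v]
            print(ch, end="")
        print()  # end the row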

Now we set the inputs, and a display_test_input function to display the digits nicely:

In [ ]:
net.display_test_input = display_digit
net.set_inputs(list(zip(inputs, targets)))

Let's see what they look like, and what the network thinks to start out with:

In [ ]:

Ok, let's train the network:

In [ ]:
net.train(report_rate=10, tolerance=0.05)
In [ ]:

We can test them all:

In [ ]:
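for i in range(100):
    output = net.propagate(inputs[i])
    # Targets were encoded as digit / 9.0, so decode by scaling by 9 and rounding.
    target = int(round(targets[i][0] * 9))
    guess = int(round(float(output * 9)))
    print("target:", target, "output:", output, "correct?", guess == target)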

And get plots of the TSS error and the percentage correct over time:

In [ ]:
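h = net.get_history()
epochs = [x[0] for x in h]
tss = [x[1] for x in h]
percent = [x[2] for x in h]

import matplotlib.pyplot as plt

# Plot each series on its own axes so the y-labels are not overwritten.
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(epochs, tss)
ax1.set_ylabel("TSS Error")
ax2.plot(epochs, percent)
ax2.set_ylabel("Percentage Correct")
ax2.set_xlabel("Epoch")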
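
TSS is read here as "total sum squared" error. Assuming that meaning, it could be computed as in the sketch below; conx may scale or accumulate it differently, and the function name tss_error and the example values are made up.

```python
def tss_error(targets, outputs):
    """Sum of squared differences over every output unit of every pattern."""
    return sum((t - o) ** 2
               for target, output in zip(targets, outputs)
               for t, o in zip(target, output))

sample_targets = [[0.0], [1.0], [1.0], [0.0]]
sample_outputs = [[0.5], [0.5], [0.5], [0.5]]  # an undecided network
print(tss_error(sample_targets, sample_outputs))  # 4 * 0.25 = 1.0
```

Unlike the percentage correct, this measure does not depend on the tolerance, so it can keep falling even while the percentage stays flat.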

Try some variations on learning the handwritten characters below.

3. Reflections

As per usual, please reflect deeply on this week's lab. What was challenging, easy, or surprising? Connect the topics to what you already know.