{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h1> Programming Exercise 4:\n",
" Neural Networks Learning</h1>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Introduction</h3>\n",
"In this exercise we will implement the backpropagation algorithm for neural networks and apply it to the task of hand-written digit recognition.\n",
"\n",
"<h4>Files included in this exercise</h4>\n",
"- ex4data1.mat - Training set of hand-written digits\n",
"- ex4weights.mat - Neural network parameters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>1 Neural Networks</h3>\n",
"In the previous exercise, we implemented feedforward propagation for neural networks and used it to predict handwritten digits with given weights. Here we will implement backpropagation to learn the parameters ourselves. We begin by bringing in some useful functions from our previous exercise."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# used for manipulating directory paths\n",
"import os\n",
"\n",
"# Scientific and vector computation for python\n",
"import numpy as np\n",
"\n",
"# Plotting library\n",
"from matplotlib import pyplot as plt\n",
"\n",
"# Optimization module in scipy\n",
"from scipy import optimize\n",
"\n",
"# will be used to load MATLAB mat datafile format\n",
"from scipy.io import loadmat\n",
"\n",
"# tells matplotlib to embed plots within the notebook\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def displayData(X, example_width=None, figsize=(10, 10)):\n",
"    \"\"\"\n",
"    Displays 2D data stored in X in a nice grid.\n",
"    \"\"\"\n",
"    # Compute rows, cols\n",
"    if X.ndim == 2:\n",
"        m, n = X.shape\n",
"    elif X.ndim == 1:\n",
"        n = X.size\n",
"        m = 1\n",
"        X = X[None]  # Promote to a 2 dimensional array\n",
"    else:\n",
"        raise IndexError('Input X should be 1 or 2 dimensional.')\n",
"\n",
"    example_width = example_width or int(np.round(np.sqrt(n)))\n",
"    example_height = n // example_width  # integer division, so it is usable as a shape\n",
"\n",
"    # Compute number of items to display\n",
"    display_rows = int(np.floor(np.sqrt(m)))\n",
"    display_cols = int(np.ceil(m / display_rows))\n",
"\n",
"    fig, ax_array = plt.subplots(display_rows, display_cols, figsize=figsize)\n",
"    fig.subplots_adjust(wspace=0.025, hspace=0.025)\n",
"\n",
"    ax_array = [ax_array] if m == 1 else ax_array.ravel()\n",
"\n",
"    for i, ax in enumerate(ax_array):\n",
"        ax.axis('off')\n",
"        # The grid may have more cells than there are examples to draw\n",
"        if i < m:\n",
"            ax.imshow(X[i].reshape(example_height, example_width, order='F'),\n",
"                      cmap='Greys', extent=[0, 1, 0, 1])\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def sigmoid(z):\n",
"    \"\"\"\n",
"    Compute sigmoid function given the input z.\n",
"\n",
"    Parameters\n",
"    ----------\n",
"    z : array_like\n",
"        The input to the sigmoid function. This can be a 1-D vector\n",
"        or a 2-D matrix.\n",
"\n",
"    Returns\n",
"    -------\n",
"    g : array_like\n",
"        The computed sigmoid function. g has the same shape as z, since\n",
"        the sigmoid is computed element-wise on z.\n",
"    \"\"\"\n",
"    # convert input to a numpy array\n",
"    z = np.array(z)\n",
"\n",
"    # element-wise 1 / (1 + e^(-z))\n",
"    g = 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"    return g"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def predict(Theta1, Theta2, X):\n",
"    \"\"\"\n",
"    Predict the label of an input given a trained neural network.\n",
"    Outputs the predicted label of X given the trained weights of a neural\n",
"    network (Theta1, Theta2).\n",
"    \"\"\"\n",
"    # Number of examples\n",
"    m = X.shape[0]\n",
"\n",
"    # Forward propagate, prepending a bias column of ones at each layer\n",
"    h1 = sigmoid(np.dot(np.concatenate([np.ones((m, 1)), X], axis=1), Theta1.T))\n",
"    h2 = sigmoid(np.dot(np.concatenate([np.ones((m, 1)), h1], axis=1), Theta2.T))\n",
"\n",
"    # The predicted label is the index of the largest output unit\n",
"    p = np.argmax(h2, axis=1)\n",
"    return p"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can load our data and visualize it using our earlier function `displayData`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# training data stored in arrays X, y\n",
"data = loadmat(os.path.join('Data', 'ex4data1.mat'))\n",
"X, y = data['X'], data['y'].ravel()\n",
"\n",
"# set the zero digit to 0, rather than its mapped 10 in this dataset\n",
"# This is an artifact due to the fact that this dataset was used in \n",
"# MATLAB where there is no index 0\n",
"y[y == 10] = 0\n",
"\n",
"# Number of training examples\n",
"m = y.size"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x720 with 100 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Randomly select 100 data points to display\n",
"rand_indices = np.random.choice(m, 100, replace=False)\n",
"sel = X[rand_indices, :]\n",
"\n",
"displayData(sel)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that, as before, our data consists of 5000 samples of 20 by 20 pixel grayscale images of a digit. Each pixel is represented by a floating point number indicating its grayscale value, and the grid of pixels is \"unrolled\" into a 400-dimensional vector. Each training sample is a row in our data matrix X, leaving us with the 5000 by 400 matrix X. We also have a 5000-dimensional vector y consisting of labels for the training set. The following figure provides a representation of our neural network model.\n",
"\n",
"![](Figures/neural_network.png)\n",
"\n",
"It has 3 layers - an input layer, a hidden layer and an output layer. Recall that our inputs are pixel values\n",
"of digit images. Since the images are of size $20 \\times 20$, this gives us 400 input layer units (not counting the extra bias unit which always outputs +1). The training data was loaded into the variables `X` and `y` above."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Setup the parameters you will use for this exercise\n",
"input_layer_size = 400  # 20x20 Input Images of Digits\n",
"hidden_layer_size = 25  # 25 hidden units\n",
"num_labels = 10  # 10 labels, from 0 to 9\n",
"\n",
"# Load the weights into variables Theta1 and Theta2\n",
"weights = loadmat(os.path.join('Data', 'ex4weights.mat'))\n",
"\n",
"# Theta1 has size 25 x 401\n",
"# Theta2 has size 10 x 26\n",
"Theta1, Theta2 = weights['Theta1'], weights['Theta2']\n",
"\n",
"# Move the last row of Theta2 to the front. The weight file ex4weights.mat was saved\n",
"# with MATLAB's 1-based indexing, so the output unit for digit 0 is stored in the last row.\n",
"Theta2 = np.roll(Theta2, 1, axis=0)\n",
"\n",
"# Unroll parameters \n",
"nn_params = np.concatenate([Theta1.ravel(), Theta2.ravel()])"
]
},
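{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check on the unrolled vector built above, the number of entries in `nn_params` follows from the two weight-matrix shapes given in the comments (each matrix has one extra column for the bias unit):\n",
"\n",
"$$ \\underbrace{25 \\times (400 + 1)}_{\\Theta^{(1)}} + \\underbrace{10 \\times (25 + 1)}_{\\Theta^{(2)}} = 10025 + 260 = 10285 $$\n",
"\n",
"So `nn_params` should be a vector of length 10285, with the first 10025 entries belonging to `Theta1` and the remaining 260 to `Theta2`."
]
},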
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will now implement the cost function for our neural network. Recall that the regularized cost function is $$ J(\\Theta) = \\frac{1}{m} \\sum_{i=1}^{m}\\sum_{k=1}^{K} \\left[ - y_k^{(i)} \\log \\left( \\left( h_\\Theta \\left( x^{(i)} \\right) \\right)_k \\right) - \\left( 1 - y_k^{(i)} \\right) \\log \\left( 1 - \\left( h_\\Theta \\left( x^{(i)} \\right) \\right)_k \\right) \\right] + \\frac{\\lambda}{2 m} \\left[ \\sum_{j=1}^{25} \\sum_{k=1}^{400} \\left( \\Theta_{j,k}^{(1)} \\right)^2 + \\sum_{j=1}^{10} \\sum_{k=1}^{25} \\left( \\Theta_{j,k}^{(2)} \\right)^2 \\right] $$\n",
"\n",
"and our regularized gradient as $$ \\begin{align} \n",
"& \\frac{\\partial}{\\partial \\Theta_{ij}^{(l)}} J(\\Theta) = D_{ij}^{(l)} = \\frac{1}{m} \\Delta_{ij}^{(l)} & \\qquad \\text{for } j = 0 \\\\\n",
"& \\frac{\\partial}{\\partial \\Theta_{ij}^{(l)}} J(\\Theta) = D_{ij}^{(l)} = \\frac{1}{m} \\Delta_{ij}^{(l)} + \\frac{\\lambda}{m} \\Theta_{ij}^{(l)} & \\qquad \\text{for } j \\ge 1\n",
"\\end{align}\n",
"$$\n",
"\n",
"Note that we will *not* be regularizing the first column of $\\Theta^{(l)}$ which is used for the bias term. Furthermore, in the parameters $\\Theta_{ij}^{(l)}$, $i$ is indexed starting from 1, and $j$ is indexed starting from 0. Thus, \n",
"\n",
"$$\n",
"\\Theta^{(l)} = \\begin{bmatrix}\n",
"\\Theta_{1,0}^{(l)} & \\Theta_{1,1}^{(l)} & \\cdots \\\\\n",
"\\Theta_{2,0}^{(l)} & \\Theta_{2,1}^{(l)} & \\cdots \\\\\n",
"\\vdots & ~ & \\ddots\n",
"\\end{bmatrix}\n",
"$$\n",
"\n",
"To compute the gradient we will need the derivative of the sigmoid function. For training we will also need a function that randomly initializes the weights: if every weight started at zero, all hidden units would compute the same output and receive the same update, so they would never learn different features."
]
},
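{
"cell_type": "markdown",
"metadata": {},
"source": [
"The accumulators $\\Delta^{(l)}$ in the gradient above are not defined in the text. As a reminder, and assuming the standard backpropagation derivation for a three-layer sigmoid network (the symbols $\\delta^{(l)}$, $a^{(l)}$, $z^{(l)}$ and $g'$ are introduced here for illustration), each training example contributes:\n",
"\n",
"$$ \\begin{align} \n",
"& \\delta^{(3)} = a^{(3)} - y \\\\\n",
"& \\delta^{(2)} = \\left( \\Theta^{(2)} \\right)^T \\delta^{(3)} \\odot g'\\left( z^{(2)} \\right) \\\\\n",
"& \\Delta^{(l)} := \\Delta^{(l)} + \\delta^{(l+1)} \\left( a^{(l)} \\right)^T\n",
"\\end{align}\n",
"$$\n",
"\n",
"Here $a^{(l)}$ is the activation of layer $l$ including its bias unit, $z^{(2)}$ is the input to the hidden layer's sigmoid, $y$ is the one-hot label vector, $g'$ is the derivative of the sigmoid and $\\odot$ is the element-wise product. The component of $\\left( \\Theta^{(2)} \\right)^T \\delta^{(3)}$ that corresponds to the bias unit is discarded, and each $\\Delta^{(l)}$ starts at zero and is summed over all $m$ examples."
]
},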
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def sigmoidGradient(z):\n",
"    \"\"\"\n",
"    Computes the gradient of the sigmoid function evaluated at z.\n",
"    This works whether z is a scalar, a vector or a matrix; the\n",
"    gradient is computed for each element.\n",
"\n",
"    Parameters\n",
"    ----------\n",
"    z : array_like\n",
"        A scalar, vector or matrix as input to the sigmoid function.\n",
"\n",
"    Returns\n",
"    -------\n",
"    g : array_like\n",
"        Gradient of the sigmoid function. Has the same shape as z.\n",
"\n",
"    Note\n",
"    ----\n",
"    Uses the `sigmoid` function defined earlier in this notebook.\n",
"    \"\"\"\n",
"    # Evaluate the sigmoid once and reuse it: g'(z) = g(z) * (1 - g(z))\n",
"    s = sigmoid(z)\n",
"    g = s * (1 - s)\n",
"\n",
"    return g"
]
},
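{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the expression used in `sigmoidGradient` follows from differentiating the sigmoid $g(z)$ directly:\n",
"\n",
"$$ g(z) = \\frac{1}{1 + e^{-z}}, \\qquad g'(z) = \\frac{e^{-z}}{\\left( 1 + e^{-z} \\right)^2} = \\frac{1}{1 + e^{-z}} \\left( 1 - \\frac{1}{1 + e^{-z}} \\right) = g(z) \\left( 1 - g(z) \\right) $$\n",
"\n",
"A simple check is $z = 0$, where $g(0) = 0.5$ and so $g'(0) = 0.25$, which is the largest value the gradient takes. For large positive or negative $z$ the gradient approaches 0."
]
},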
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"def randInitializeWeights(L_in, L_out, epsilon_init=0.12):\n",
"    \"\"\"\n",
"    Randomly initialize the weights of a layer in a neural network.\n",
"\n",
"    Parameters\n",
"    ----------\n",
"    L_in : int\n",
"        Number of incoming connections.\n",
"\n",
"    L_out : int\n",
"        Number of outgoing connections.\n",
"\n",
"    epsilon_init : float, optional\n",
"        Weights are drawn uniformly from [-epsilon_init, epsilon_init).\n",
"\n",
"    Returns\n",
"    -------\n",
"    W : array_like\n",
"        The weights initialized to random values, with shape\n",
"        (L_out, 1 + L_in). The first column of W handles the \"bias\" terms.\n",
"    \"\"\"\n",
"    # np.random.rand samples from [0, 1); rescale to [-epsilon_init, epsilon_init).\n",
"    # The epsilon_init argument is used as passed in rather than being overwritten.\n",
"    W = np.random.rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init\n",
"\n",
"    return W"
]
},
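{
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook does not say where the default `epsilon_init = 0.12` comes from. One common heuristic, offered here as an assumption about its origin rather than something stated above, chooses the range from the sizes of the two layers a weight matrix connects:\n",
"\n",
"$$ \\epsilon_{\\text{init}} = \\frac{\\sqrt{6}}{\\sqrt{L_{in} + L_{out}}} $$\n",
"\n",
"For the first layer of this network, $L_{in} = 400$ and $L_{out} = 25$, which gives $\\sqrt{6} / \\sqrt{425} \\approx 0.119$, close to the default of 0.12."
]
},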
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"def nnCostFunction(nn_params,\n",
"                   input_layer_size,\n",
"                   hidden_layer_size,\n",
"                   num_labels,\n",
"                   X, y, lambda_=0.0):\n",
"    \"\"\"\n",
"    Implements the neural network cost function and gradient for a two layer neural\n",
"    network which performs classification.\n",
"\n",
"    Parameters\n",
"    ----------\n",
"    nn_params : array_like\n",
"        The parameters for the neural network which are \"unrolled\" into\n",
"        a vector. This needs to be converted back into the weight matrices Theta1\n",
"        and Theta2.\n",
"\n",
"    input_layer_size : int\n",
"        Number of features for the input layer.\n",
"\n",
"    hidden_layer_size : int\n",
"        Number of hidden units in the second layer.\n",
"\n",
"    num_labels : int\n",
"        Total number of labels, or equivalently number of units in output layer.\n",
"\n",
"    X : array_like\n",
"        Input dataset. A matrix of shape (m x input_layer_size).\n",
"\n",
"    y : array_like\n",
"        Dataset labels. A vector of integers of shape (m,).\n",
"\n",
"    lambda_ : float, optional\n",
"        Regularization parameter.\n",
"\n",
"    Returns\n",
"    -------\n",
"    J : float\n",
"        The computed value for the cost function at the current weight values.\n",
"\n",
"    grad : array_like\n",
"        An \"unrolled\" vector of the partial derivatives of the concatenation of\n",
"        neural network weights Theta1 and Theta2.\n",
"    \"\"\"\n",
"    # Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices\n",
"    # for our 2 layer neural network\n",
"    Theta1 = np.reshape(nn_params[:hidden_layer_size * (input_layer_size + 1)],\n",
"                        (hidden_layer_size, (input_layer_size + 1)))\n",
"\n",
"    Theta2 = np.reshape(nn_params[(hidden_layer_size * (input_layer_size + 1)):],\n",
"                        (num_labels, (hidden_layer_size + 1)))\n",
"\n",
"    # Number of training examples\n",
"    m = y.size\n",
"\n",
"    # Forward propagation\n",
"    y_mat = np.identity(num_labels)[y, :]  # one-hot encode the labels\n",
"    a1 = np.concatenate([np.ones((m, 1)), X], axis=1)  # add column of ones to X\n",
"    z2 = a1.dot(Theta1.T)\n",
"    a2 = np.concatenate([np.ones((m, 1)), sigmoid(z2)], axis=1)\n",
"    z3 = a2.dot(Theta2.T)\n",
"    a3 = sigmoid(z3)\n",
"\n",
"    # Unregularized cost function, summed over all examples and labels\n",
"    J = -np.sum(y_mat * np.log(a3) + (1 - y_mat) * np.log(1 - a3)) / m\n",
"\n",
"    # Regularization term (the bias columns are excluded)\n",
"    J += (lambda_ / (2 * m)) * (np.sum(Theta1[:, 1:] ** 2) + np.sum(Theta2[:, 1:] ** 2))\n",
"\n",
"    # Backpropagation\n",
"    d3 = a3 - y_mat\n",
"    d2 = d3.dot(Theta2[:, 1:]) * sigmoidGradient(z2)\n",
"    Theta1_grad = d2.T.dot(a1) / m\n",
"    Theta2_grad = d3.T.dot(a2) / m\n",
"\n",
"    # Regularize the gradient for the non-bias columns only. Theta1 and Theta2\n",
"    # can be views into nn_params, so they are read here but never modified.\n",
"    Theta1_grad[:, 1:] += (lambda_ / m) * Theta1[:, 1:]\n",
"    Theta2_grad[:, 1:] += (lambda_ / m) * Theta2[:, 1:]\n",
"\n",
"    # Unroll gradients\n",
"    grad = np.concatenate([Theta1_grad.ravel(), Theta2_grad.ravel()])\n",
"\n",
"    return J, grad"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now set the regularization parameter $\\lambda$ and check our cost function against the expected value."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cost at parameters (loaded from ex4weights): 0.383770\n",
"This value should be about : 0.383770.\n"
]
}
],
"source": [
"# Weight regularization parameter (we set this to 1 here).\n",
"lambda_ = 1\n",
"J, _ = nnCostFunction(nn_params, input_layer_size, hidden_layer_size,\n",
" num_labels, X, y, lambda_)\n",
"\n",
"print('Cost at parameters (loaded from ex4weights): %.6f' % J)\n",
"print('This value should be about : 0.383770.')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}