{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<h1>Programming Exercise 5:\n",
|
|
" Regularized Linear Regression and Bias vs. Variance</h1>\n",
|
|
" \n",
|
|
"<h3> Introduction </h3>\n",
|
|
"In this exercise, we will implement regularized linear regression and use it to study models with different bias-variance properties. To start, we will import necessary modules, implement some useful functions from previous exercises, and load our data.\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# used for manipulating directory paths\n",
|
|
"import os\n",
|
|
"\n",
|
|
"# Scientific and vector computation for python\n",
|
|
"import numpy as np\n",
|
|
"\n",
|
|
"# Plotting library\n",
|
|
"from matplotlib import pyplot as plt\n",
|
|
"\n",
|
|
"# Optimization module in scipy\n",
|
|
"from scipy import optimize\n",
|
|
"\n",
|
|
"# will be used to load MATLAB mat datafile format\n",
|
|
"from scipy.io import loadmat\n",
|
|
"\n",
|
|
"# tells matplotlib to embed plots within the notebook\n",
|
|
"%matplotlib inline"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def trainLinearReg(linearRegCostFunction, X, y, lambda_=0.0, maxiter=200):\n",
|
|
" \"\"\"\n",
|
|
" Trains linear regression using scipy's optimize.minimize.\n",
|
|
"\n",
|
|
" Parameters\n",
|
|
" ----------\n",
|
|
" X : array_like\n",
|
|
" The dataset with shape (m x n+1). The bias term is assumed to be concatenated.\n",
|
|
"\n",
|
|
" y : array_like\n",
|
|
" Function values at each datapoint. A vector of shape (m,).\n",
|
|
"\n",
|
|
" lambda_ : float, optional\n",
|
|
" The regularization parameter.\n",
|
|
"\n",
|
|
" maxiter : int, optional\n",
|
|
" Maximum number of iteration for the optimization algorithm.\n",
|
|
"\n",
|
|
" Returns\n",
|
|
" -------\n",
|
|
" theta : array_like\n",
|
|
" The parameters for linear regression. This is a vector of shape (n+1,).\n",
|
|
" \"\"\"\n",
|
|
" # Initialize Theta\n",
|
|
" initial_theta = np.zeros(X.shape[1])\n",
|
|
"\n",
|
|
" # Create \"short hand\" for the cost function to be minimized\n",
|
|
" costFunction = lambda t: linearRegCostFunction(X, y, t, lambda_)\n",
|
|
"\n",
|
|
" # Now, costFunction is a function that takes in only one argument\n",
|
|
" options = {'maxiter': maxiter}\n",
|
|
"\n",
|
|
" # Minimize using scipy\n",
|
|
" res = optimize.minimize(costFunction, initial_theta, jac=True, method='TNC', options=options)\n",
|
|
" return res.x"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def featureNormalize(X):\n",
|
|
" \"\"\"\n",
|
|
" Normalizes the features in X returns a normalized version of X where the mean value of each\n",
|
|
" feature is 0 and the standard deviation is 1. This is often a good preprocessing step to do when\n",
|
|
" working with learning algorithms.\n",
|
|
"\n",
|
|
" Parameters\n",
|
|
" ----------\n",
|
|
" X : array_like\n",
|
|
" An dataset which is a (m x n) matrix, where m is the number of examples,\n",
|
|
" and n is the number of dimensions for each example.\n",
|
|
"\n",
|
|
" Returns\n",
|
|
" -------\n",
|
|
" X_norm : array_like\n",
|
|
" The normalized input dataset.\n",
|
|
"\n",
|
|
" mu : array_like\n",
|
|
" A vector of size n corresponding to the mean for each dimension across all examples.\n",
|
|
"\n",
|
|
" sigma : array_like\n",
|
|
" A vector of size n corresponding to the standard deviations for each dimension across\n",
|
|
" all examples.\n",
|
|
" \"\"\"\n",
|
|
" mu = np.mean(X, axis=0)\n",
|
|
" X_norm = X - mu\n",
|
|
"\n",
|
|
" sigma = np.std(X_norm, axis=0, ddof=1)\n",
|
|
" X_norm /= sigma\n",
|
|
" return X_norm, mu, sigma"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def plotFit(polyFeatures, min_x, max_x, mu, sigma, theta, p):\n",
|
|
" \"\"\"\n",
|
|
" Plots a learned polynomial regression fit over an existing figure.\n",
|
|
" Also works with linear regression.\n",
|
|
" Plots the learned polynomial fit with power p and feature normalization (mu, sigma).\n",
|
|
"\n",
|
|
" Parameters\n",
|
|
" ----------\n",
|
|
" polyFeatures : func\n",
|
|
" A function which generators polynomial features from a single feature.\n",
|
|
"\n",
|
|
" min_x : float\n",
|
|
" The minimum value for the feature.\n",
|
|
"\n",
|
|
" max_x : float\n",
|
|
" The maximum value for the feature.\n",
|
|
"\n",
|
|
" mu : float\n",
|
|
" The mean feature value over the training dataset.\n",
|
|
"\n",
|
|
" sigma : float\n",
|
|
" The feature standard deviation of the training dataset.\n",
|
|
"\n",
|
|
" theta : array_like\n",
|
|
" The parameters for the trained polynomial linear regression.\n",
|
|
"\n",
|
|
" p : int\n",
|
|
" The polynomial order.\n",
|
|
" \"\"\"\n",
|
|
" # We plot a range slightly bigger than the min and max values to get\n",
|
|
" # an idea of how the fit will vary outside the range of the data points\n",
|
|
" x = np.arange(min_x - 15, max_x + 25, 0.05).reshape(-1, 1)\n",
|
|
"\n",
|
|
" # Map the X values\n",
|
|
" X_poly = polyFeatures(x, p)\n",
|
|
" X_poly -= mu\n",
|
|
" X_poly /= sigma\n",
|
|
"\n",
|
|
" # Add ones\n",
|
|
" X_poly = np.concatenate([np.ones((x.shape[0], 1)), X_poly], axis=1)\n",
|
|
"\n",
|
|
" # Plot\n",
|
|
" plt.plot(x, np.dot(X_poly, theta), '--', lw=2)\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<h3>1 Regularized Linear Regression</h3>\n",
|
|
"In the first half of this exercize, we will implement regularized linear regression to predict the amount of water flowing out of a dam using the change of water level in a reservoir. We begin by visualizing the dataset which is split into a training set (X,y), a cross validation set (Xval, yval), and a test set (Xtest, ytest)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"image/png": "\n",
|
|
"text/plain": [
|
|
"<Figure size 432x288 with 1 Axes>"
|
|
]
|
|
},
|
|
"metadata": {
|
|
"needs_background": "light"
|
|
},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Load from ex5data1.mat, where all variables will be store in a dictionary\n",
|
|
"data = loadmat(os.path.join('Data', 'ex5data1.mat'))\n",
|
|
"\n",
|
|
"# Extract train, test, validation data from dictionary\n",
|
|
"# and also convert y's form 2-D matrix (MATLAB format) to a numpy vector\n",
|
|
"X, y = data['X'], data['y'][:, 0]\n",
|
|
"Xtest, ytest = data['Xtest'], data['ytest'][:, 0]\n",
|
|
"Xval, yval = data['Xval'], data['yval'][:, 0]\n",
|
|
"\n",
|
|
"# m = Number of examples\n",
|
|
"m = y.size\n",
|
|
"\n",
|
|
"# Plot training data\n",
|
|
"plt.plot(X, y, 'ro', ms=10, mec='k', mew=1)\n",
|
|
"plt.xlabel('Change in water level (x)')\n",
|
|
"plt.ylabel('Water flowing out of the dam (y)');"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Next, we implement a regularized linear regression cost function."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def linearRegCostFunction(X, y, theta, lambda_=0.0):\n",
|
|
" \"\"\"\n",
|
|
" Compute cost and gradient for regularized linear regression \n",
|
|
" with multiple variables. Computes the cost of using theta as\n",
|
|
" the parameter for linear regression to fit the data points in X and y. \n",
|
|
" \n",
|
|
" Parameters\n",
|
|
" ----------\n",
|
|
" X : array_like\n",
|
|
" The dataset. Matrix with shape (m x n + 1) where m is the \n",
|
|
" total number of examples, and n is the number of features \n",
|
|
" before adding the bias term.\n",
|
|
" \n",
|
|
" y : array_like\n",
|
|
" The functions values at each datapoint. A vector of\n",
|
|
" shape (m, ).\n",
|
|
" \n",
|
|
" theta : array_like\n",
|
|
" The parameters for linear regression. A vector of shape (n+1,).\n",
|
|
" \n",
|
|
" lambda_ : float, optional\n",
|
|
" The regularization parameter.\n",
|
|
" \n",
|
|
" Returns\n",
|
|
" -------\n",
|
|
" J : float\n",
|
|
" The computed cost function. \n",
|
|
" \n",
|
|
" grad : array_like\n",
|
|
" The value of the cost function gradient w.r.t theta. \n",
|
|
" A vector of shape (n+1, ).\n",
|
|
" \"\"\"\n",
|
|
" # Initialize some useful values\n",
|
|
" m = y.size # number of training examples\n",
|
|
" J = 0\n",
|
|
" grad = np.zeros(theta.shape)\n",
|
|
"\n",
|
|
" h = X.dot(theta)\n",
|
|
" J = h-y\n",
|
|
" J = np.square(J)\n",
|
|
" J = np.sum(J)\n",
|
|
" J = J / (2*m)\n",
|
|
" tempTheta = theta[0]\n",
|
|
" theta[0] = 0\n",
|
|
" J += (lambda_/(2*m))*np.sum(np.sum(np.square(theta)))\n",
|
|
" theta[0] = tempTheta\n",
|
|
" \n",
|
|
" grad = (1/m)*X.transpose().dot(h-y)\n",
|
|
" grad[1:] += (lambda_/m)*theta[1:]\n",
|
|
" \n",
|
|
" return J, grad"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 17,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Cost at theta = [1, 1]:\t 303.993192 \n",
|
|
"Gradient at theta = [1, 1]: [-15.303016, 598.250744] \n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# Test case for cost function\n",
|
|
"\n",
|
|
"theta = np.array([1, 1])\n",
|
|
"J, grad = linearRegCostFunction(np.concatenate([np.ones((m, 1)), X], axis=1), y, theta, 1)\n",
|
|
"\n",
|
|
"print('Cost at theta = [1, 1]:\\t %f ' % J)\n",
|
|
"print('Gradient at theta = [1, 1]: [{:.6f}, {:.6f}] '.format(*grad))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Next we run train our linear regression model using this cost function and graph the resulting line of best fit."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 19,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"image/png": "\n",
|
|
"text/plain": [
|
|
"<Figure size 432x288 with 1 Axes>"
|
|
]
|
|
},
|
|
"metadata": {
|
|
"needs_background": "light"
|
|
},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"# add a columns of ones for the y-intercept\n",
|
|
"X_aug = np.concatenate([np.ones((m, 1)), X], axis=1)\n",
|
|
"theta = trainLinearReg(linearRegCostFunction, X_aug, y, lambda_=0)\n",
|
|
"\n",
|
|
"# Plot fit over the data\n",
|
|
"plt.plot(X, y, 'ro', ms=10, mec='k', mew=1.5)\n",
|
|
"plt.xlabel('Change in water level (x)')\n",
|
|
"plt.ylabel('Water flowing out of the dam (y)')\n",
|
|
"plt.plot(X, np.dot(X_aug, theta), '--', lw=2);"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<h3> 2 Bias-Variance</h3>\n",
|
|
"An important concept in machine learning is the bias-variance tradeoff. High bias models are not complex enough for the data and tend to underfit, while high variance models over fit the training data.\n",
|
|
"\n",
|
|
"In this portion of the exercise we attempt to diagnose bias-variance problems by plotting training and test errors on a learning curve. \n",
|
|
"\n",
|
|
"We begin by creating a function to return a vector of errors for the training and cross validation set, then plotting it on a graph."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 22,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def learningCurve(X, y, Xval, yval, lambda_=0):\n",
|
|
" \"\"\"\n",
|
|
" Generates the train and cross validation set errors needed to plot a learning curve\n",
|
|
" returns the train and cross validation set errors for a learning curve. \n",
|
|
" \n",
|
|
" Parameters\n",
|
|
" ----------\n",
|
|
" X : array_like\n",
|
|
" The training dataset. Matrix with shape (m x n + 1) where m is the \n",
|
|
" total number of examples, and n is the number of features \n",
|
|
" before adding the bias term.\n",
|
|
" \n",
|
|
" y : array_like\n",
|
|
" The functions values at each training datapoint. A vector of\n",
|
|
" shape (m, ).\n",
|
|
" \n",
|
|
" Xval : array_like\n",
|
|
" The validation dataset. Matrix with shape (m_val x n + 1) where m is the \n",
|
|
" total number of examples, and n is the number of features \n",
|
|
" before adding the bias term.\n",
|
|
" \n",
|
|
" yval : array_like\n",
|
|
" The functions values at each validation datapoint. A vector of\n",
|
|
" shape (m_val, ).\n",
|
|
" \n",
|
|
" lambda_ : float, optional\n",
|
|
" The regularization parameter.\n",
|
|
" \n",
|
|
" Returns\n",
|
|
" -------\n",
|
|
" error_train : array_like\n",
|
|
" A vector of shape m. error_train[i] contains the training error for\n",
|
|
" i examples.\n",
|
|
" error_val : array_like\n",
|
|
" A vecotr of shape m. error_val[i] contains the validation error for\n",
|
|
" i training examples.\n",
|
|
" \"\"\"\n",
|
|
" # Number of training examples\n",
|
|
" m = y.size\n",
|
|
"\n",
|
|
" # You need to return these values correctly\n",
|
|
" error_train = np.zeros(m)\n",
|
|
" error_val = np.zeros(m)\n",
|
|
"\n",
|
|
" # ====================== YOUR CODE HERE ======================\n",
|
|
" \n",
|
|
" for i in range(1, m+1):\n",
|
|
" X_train = X[:i, :]\n",
|
|
" y_train = y[:i]\n",
|
|
" Theta = trainLinearReg(linearRegCostFunction, X_train, y_train, lambda_=0.0, maxiter=200)\n",
|
|
" error_train[i-1] = linearRegCostFunction(X_train,y_train,Theta,0)[0];\n",
|
|
" error_val[i-1] = linearRegCostFunction(Xval,yval,Theta,0)[0];\n",
|
|
" \n",
|
|
" # =============================================================\n",
|
|
" return error_train, error_val"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 23,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"# Training Examples\tTrain Error\tCross Validation Error\n",
|
|
" \t1\t\t0.000000\t205.121096\n",
|
|
" \t2\t\t0.000000\t110.302641\n",
|
|
" \t3\t\t3.286595\t45.010231\n",
|
|
" \t4\t\t2.842678\t48.368911\n",
|
|
" \t5\t\t13.154049\t35.865165\n",
|
|
" \t6\t\t19.443963\t33.829962\n",
|
|
" \t7\t\t20.098522\t31.970986\n",
|
|
" \t8\t\t18.172859\t30.862446\n",
|
|
" \t9\t\t22.609405\t31.135998\n",
|
|
" \t10\t\t23.261462\t28.936207\n",
|
|
" \t11\t\t24.317250\t29.551432\n",
|
|
" \t12\t\t22.373906\t29.433818\n"
|
|
]
|
|
},
|
|
{
|
|
"data": {
|
|
"image/png": "\n",
|
|
"text/plain": [
|
|
"<Figure size 432x288 with 1 Axes>"
|
|
]
|
|
},
|
|
"metadata": {
|
|
"needs_background": "light"
|
|
},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"X_aug = np.concatenate([np.ones((m, 1)), X], axis=1)\n",
|
|
"Xval_aug = np.concatenate([np.ones((yval.size, 1)), Xval], axis=1)\n",
|
|
"error_train, error_val = learningCurve(X_aug, y, Xval_aug, yval, lambda_=0)\n",
|
|
"\n",
|
|
"plt.plot(np.arange(1, m+1), error_train, np.arange(1, m+1), error_val, lw=2)\n",
|
|
"plt.title('Learning curve for linear regression')\n",
|
|
"plt.legend(['Train', 'Cross Validation'])\n",
|
|
"plt.xlabel('Number of training examples')\n",
|
|
"plt.ylabel('Error')\n",
|
|
"plt.axis([0, 13, 0, 150])\n",
|
|
"\n",
|
|
"print('# Training Examples\\tTrain Error\\tCross Validation Error')\n",
|
|
"for i in range(m):\n",
|
|
" print(' \\t%d\\t\\t%f\\t%f' % (i+1, error_train[i], error_val[i]))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Looking at the resulting figure, we can see that both the taining and cross validation errors are high when the number of training examples is increase (specifically the training error increases to math cross validation). This reflects a problem of high bias in our model. That is to say, our model is too simple and unable to fit our data set well. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<h3>3 Polynomial Regression</h3>\n",
|
|
"The problem with our model was that it was too simple for the data and resulted in underfitting (high bias). In this portion of the exercise, we will address this problem by adding more features to produce a more complex fit to the data. We begin by creating a function to map the original training set into its higher powers."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 24,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def polyFeatures(X, p):\n",
|
|
" \"\"\"\n",
|
|
" Maps X (1D vector) into the p-th power.\n",
|
|
" \n",
|
|
" Parameters\n",
|
|
" ----------\n",
|
|
" X : array_like\n",
|
|
" A data vector of size m, where m is the number of examples.\n",
|
|
" \n",
|
|
" p : int\n",
|
|
" The polynomial power to map the features. \n",
|
|
" \n",
|
|
" Returns \n",
|
|
" -------\n",
|
|
" X_poly : array_like\n",
|
|
" A matrix of shape (m x p) where p is the polynomial \n",
|
|
" power and m is the number of examples. That is:\n",
|
|
" \n",
|
|
" X_poly[i, :] = [X[i], X[i]**2, X[i]**3 ... X[i]**p]\n",
|
|
" \"\"\"\n",
|
|
" X_poly = np.zeros((X.shape[0], p))\n",
|
|
" X_poly[:,0] = X[:,0]\n",
|
|
" for i in range(1,p):\n",
|
|
" X_poly[:,i] = np.power(X.transpose(), i+1)\n",
|
|
"\n",
|
|
" return X_poly"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"We now apply this function to our training set, test set, and cross validation set."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 27,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"p = 8\n",
|
|
"\n",
|
|
"# Map X onto Polynomial Features and Normalize\n",
|
|
"X_poly = polyFeatures(X, p)\n",
|
|
"X_poly, mu, sigma = featureNormalize(X_poly)\n",
|
|
"X_poly = np.concatenate([np.ones((m, 1)), X_poly], axis=1)\n",
|
|
"\n",
|
|
"# Map X_poly_test and normalize (using mu and sigma)\n",
|
|
"X_poly_test = polyFeatures(Xtest, p)\n",
|
|
"X_poly_test -= mu\n",
|
|
"X_poly_test /= sigma\n",
|
|
"X_poly_test = np.concatenate([np.ones((ytest.size, 1)), X_poly_test], axis=1)\n",
|
|
"\n",
|
|
"# Map X_poly_val and normalize (using mu and sigma)\n",
|
|
"X_poly_val = polyFeatures(Xval, p)\n",
|
|
"X_poly_val -= mu\n",
|
|
"X_poly_val /= sigma\n",
|
|
"X_poly_val = np.concatenate([np.ones((yval.size, 1)), X_poly_val], axis=1)\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now that we have the ability to map polynomial features, we can train our model via linear regression and plot to see how it fits our data. We will also plot a learning curve for lambda = 0 to see if we still have a bias/variance problem."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 34,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Polynomial Regression (lambda = 0.000000)\n",
|
|
"\n",
|
|
"# Training Examples\tTrain Error\tCross Validation Error\n",
|
|
" \t1\t\t0.000000\t160.721900\n",
|
|
" \t2\t\t0.000000\t160.121511\n",
|
|
" \t3\t\t0.000000\t59.071634\n",
|
|
" \t4\t\t0.000000\t77.997728\n",
|
|
" \t5\t\t0.000000\t6.448961\n",
|
|
" \t6\t\t0.000000\t10.831639\n",
|
|
" \t7\t\t0.000000\t27.916727\n",
|
|
" \t8\t\t0.000064\t21.128258\n",
|
|
" \t9\t\t0.000147\t30.474290\n",
|
|
" \t10\t\t0.021425\t50.335502\n",
|
|
" \t11\t\t0.032329\t55.153697\n",
|
|
" \t12\t\t0.036300\t37.781163\n"
|
|
]
|
|
},
|
|
{
|
|
"data": {
|
|
"image/png": "\n",
|
|
"text/plain": [
|
|
"<Figure size 432x288 with 1 Axes>"
|
|
]
|
|
},
|
|
"metadata": {
|
|
"needs_background": "light"
|
|
},
|
|
"output_type": "display_data"
|
|
},
|
|
{
|
|
"data": {
|
|
"image/png": "\n",
|
|
"text/plain": [
|
|
"<Figure size 432x288 with 1 Axes>"
|
|
]
|
|
},
|
|
"metadata": {
|
|
"needs_background": "light"
|
|
},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"lambda_ = 0\n",
|
|
"theta = trainLinearReg(linearRegCostFunction, X_poly, y,\n",
|
|
" lambda_=lambda_, maxiter=55)\n",
|
|
"\n",
|
|
"# Plot training data and fit\n",
|
|
"plt.plot(X, y, 'ro', ms=10, mew=1.5, mec='k')\n",
|
|
"\n",
|
|
"plotFit(polyFeatures, np.min(X), np.max(X), mu, sigma, theta, p)\n",
|
|
"\n",
|
|
"plt.xlabel('Change in water level (x)')\n",
|
|
"plt.ylabel('Water flowing out of the dam (y)')\n",
|
|
"plt.title('Polynomial Regression Fit (lambda = %f)' % lambda_)\n",
|
|
"plt.ylim([-20, 50])\n",
|
|
"\n",
|
|
"plt.figure()\n",
|
|
"error_train, error_val = learningCurve(X_poly, y, X_poly_val, yval, lambda_)\n",
|
|
"plt.plot(np.arange(1, 1+m), error_train, np.arange(1, 1+m), error_val)\n",
|
|
"\n",
|
|
"plt.title('Polynomial Regression Learning Curve (lambda = %f)' % lambda_)\n",
|
|
"plt.xlabel('Number of training examples')\n",
|
|
"plt.ylabel('Error')\n",
|
|
"plt.axis([0, 13, 0, 100])\n",
|
|
"plt.legend(['Train', 'Cross Validation'])\n",
|
|
"\n",
|
|
"print('Polynomial Regression (lambda = %f)\\n' % lambda_)\n",
|
|
"print('# Training Examples\\tTrain Error\\tCross Validation Error')\n",
|
|
"for i in range(m):\n",
|
|
" print(' \\t%d\\t\\t%f\\t%f' % (i+1, error_train[i], error_val[i]))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Looking at the resulting figures, we can see that our curve fits the data extremely well. In fact, it fits it too well. Along the samples it follows perfectly, however it fails to follow the trend along the extremes. We can also see this in the learning curve, as while the training error is extremely low, the cross validation error (the error we would realistically expect to see) is still high. This imply we now have an issue of high-variance, or overfitting. To address this, we can add a regularization term. In order to choose an effective lambda, we automate the process by testing a sequence of lambdas and choosing the one with the least error."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 36,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def validationCurve(X, y, Xval, yval):\n",
|
|
" \"\"\"\n",
|
|
" Generate the train and validation errors needed to plot a validation\n",
|
|
" curve that we can use to select lambda_.\n",
|
|
" \n",
|
|
" Parameters\n",
|
|
" ----------\n",
|
|
" X : array_like\n",
|
|
" The training dataset. Matrix with shape (m x n) where m is the \n",
|
|
" total number of training examples, and n is the number of features \n",
|
|
" including any polynomial features.\n",
|
|
" \n",
|
|
" y : array_like\n",
|
|
" The functions values at each training datapoint. A vector of\n",
|
|
" shape (m, ).\n",
|
|
" \n",
|
|
" Xval : array_like\n",
|
|
" The validation dataset. Matrix with shape (m_val x n) where m is the \n",
|
|
" total number of validation examples, and n is the number of features \n",
|
|
" including any polynomial features.\n",
|
|
" \n",
|
|
" yval : array_like\n",
|
|
" The functions values at each validation datapoint. A vector of\n",
|
|
" shape (m_val, ).\n",
|
|
" \n",
|
|
" Returns\n",
|
|
" -------\n",
|
|
" lambda_vec : list\n",
|
|
" The values of the regularization parameters which were used in \n",
|
|
" cross validation.\n",
|
|
" \n",
|
|
" error_train : list\n",
|
|
" The training error computed at each value for the regularization\n",
|
|
" parameter.\n",
|
|
" \n",
|
|
" error_val : list\n",
|
|
" The validation error computed at each value for the regularization\n",
|
|
" parameter.\n",
|
|
" \"\"\"\n",
|
|
" # Selected values of lambda\n",
|
|
" lambda_vec = [0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]\n",
|
|
"\n",
|
|
" error_train = np.zeros(len(lambda_vec))\n",
|
|
" error_val = np.zeros(len(lambda_vec))\n",
|
|
"\n",
|
|
" for i in range(len(lambda_vec)):\n",
|
|
" lambda_ = lambda_vec[i]\n",
|
|
" Theta = trainLinearReg(linearRegCostFunction, X, y, lambda_, maxiter=200)\n",
|
|
" error_train[i] = linearRegCostFunction(X,y,Theta,0)[0]\n",
|
|
" error_val[i] = linearRegCostFunction(Xval,yval,Theta,0)[0]\n",
|
|
"\n",
|
|
" return lambda_vec, error_train, error_val"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"We now plot a cross validation curve of error vs lambda which allows us to select which lambda paremeter to use."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 37,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"lambda\t\tTrain Error\tValidation Error\n",
|
|
" 0.000000\t0.036300\t37.781163\n",
|
|
" 0.001000\t0.112707\t9.842030\n",
|
|
" 0.003000\t0.170997\t16.309292\n",
|
|
" 0.010000\t0.221517\t16.944779\n",
|
|
" 0.030000\t0.281841\t12.830156\n",
|
|
" 0.100000\t0.459318\t7.586964\n",
|
|
" 0.300000\t0.921783\t4.636755\n",
|
|
" 1.000000\t2.076199\t4.260602\n",
|
|
" 3.000000\t4.901376\t3.822923\n",
|
|
" 10.000000\t16.092273\t9.945554\n"
|
|
]
|
|
},
|
|
{
|
|
"data": {
|
|
"image/png": "\n",
|
|
"text/plain": [
|
|
"<Figure size 432x288 with 1 Axes>"
|
|
]
|
|
},
|
|
"metadata": {
|
|
"needs_background": "light"
|
|
},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"lambda_vec, error_train, error_val = validationCurve(X_poly, y, X_poly_val, yval)\n",
|
|
"\n",
|
|
"pyplot.plot(lambda_vec, error_train, '-o', lambda_vec, error_val, '-o', lw=2)\n",
|
|
"pyplot.legend(['Train', 'Cross Validation'])\n",
|
|
"pyplot.xlabel('lambda')\n",
|
|
"pyplot.ylabel('Error')\n",
|
|
"\n",
|
|
"print('lambda\\t\\tTrain Error\\tValidation Error')\n",
|
|
"for i in range(len(lambda_vec)):\n",
|
|
" print(' %f\\t%f\\t%f' % (lambda_vec[i], error_train[i], error_val[i]))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"With this, we can see the optimal lambda would be around 3"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 40,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"(-20, 50)"
|
|
]
|
|
},
|
|
"execution_count": 40,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
},
|
|
{
|
|
"data": {
|
|
"image/png": "\n",
|
|
"text/plain": [
|
|
"<Figure size 432x288 with 1 Axes>"
|
|
]
|
|
},
|
|
"metadata": {
|
|
"needs_background": "light"
|
|
},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"lambda_ = 3\n",
|
|
"theta = trainLinearReg(linearRegCostFunction, X_poly, y,\n",
|
|
" lambda_=lambda_, maxiter=55)\n",
|
|
"\n",
|
|
"# Plot training data and fit\n",
|
|
"plt.plot(X, y, 'ro', ms=10, mew=1.5, mec='k')\n",
|
|
"\n",
|
|
"plotFit(polyFeatures, np.min(X), np.max(X), mu, sigma, theta, p)\n",
|
|
"\n",
|
|
"plt.xlabel('Change in water level (x)')\n",
|
|
"plt.ylabel('Water flowing out of the dam (y)')\n",
|
|
"plt.title('Polynomial Regression Fit (lambda = %f)' % lambda_)\n",
|
|
"plt.ylim([-20, 50])\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.7.3"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
}
|