{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise 12**\n",
"\n",
"Implement softmax regression trained with batch gradient descent from scratch (no Scikit-Learn!)"
]
},
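{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model trained below is softmax regression. For each class $k$, a raw score (logit) $s_k(\\mathbf{x}) = \\boldsymbol{\\theta}^{(k)\\top} \\mathbf{x}$ is turned into a probability by the softmax function:\n",
"\n",
"$$\\hat{p}_k = \\frac{\\exp(s_k(\\mathbf{x}))}{\\sum_{j=1}^{K} \\exp(s_j(\\mathbf{x}))}$$\n",
"\n",
"Training minimises the cross-entropy loss $J(\\boldsymbol{\\Theta}) = -\\frac{1}{m} \\sum_{i=1}^{m} \\sum_{k=1}^{K} y_k^{(i)} \\log \\hat{p}_k^{(i)}$, whose gradient with respect to each class's parameter vector is $\\nabla_{\\boldsymbol{\\theta}^{(k)}} J = \\frac{1}{m} \\sum_{i=1}^{m} \\left(\\hat{p}_k^{(i)} - y_k^{(i)}\\right) \\mathbf{x}^{(i)}$. Each batch gradient descent step below evaluates this gradient over the entire training set and moves $\\boldsymbol{\\Theta}$ a small step against it."
]
},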
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import os\n",
"from matplotlib import pyplot as plt\n",
"from sklearn import datasets\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename']"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris = datasets.load_iris()\n",
"list(iris.keys())"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
".. _iris_dataset:\n",
"\n",
"Iris plants dataset\n",
"--------------------\n",
"\n",
"**Data Set Characteristics:**\n",
"\n",
"    :Number of Instances: 150 (50 in each of three classes)\n",
"    :Number of Attributes: 4 numeric, predictive attributes and the class\n",
"    :Attribute Information:\n",
"        - sepal length in cm\n",
"        - sepal width in cm\n",
"        - petal length in cm\n",
"        - petal width in cm\n",
"        - class:\n",
"                - Iris-Setosa\n",
"                - Iris-Versicolour\n",
"                - Iris-Virginica\n",
"                \n",
"    :Summary Statistics:\n",
"\n",
"    ============== ==== ==== ======= ===== ====================\n",
"                    Min  Max   Mean    SD   Class Correlation\n",
"    ============== ==== ==== ======= ===== ====================\n",
"    sepal length:   4.3  7.9   5.84   0.83    0.7826\n",
"    sepal width:    2.0  4.4   3.05   0.43   -0.4194\n",
"    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)\n",
"    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)\n",
"    ============== ==== ==== ======= ===== ====================\n",
"\n",
"    :Missing Attribute Values: None\n",
"    :Class Distribution: 33.3% for each of 3 classes.\n",
"    :Creator: R.A. Fisher\n",
"    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)\n",
"    :Date: July, 1988\n",
"\n",
"The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken\n",
"from Fisher's paper. Note that it's the same as in R, but not as in the UCI\n",
"Machine Learning Repository, which has two wrong data points.\n",
"\n",
"This is perhaps the best known database to be found in the\n",
"pattern recognition literature.  Fisher's paper is a classic in the field and\n",
"is referenced frequently to this day.  (See Duda & Hart, for example.)  The\n",
"data set contains 3 classes of 50 instances each, where each class refers to a\n",
"type of iris plant.  One class is linearly separable from the other 2; the\n",
"latter are NOT linearly separable from each other.\n",
"\n",
".. topic:: References\n",
"\n",
"   - Fisher, R.A. \"The use of multiple measurements in taxonomic problems\"\n",
"     Annual Eugenics, 7, Part II, 179-188 (1936); also in \"Contributions to\n",
"     Mathematical Statistics\" (John Wiley, NY, 1950).\n",
"   - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.\n",
"     (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.\n",
"   - Dasarathy, B.V. (1980) \"Nosing Around the Neighborhood: A New System\n",
"     Structure and Classification Rule for Recognition in Partially Exposed\n",
"     Environments\".  IEEE Transactions on Pattern Analysis and Machine\n",
"     Intelligence, Vol. PAMI-2, No. 1, 67-71.\n",
"   - Gates, G.W. (1972) \"The Reduced Nearest Neighbor Rule\".  IEEE Transactions\n",
"     on Information Theory, May 1972, 431-433.\n",
"   - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al\"s AUTOCLASS II\n",
"     conceptual clustering system finds 3 classes in the data.\n",
"   - Many, many more ...\n"
]
}
],
"source": [
"print(iris.DESCR)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"X = iris[\"data\"][:, (2, 3)]  # petal length and petal width\n",
"y = iris[\"target\"]  # class labels: 0 = setosa, 1 = versicolor, 2 = virginica"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(150, 2)\n"
]
}
],
"source": [
"# Build the design matrix\n",
"\n",
"X_with_bias = np.c_[np.ones([len(X), 1]), X]  # Add a column of ones for the Theta intercept (bias) term\n",
"\n",
"print(X.shape)\n",
"\n",
"# NOTE: If \"ValueError: all input arrays must have the same shape\" appears, you may have run this cell\n",
"# multiple times, which would have added multiple columns of ones to the matrix X"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [],
"source": [
"# Set up our proportions\n",
"\n",
"test_ratio = 0.2\n",
"val_ratio = 0.2\n",
"total_size = len(X)\n",
"\n",
"# Calculate the size of each split\n",
"\n",
"test_size = int(test_ratio * total_size)\n",
"val_size = int(val_ratio * total_size)\n",
"train_size = total_size - test_size - val_size\n",
"\n",
"# Split our data\n",
"\n",
"rnd_indices = np.random.permutation(total_size)  # Shuffled row indices, so each split is a random sample\n",
"\n",
"X_train = X_with_bias[rnd_indices[:train_size]]\n",
"y_train = y[rnd_indices[:train_size]]\n",
"X_valid = X_with_bias[rnd_indices[train_size:-test_size]]\n",
"y_valid = y[rnd_indices[train_size:-test_size]]\n",
"X_test = X_with_bias[rnd_indices[-test_size:]]\n",
"y_test = y[rnd_indices[-test_size:]]"
]
},
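{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the permutation above is unseeded, so the splits (and every number below) change from run to run. Calling `np.random.seed(42)` (the seed value is arbitrary) before the cell above would make the notebook reproducible."
]
},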
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(90, 3)\n",
"(30, 3)\n",
"(30, 3)\n"
]
}
],
"source": [
"print(X_train.shape)\n",
"print(X_valid.shape)\n",
"print(X_test.shape)"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [],
"source": [
"def to_one_hot(y):\n",
"    n_classes = y.max() + 1\n",
"    m = len(y)\n",
"    Y_one_hot = np.zeros((m, n_classes))  # Zero matrix with m rows and one column per class\n",
"    Y_one_hot[np.arange(m), y] = 1  # Set the entry for each row's class to 1\n",
"    return Y_one_hot"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 2, 2, 0, 0, 0, 1, 2, 0, 2])"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_train[:10]"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0., 0., 1.],\n",
"       [0., 0., 1.],\n",
"       [0., 0., 1.],\n",
"       [1., 0., 0.],\n",
"       [1., 0., 0.],\n",
"       [1., 0., 0.],\n",
"       [0., 1., 0.],\n",
"       [0., 0., 1.],\n",
"       [1., 0., 0.],\n",
"       [0., 0., 1.]])"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"to_one_hot(y_train[:10])"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [],
"source": [
"Y_train_one_hot = to_one_hot(y_train)\n",
"Y_test_one_hot = to_one_hot(y_test)\n",
"Y_valid_one_hot = to_one_hot(y_valid)"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [],
"source": [
"# Softmax function = exp(s_k) / (sum over classes of exp(s_j)), applied row-wise\n",
"\n",
"def softmax(logits):\n",
"    exps = np.exp(logits - logits.max(axis=1, keepdims=True))  # Subtracting the row max avoids overflow and leaves the result unchanged\n",
"    exp_sums = np.sum(exps, axis=1, keepdims=True)\n",
"    return exps / exp_sums"
]
},
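{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check (a minimal example, left unexecuted here): every row softmax returns should be a valid probability distribution, i.e. non-negative and summing to 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"softmax(np.array([[2.0, 1.0, 0.1]])).sum(axis=1)  # Expect array([1.])"
]
},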
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [],
"source": [
"n_inputs = X_train.shape[1]  # Number of features plus the bias column (3 here)\n",
"n_outputs = len(np.unique(y_train))  # 3 unique labels, one output per class"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 1.4567897105648775\n",
"500 0.7451993577978241\n",
"1000 0.6279369677273878\n",
"1500 0.5572702696067121\n",
"2000 0.5111859948576022\n",
"2500 0.47856473219026296\n",
"3000 0.45387932862540925\n",
"3500 0.43422780377165426\n",
"4000 0.41797875623202274\n",
"4500 0.4041537521442775\n",
"5000 0.39213163561158126\n"
]
}
],
"source": [
"eta = 0.01  # Learning rate\n",
"n_iterations = 5001\n",
"m = len(X_train)\n",
"epsilon = 1e-7  # Keeps the log() argument away from zero\n",
"\n",
"Theta = np.random.randn(n_inputs, n_outputs)  # Random initialisation: one parameter column per class\n",
"\n",
"# Batch gradient descent: every step uses the full training set\n",
"\n",
"for iteration in range(n_iterations):\n",
"    logits = X_train.dot(Theta)  # Raw class scores from applying Theta to X\n",
"    p_hat = softmax(logits)  # Softmax turns the scores into class probabilities\n",
"    loss = -np.mean(np.sum(Y_train_one_hot * np.log(p_hat + epsilon), axis=1))  # Cross-entropy loss\n",
"    error = p_hat - Y_train_one_hot\n",
"    if iteration % 500 == 0:\n",
"        print(iteration, loss)\n",
"    Grad = 1 / m * X_train.T.dot(error)  # Gradient of the loss with respect to Theta\n",
"    Theta = Theta - eta * Grad"
]
},
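{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of a possible extension (assuming the variables defined above; left unexecuted here, so no output is recorded): run the same update but watch the validation loss each step and stop as soon as it rises, which is basic early stopping."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Theta_es = np.random.randn(n_inputs, n_outputs)\n",
"best_loss = np.inf\n",
"\n",
"for iteration in range(n_iterations):\n",
"    p_hat = softmax(X_train.dot(Theta_es))\n",
"    Theta_es = Theta_es - eta * (1 / m) * X_train.T.dot(p_hat - Y_train_one_hot)  # Same batch update as above\n",
"    p_hat_valid = softmax(X_valid.dot(Theta_es))\n",
"    val_loss = -np.mean(np.sum(Y_valid_one_hot * np.log(p_hat_valid + epsilon), axis=1))  # Validation loss\n",
"    if val_loss < best_loss:\n",
"        best_loss = val_loss\n",
"    else:\n",
"        print(iteration, val_loss, \"early stopping!\")\n",
"        break"
]
},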
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 3.61613128,  0.06856255, -2.86225561],\n",
"       [-0.2597962 ,  0.80558911,  0.70553675],\n",
"       [-0.90831271,  0.18903751,  2.43558706]])"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Theta"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9666666666666667"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Predictions on the validation set\n",
"\n",
"logits = X_valid.dot(Theta)\n",
"p_hat = softmax(logits)\n",
"y_pred = np.argmax(p_hat, axis=1)  # Pick the class with the highest probability\n",
"\n",
"accuracy_score = np.mean(y_pred == y_valid)\n",
"accuracy_score"
]
},
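{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same computation on the held-out test set would give the final generalisation estimate (a sketch, left unexecuted here, so no score is recorded):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_test_pred = np.argmax(softmax(X_test.dot(Theta)), axis=1)\n",
"np.mean(y_test_pred == y_test)"
]
},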
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}