CS231N Assignment 1: SVM Part

Assignment from: http://cs231n.github.io/assignments2018/assignment1/

Goals:

  • a fully-vectorized loss function for the SVM
  • a fully-vectorized expression for its analytic gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights

Setup

# Run some setup code for this notebook.
from __future__ import print_function

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt


# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

Loading and preprocessing the CIFAR-10 data

# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'

# Cleaning up variables to prevent loading data multiple times (which may cause memory issue)
try:
    del X_train, y_train
    del X_test, y_test
    print('Clear previously loaded data.')
except:
    pass

X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Output:

Training data shape: (50000, 32, 32, 3)
Training labels shape: (50000,)
Test data shape: (10000, 32, 32, 3)
Test labels shape: (10000,)

Visualizing the dataset

  • Show a few training examples from each class:
    # Visualize some examples from the dataset.
    # We show a few examples of training images from each class.
    classes = ['plane', 'car', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']
    num_classes = len(classes)
    samples_per_class = 7
    for y, cls in enumerate(classes):
        idxs = np.flatnonzero(y_train == y)
        idxs = np.random.choice(idxs, samples_per_class, replace=False)
        for i, idx in enumerate(idxs):
            plt_idx = i * num_classes + y + 1
            plt.subplot(samples_per_class, num_classes, plt_idx)
            plt.imshow(X_train[idx].astype('uint8'))
            plt.axis('off')
            if i == 0:
                plt.title(cls)
    plt.show()
np.flatnonzero(y_train == y)

np.flatnonzero returns the indices of the non-zero entries. Here (y_train == y) marks every training example whose label equals the current class (e.g. plane), so this line returns the indices of all of them. Seven of them are then chosen at random and plotted.
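
A tiny illustration of np.flatnonzero (toy values, just for the example):

a = np.array([0, 3, 0, 5])
print(np.flatnonzero(a))       # [1 3]
print(np.flatnonzero(a == 5))  # [3], indices where the condition holds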

The result:
(Figure: seven random training images from each of the ten classes.)

Splitting the data further

# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
# a small development subset, used so the code runs faster while developing
num_dev = 500

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_train points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
mask = range(num_test)
X_test = X_test[mask]

This is the general pattern for picking a subset out of a whole array: build a sequence of indices (a mask) and index with it.
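
The same pattern in miniature (toy array for illustration):

a = np.arange(10, 20)
mask = range(3)                                # the first three indices
print(a[mask])                                 # [10 11 12]
mask = np.random.choice(10, 3, replace=False)
print(a[mask])                                 # three random elements, no repeats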

Reshaping the images into rows

# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# As a sanity check, print out the shapes of the data
print('Training data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)
  • To flatten data of any shape into rows, use a.reshape(x, -1): here X_train.shape[0] rows, with the number of columns left unspecified but everything flattened into it
  • To flatten into columns instead, use a.reshape(-1, x): x columns, with the number of rows inferred (see the example below)
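
For instance (toy shapes):

a = np.zeros((5, 32, 32, 3))
print(a.reshape(a.shape[0], -1).shape)  # (5, 3072): one row per image
print(a.reshape(-1, 3).shape)           # (5120, 3): three columns, rows inferred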

Preprocessing: subtracting the mean image

  • Step 1: compute the mean over the training set and visualize it

    # Preprocessing: subtract the mean image
    # first: compute the image mean based on the training data
    mean_image = np.mean(X_train, axis=0)
    print(mean_image[:10]) # print a few of the elements
    plt.figure(figsize=(4, 4))
    plt.imshow(mean_image.reshape((32, 32, 3)).astype('uint8'))  # visualize the mean image
    plt.show()
  • Step 2: subtract the mean image from the train and test data

    # second: subtract the mean image from train and test data
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    X_dev -= mean_image
  • Step 3: append a 1 at the end of every flattened image as a bias dimension (the bias trick)

    # third: append the bias dimension of ones (i.e. bias trick) so that our SVM
    # only has to worry about optimizing a single weight matrix W.
    X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
    X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

    print(X_train.shape, X_val.shape, X_test.shape, X_dev.shape)

np.hstack() stacks arrays horizontally (concatenating columns).
Likewise, np.vstack() stacks arrays vertically (concatenating rows).
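
For example:

a = np.ones((2, 3))
b = np.zeros((2, 1))
print(np.hstack([a, b]).shape)  # (2, 4): columns concatenated
print(np.vstack([a, a]).shape)  # (4, 3): rows concatenated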

SVM classifier

cs231n/classifiers/linear_svm.py

svm_loss_naive

  • Three inputs:
    • X: a minibatch of N examples, each with D features, so shape (N, D)
    • W: the weights, shape (D, C); each image has D features and there are C classes, so this W is the transpose of the usual convention
    • y: the labels, shape (N,); N images, one label per image
  • Two outputs:
    • the loss, a single float
    • dW, the gradient with respect to W
  • Note that the product XW directly gives the scores for the different classes

Computing dW (reference: https://blog.csdn.net/zt_1995/article/details/62227201)

$$\nabla_{w_{y_i}} L_i = -\left(\sum_{j \neq y_i} \mathbb{1}\left(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0\right)\right) x_i$$

$$\nabla_{w_j} L_i = \mathbb{1}\left(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0\right) x_i \qquad (j \neq y_i)$$

  • The odd-looking 1(x) is the indicator function: it equals 1 when x is true and 0 when x is false
  • The first formula is the gradient of example i's loss with respect to the correct class's weights
    • every class j whose margin constraint is violated contributes once to the loss, so we count how many there are
    • the factor x_i appears because the score w^T x_i is linear in the weights, so its derivative is x_i itself (the coefficient in front is just 1)
    • the minus sign is there because the correct-class score enters each margin as -w_{y_i}^T x_i
  • The second formula is the gradient with respect to an incorrect class j ≠ y_i; only that single class contributes, so there is no sum. Note, however, that for every image the correct-class column y_i picks up one -x_i each time some class j violates the margin, so that column accumulates the count
  • The final result must be divided by N
  • Don't forget regularization! And using 2λW as the regularization gradient (the derivative of the λΣW² loss term) works a bit better


def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    # number of training examples
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            # the correct class never contributes to its own loss
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                dW[:, y[i]] += -X[i]
                dW[:, j] += X[i]

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train

    # Add regularization to the loss.
    # reg is lambda
    loss += reg * np.sum(W * W)
    dW += 2 * reg * W

    return loss, dW
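
To sanity-check the analytic gradient, compare it with a numerical gradient at a few random coordinates. The notebook does this with cs231n.gradient_check.grad_check_sparse; the snippet below is a minimal hand-rolled sketch of the same idea, assuming X_dev and y_dev from the data split above.

# Minimal numerical gradient check (a simplified stand-in for the notebook's
# grad_check_sparse); assumes X_dev, y_dev from the data split above.
W = np.random.randn(3073, 10) * 0.0001
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

h = 1e-5
for _ in range(5):
    ix = tuple(np.random.randint(d) for d in W.shape)  # a random coordinate of W
    W[ix] += h
    loss_plus, _ = svm_loss_naive(W, X_dev, y_dev, 0.0)
    W[ix] -= 2 * h
    loss_minus, _ = svm_loss_naive(W, X_dev, y_dev, 0.0)
    W[ix] += h  # restore the original value
    grad_numerical = (loss_plus - loss_minus) / (2 * h)
    rel_error = abs(grad_numerical - grad[ix]) / (abs(grad_numerical) + abs(grad[ix]) + 1e-12)
    print('numerical: %f analytic: %f, relative error: %e' % (grad_numerical, grad[ix], rel_error))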

svm_loss_vectorized

Speed up the computation through vectorization.

Computing the loss

  • W is (D, C) and X is (N, D), so X.dot(W) is an (N, C) matrix: one row per image, one class score per column
  • Indexing each row of the score matrix at its label in y picks out the correct class's score
  • Subtract the correct-class score from every entry of the score matrix (this broadcasts once the vector is reshaped to (N, 1); at first I copied it with repeat and reshape instead); the result is the quantity the SVM compares against 0 (the margin)
  • To get the loss, zero out the entries that are ≤ 0 as well as the correct-class entries
  • Then sum over rows and columns, divide by the number of examples, and add regularization

Computing dW

  • X.T.dot(margin) gives the final gradient, so the job is to put the right coefficient in each entry of the margin matrix
  • Every entry greater than 0 becomes 1 (this follows from the derivative)
  • For the correct class, the term fires once for every violated margin in that row, so its entry is set to minus the row sum of the mask
  • After the product, divide by the number of examples and add regularization


    def svm_loss_vectorized(W, X, y, reg):
        """
        Structured SVM loss function, vectorized implementation.

        Inputs and outputs are the same as svm_loss_naive.
        """
        loss = 0.0
        dW = np.zeros(W.shape)  # initialize the gradient as zero

        #############################################################################
        # TODO:                                                                     #
        # Implement a vectorized version of the structured SVM loss, storing the    #
        # result in loss.                                                           #
        #############################################################################
        num_train = X.shape[0]
        num_classes = W.shape[1]
        scores = X.dot(W)
        # take row n's column y[n]: a (num_train,) vector of correct-class scores
        correct_class_score = scores[np.arange(num_train), y]
        # correct_class_score = np.repeat(correct_class_score, num_classes)
        # correct_class_score = correct_class_score.reshape(num_train, num_classes)

        # (N, C); reshape the score vector to (N, 1) so the subtraction broadcasts
        margin = scores - correct_class_score[:, np.newaxis] + 1.0
        margin[np.arange(num_train), y] = 0.0
        margin[margin <= 0] = 0.0

        loss += np.sum(margin) / num_train
        loss += reg * np.sum(W * W)

        # binary mask: 1 wherever a margin was violated
        margin[margin > 0] = 1.0
        # the correct class loses one x_i per violated margin in its row
        calculate_times = np.sum(margin, axis=1)
        margin[np.arange(num_train), y] = -calculate_times

        dW = np.dot(X.T, margin) / num_train
        dW += 2 * reg * W

        #############################################################################
        #                             END OF YOUR CODE                              #
        #############################################################################

        return loss, dW
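
To check that the two implementations agree (and see the speedup), compare them on the same inputs; a minimal sketch, assuming the two functions above plus X_dev and y_dev:

import time

W = np.random.randn(3073, 10) * 0.0001

tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
print('Naive loss: %e computed in %fs' % (loss_naive, time.time() - tic))

tic = time.time()
loss_vec, grad_vec = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
print('Vectorized loss: %e computed in %fs' % (loss_vec, time.time() - tic))

# both differences should be essentially zero
print('loss difference: %f' % (loss_naive - loss_vec))
print('gradient difference: %f' % np.linalg.norm(grad_naive - grad_vec, ord='fro'))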

Now that we have dW and the loss, use SGD to drive the loss down.

Training

  • Split the data into minibatches using np.random.choice; note that replace=True samples with replacement, so elements may repeat, but it is apparently faster
  • Compute the loss and gradient on each minibatch, then update the weights by learning rate * grad
def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
          batch_size=200, verbose=False):
    """
    Train this linear classifier using stochastic gradient descent.

    Inputs:
    - X: A numpy array of shape (N, D) containing training data; there are N
      training samples each of dimension D.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c
      means that X[i] has label 0 <= c < C for C classes.
    - learning_rate: (float) learning rate for optimization.
    - reg: (float) regularization strength.
    - num_iters: (integer) number of steps to take when optimizing
    - batch_size: (integer) number of training examples to use at each step.
    - verbose: (boolean) If true, print progress during optimization.

    Outputs:
    A list containing the value of the loss function at each training iteration.
    """
    num_train, dim = X.shape
    # assume y takes values 0...K-1 where K is number of classes
    num_classes = np.max(y) + 1
    if self.W is None:
        # lazily initialize W
        self.W = 0.001 * np.random.randn(dim, num_classes)

    # Run stochastic gradient descent to optimize W
    loss_history = []
    for it in range(num_iters):
        X_batch = None
        y_batch = None

        #########################################################################
        # TODO:                                                                 #
        # Sample batch_size elements from the training data and their           #
        # corresponding labels to use in this round of gradient descent.        #
        # Store the data in X_batch and their corresponding labels in           #
        # y_batch; after sampling X_batch should have shape (batch_size, dim)   #
        # and y_batch should have shape (batch_size,)                           #
        #                                                                       #
        # Hint: Use np.random.choice to generate indices. Sampling with         #
        # replacement is faster than sampling without replacement.              #
        #########################################################################
        indices = np.random.choice(num_train, batch_size, replace=True)
        X_batch = X[indices]
        y_batch = y[indices]

        #########################################################################
        #                           END OF YOUR CODE                            #
        #########################################################################

        # evaluate loss and gradient
        loss, grad = self.loss(X_batch, y_batch, reg)
        loss_history.append(loss)

        # perform parameter update
        #########################################################################
        # TODO:                                                                 #
        # Update the weights using the gradient and the learning rate.          #
        #########################################################################
        self.W -= learning_rate * grad
        #########################################################################
        #                           END OF YOUR CODE                            #
        #########################################################################

        if verbose and it % 100 == 0:
            print('iteration %d / %d: loss %f' % (it, num_iters, loss))

    return loss_history
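
With train in place, a typical run looks like the sketch below; it assumes the LinearSVM wrapper from cs231n/classifiers (whose loss method calls svm_loss_vectorized) and the preprocessed X_train/y_train, and the learning rate and regularization values are just plausible starting points:

import time
from cs231n.classifiers import LinearSVM

svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=2.5e4,
                      num_iters=1500, verbose=True)
print('That took %fs' % (time.time() - tic))

# the loss should decrease roughly monotonically over the iterations
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()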

Prediction

  • We already have the trained weights self.W from the previous step
  • XW gives the class scores
  • The prediction for each image is the highest-scoring class in its row
    def predict(self, X):
        """
        Use the trained weights of this linear classifier to predict labels for
        data points.

        Inputs:
        - X: A numpy array of shape (N, D) containing training data; there are N
          training samples each of dimension D.

        Returns:
        - y_pred: Predicted labels for the data in X. y_pred is a 1-dimensional
          array of length N, and each element is an integer giving the predicted
          class.
        """
        y_pred = np.zeros(X.shape[0])
        ###########################################################################
        # TODO:                                                                   #
        # Implement this method. Store the predicted labels in y_pred.            #
        ###########################################################################
        scores = X.dot(self.W)
        # the prediction is the highest-scoring column of each row
        y_pred = np.argmax(scores, axis=1)
        ###########################################################################
        #                            END OF YOUR CODE                             #
        ###########################################################################
        return y_pred

Cross-validation

  • The assignment asks us to tune two hyperparameters: the learning rate and the regularization strength. Cross-validation is not actually used here; random search is, which tends to find better values than grid search
  • Train the model with each combination of parameters, measure the accuracy on the validation set, and keep the combination with the highest validation accuracy
  • Note: keep the number of iterations small during the search, otherwise training takes a very long time
  • The code uses rand to get random numbers between 0 and 1; multiplying by the width of the hyperparameter range and adding the lower bound gives the final random sample
learning_rates = [1e-7, 5e-5]            # search range (the notebook's defaults)
regularization_strengths = [2.5e4, 5e4]  # search range (the notebook's defaults)

results = {}
best_val = -1
best_svm = None

# draw 50 random (learning rate, regularization strength) pairs from the ranges
rand_tuple = np.random.rand(50, 2)
rand_tuple[:, 0] = rand_tuple[:, 0] * (learning_rates[1] - learning_rates[0]) + learning_rates[0]
rand_tuple[:, 1] = rand_tuple[:, 1] * (regularization_strengths[1] - regularization_strengths[0]) + regularization_strengths[0]
for lr, rs in rand_tuple:
    svm = LinearSVM()
    svm.train(X_train, y_train, learning_rate=lr, reg=rs, num_iters=1500, verbose=True)
    y_train_pred = svm.predict(X_train)
    train_acc = np.mean(y_train == y_train_pred)
    y_val_pred = svm.predict(X_val)        # evaluate on the validation set
    val_acc = np.mean(y_val == y_val_pred)
    results[(lr, rs)] = (train_acc, val_acc)
    if val_acc > best_val:
        best_val = val_acc
        best_svm = svm
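
Once the search finishes, print what it found; a short sketch using the results dictionary built above:

for lr, reg in sorted(results):
    train_acc, val_acc = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f'
          % (lr, reg, train_acc, val_acc))
print('best validation accuracy achieved: %f' % best_val)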

Visualizing the results

# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:-1, :]  # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)

    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])

(Figure: the learned weight templates for each of the 10 classes.)