CS231N Assignment 1: Softmax

Softmax

This part is mainly about how the softmax loss is computed.
Assignment From: Assignment1

Goals

  • implement a fully-vectorized loss function for the Softmax classifier
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation with numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights

Preprocessing (same as before)

  • Load the data
  • Initialize the data
    • Flatten each image into a row vector
    • normalize
    • Split into training, test, validation sets, etc.
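
Since these steps are the same as in the earlier notebooks, here is only a rough sketch (assuming the raw CIFAR-10 arrays X_train, y_train, X_test are already loaded; the split sizes are just illustrative):

import numpy as np

num_training, num_validation = 49000, 1000

# Split a validation set off the end of the training data
X_val, y_val = X_train[num_training:num_training + num_validation], y_train[num_training:num_training + num_validation]
X_train, y_train = X_train[:num_training], y_train[:num_training]

# Flatten each 32x32x3 image into a single row vector
X_train = X_train.reshape(X_train.shape[0], -1).astype(np.float64)
X_val = X_val.reshape(X_val.shape[0], -1).astype(np.float64)
X_test = X_test.reshape(X_test.shape[0], -1).astype(np.float64)

# Normalize: subtract the mean image computed on the training set
mean_image = np.mean(X_train, axis=0)
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image

# Append a column of ones so the bias can be folded into W (giving D = 3073)
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])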

softmax classifier

softmax_loss_naive

Core idea: take the scores (Wx + b), exponentiate them, then normalize, and finally take -log.
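
A tiny worked example with made-up scores for a 3-class problem (correct class 0):

import numpy as np

scores = np.array([3.2, 5.1, -1.7])      # made-up unnormalized scores for one image
exp_scores = np.exp(scores)              # [24.5, 164.0, 0.18], all positive now
probs = exp_scores / np.sum(exp_scores)  # [0.13, 0.87, 0.00], proportions that sum to 1
loss = -np.log(probs[0])                 # correct class is 0, so the loss is about 2.04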

Inputs:

  • W: shape (D, C), the weights
  • X: shape (N, D), the input mini-batch
  • y: shape (N,), the labels
  • reg: the regularization strength

Outputs:

  • the loss
  • dW, the gradient used to update W

Computing the loss

  • First take exp of all the scores (this step can be done up front), so that every score becomes positive
  • Then normalize the scores across the classes (although it is called normalization, what is actually computed is the proportion each class's score takes up among all the scores)
  • Then take the log of the correct class's proportion and negate it; the result is the loss for each image (note that log(0) is infinite, so it cannot be computed)
  • Sum the losses over all images, divide by the number of images, and add the regularization term; the result is the final loss (written out as a formula after this list)
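
Written as a formula (a standard restatement of the steps above, with $s = XW$ the score matrix of shape (N, C)):

$$ L_i = -\log\frac{e^{s_{i,\,y_i}}}{\sum_{j} e^{s_{i,\,j}}}, \qquad L = \frac{1}{N}\sum_{i=1}^{N} L_i \;+\; \text{reg}\cdot\sum_{k,c} W_{k,c}^{2} $$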

Computing dW

  • One way to understand it
    • W is a parameter matrix, and its update is made up of two parts
      • the first part is which direction to move in, which is determined by the distribution of the computed loss (the class probabilities)
      • the second part is how far to move, which additionally requires multiplying by the factor X[i]
    • So for the correct class (y[i] = j), what comes out is the probability assigned to the correct classification of this image, and W's column for that class should be changed in the opposite direction
    • The probabilities of the other classes of this image give the direction of change directly (see the gradient formula after this list)
  • This is where the difference between SVM and softmax shows up
    • For the SVM, the margin is only compared against 0, which acts like a 0/1 switch: the result only gives a direction to move in
    • For softmax, you get not only the direction but also the weight of that direction, so terms with larger loss have a larger influence
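
In formula form (the standard softmax gradient; $p_{i,j}$ is the normalized probability assigned to class $j$ for image $i$), the "direction times X[i]" intuition above becomes:

$$ \frac{\partial L_i}{\partial s_{i,j}} = p_{i,j} - \mathbb{1}[j = y_i], \qquad \frac{\partial L_i}{\partial W_{:,\,j}} = \bigl(p_{i,j} - \mathbb{1}[j = y_i]\bigr)\, X_i $$
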
import numpy as np


def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    # number of training examples
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            # if this is the correct class, skip it
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                dW[:, y[i]] += -X[i]
                dW[:, j] += X[i]

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train

    # Add regularization to the loss.
    # reg is the lambda (regularization) coefficient
    loss += reg * np.sum(W * W)
    dW += 2 * reg * W
    #############################################################################
    # TODO:                                                                     #
    # Compute the gradient of the loss function and store it dW.                #
    # Rather than first computing the loss and then computing the derivative,   #
    # it may be simpler to compute the derivative at the same time that the     #
    # loss is being computed. As a result you may need to modify some of the    #
    # code above to compute the gradient.                                       #
    #############################################################################

    return loss, dW
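
The block above is the loop-based SVM loss. For the softmax side, a loop-based loss that follows the steps described earlier might look roughly like the sketch below (my own sketch under the same (W, X, y, reg) interface, not the assignment's reference solution):

import numpy as np

def softmax_loss_naive_sketch(W, X, y, reg):
    """Loop-based softmax loss; same interface as svm_loss_naive."""
    dW = np.zeros_like(W)
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        scores -= np.max(scores)              # shift for numerical stability
        exp_scores = np.exp(scores)
        probs = exp_scores / np.sum(exp_scores)
        loss += -np.log(probs[y[i]])          # -log of the correct class probability
        for j in range(num_classes):
            # gradient contribution: (p_j - 1[j == y_i]) * X[i]
            dW[:, j] += (probs[j] - (j == y[i])) * X[i]
    loss = loss / num_train + reg * np.sum(W * W)
    dW = dW / num_train + 2 * reg * W
    return loss, dW

The inner loop adds (probs[j] - 1) * X[i] for the correct class and probs[j] * X[i] for the other classes, which matches the gradient formula above.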

softmax_loss_vectorized: speeding up the computation

  • The computation follows the same idea as the SVM part, just done directly with matrix operations
  • When computing the gradient through the whole score matrix, the correct class's term should be subtracted, but so far it is being added like all the others, so a -1 has to be added at the correct-class positions
  • The part that took a long time to debug: when computing dW there is no log to take, because the values before the log are already the per-class proportions (the probabilities). The log is there to turn the loss into a convex function: before taking the log the loss is not convex, and a convex objective is much easier to optimize, which is why the log is used. But it plays no role when computing dW (the matrix form is written out after this list)
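
In matrix form (with $P$ the (N, C) matrix of probabilities and $Y$ the (N, C) one-hot matrix of labels, so subtracting $Y$ is exactly the "-1 at the correct class" above):

$$ dS = P - Y, \qquad \frac{\partial L}{\partial W} = \frac{1}{N}\, X^{\top} dS \;+\; 2\,\text{reg}\, W $$
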
def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized version.

    Inputs and outputs are the same as softmax_loss_naive.
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    num_class = W.shape[1]
    num_train = X.shape[0]

    #############################################################################
    # TODO: Compute the softmax loss and its gradient using no explicit loops.  #
    # Store the loss in loss and the gradient in dW. If you are not careful     #
    # here, it is easy to run into numeric instability. Don't forget the        #
    # regularization!                                                           #
    #############################################################################
    # scores has shape (N, C)
    scores = X.dot(W)
    scores -= np.max(scores, axis=1, keepdims=True)  # shift for numerical stability
    scores = np.exp(scores)
    # sum over each row (over the classes of one example)
    scores_sum = np.sum(scores, axis=1)
    scores_sum = np.repeat(scores_sum, num_class)
    scores_sum = scores_sum.reshape(num_train, num_class)
    # np.true_divide returns floats; shape (N, C)
    percent = np.true_divide(scores, scores_sum)

    # only the correct class enters the loss
    Li = -np.log(percent[np.arange(num_train), y])
    loss = np.sum(Li)

    # note: no log is needed here
    dS = percent.copy()
    dS[np.arange(num_train), y] += -1
    dW = (X.T).dot(dS)

    loss /= num_train
    loss += reg * np.sum(W * W)
    dW /= num_train
    dW += 2 * reg * W  # gradient of reg * sum(W * W)

    #############################################################################
    #                          END OF YOUR CODE                                 #
    #############################################################################

    return loss, dW
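
To sanity-check the vectorized version, one can compare it against a loop-based implementation on random data (assuming the softmax_loss_naive_sketch above, or the assignment's softmax_loss_naive, is available; the shapes mimic CIFAR-10 with a bias dimension):

import numpy as np

# Small random weights and a random dev batch (3072 pixels + 1 bias dimension, 10 classes)
W = np.random.randn(3073, 10) * 0.0001
X_dev = np.random.randn(500, 3073)
y_dev = np.random.randint(10, size=500)

loss_naive, grad_naive = softmax_loss_naive_sketch(W, X_dev, y_dev, 0.000005)
loss_vec, grad_vec = softmax_loss_vectorized(W, X_dev, y_dev, 0.000005)

# With tiny random W the probabilities are nearly uniform, so both losses
# should be close to -log(0.1) ~ 2.3, and the two results should match.
print('loss difference: %e' % abs(loss_naive - loss_vec))
print('gradient difference: %e' % np.linalg.norm(grad_naive - grad_vec, ord='fro'))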

Validation: choosing hyperparameters

Same as in the SVM part: randomly search over the hyperparameters, validate the results, and train for 500 iterations; the final accuracy comes out at around 36%.

# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = [1e-7, 5e-7]
regularization_strengths = [2.5e4, 5e4]

################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained softmax classifer in best_softmax.                          #
################################################################################
hyper_values = np.random.rand(50, 2)
hyper_values[:, 0] = (learning_rates[1] - learning_rates[0]) * hyper_values[:, 0] + learning_rates[0]
hyper_values[:, 1] = (regularization_strengths[1] - regularization_strengths[0]) * hyper_values[:, 1] + regularization_strengths[0]

for lr, rs in hyper_values:
    softmax = Softmax()
    softmax.train(X_train, y_train, lr, rs, num_iters=500, verbose=True)
    train_pred = softmax.predict(X_train)
    train_acc = np.mean(y_train == train_pred)
    val_pred = softmax.predict(X_val)
    val_acc = np.mean(y_val == val_pred)

    results[(lr, rs)] = (train_acc, val_acc)

    if val_acc > best_val:
        best_val = val_acc
        best_softmax = softmax

################################################################################
#                              END OF YOUR CODE                                #
################################################################################

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
        lr, reg, train_accuracy, val_accuracy))

print('best validation accuracy achieved during cross-validation: %f' % best_val)

From this it seems that softmax performs a bit better than the SVM?

Visualizing the final learned weights

# Visualize the learned weights for each class
w = best_softmax.W[:-1, :]  # strip out the bias
w = w.reshape(32, 32, 3, 10)

w_min, w_max = np.min(w), np.max(w)

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)

    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])

[Figure: the learned weight template for each of the 10 CIFAR-10 classes]