CS231N Assignment 1: Softmax

Softmax

This part is mainly about how the softmax loss is computed.
Assignment From: Assignment1

Goals

  • implement a fully-vectorized loss function for the Softmax classifier
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation with numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights

Preprocessing (same as before)

  • Load the data
  • Initialize the data
    • Flatten each image into a row vector
    • normalize
    • Split into training, test, validation sets, etc.
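
Since these steps are the same as in the earlier notebooks, here is only a rough sketch (assuming the raw CIFAR-10 arrays X_train, y_train, X_test are already loaded; the split sizes are just illustrative):

import numpy as np

num_training, num_validation = 49000, 1000

# Split a validation set off the end of the training data
X_val, y_val = X_train[num_training:num_training + num_validation], y_train[num_training:num_training + num_validation]
X_train, y_train = X_train[:num_training], y_train[:num_training]

# Flatten each 32x32x3 image into a single row vector
X_train = X_train.reshape(X_train.shape[0], -1).astype(np.float64)
X_val = X_val.reshape(X_val.shape[0], -1).astype(np.float64)
X_test = X_test.reshape(X_test.shape[0], -1).astype(np.float64)

# Normalize: subtract the mean image computed on the training set
mean_image = np.mean(X_train, axis=0)
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image

# Append a column of ones so the bias can be folded into W (giving D = 3073)
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])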

softmax classifier

softmax_loss_naive

Core idea: take the scores (Wx + b), exponentiate them, then normalize, and finally take -log.
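
A tiny worked example with made-up scores for a 3-class problem (correct class 0):

import numpy as np

scores = np.array([3.2, 5.1, -1.7])      # made-up unnormalized scores for one image
exp_scores = np.exp(scores)              # [24.5, 164.0, 0.18], all positive now
probs = exp_scores / np.sum(exp_scores)  # [0.13, 0.87, 0.00], proportions that sum to 1
loss = -np.log(probs[0])                 # correct class is 0, so the loss is about 2.04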

Inputs:

  • W: shape (D, C), the weights
  • X: shape (N, D), the input mini-batch
  • y: shape (N,), the labels
  • reg: the regularization strength

Outputs:

  • the loss
  • dW, the gradient used to update W

Computing the loss

  • First take exp of all the scores (this step can be done up front), so that every score becomes positive
  • Then normalize the scores across the classes (although it is called normalization, what is actually computed is the proportion each class's score takes up among all the scores)
  • Then take the log of the correct class's proportion and negate it; the result is the loss for each image (note that log(0) is infinite, so it cannot be computed)
  • Sum the losses over all images, divide by the number of images, and add the regularization term; the result is the final loss (written out as a formula after this list)
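
Written as a formula (a standard restatement of the steps above, with $s = XW$ the score matrix of shape (N, C)):

$$ L_i = -\log\frac{e^{s_{i,\,y_i}}}{\sum_{j} e^{s_{i,\,j}}}, \qquad L = \frac{1}{N}\sum_{i=1}^{N} L_i \;+\; \text{reg}\cdot\sum_{k,c} W_{k,c}^{2} $$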

Computing dW

  • One way to understand it
    • W is a parameter matrix, and its update is made up of two parts
      • the first part is which direction to move in, which is determined by the distribution of the computed loss (the class probabilities)
      • the second part is how far to move, which additionally requires multiplying by the factor X[i]
    • So for the correct class (y[i] = j), what comes out is the probability assigned to the correct classification of this image, and W's column for that class should be changed in the opposite direction
    • The probabilities of the other classes of this image give the direction of change directly (see the gradient formula after this list)
  • This is where the difference between SVM and softmax shows up
    • For the SVM, the margin is only compared against 0, which acts like a 0/1 switch: the result only gives a direction to move in
    • For softmax, you get not only the direction but also the weight of that direction, so terms with larger loss have a larger influence
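
In formula form (the standard softmax gradient; $p_{i,j}$ is the normalized probability assigned to class $j$ for image $i$), the "direction times X[i]" intuition above becomes:

$$ \frac{\partial L_i}{\partial s_{i,j}} = p_{i,j} - \mathbb{1}[j = y_i], \qquad \frac{\partial L_i}{\partial W_{:,\,j}} = \bigl(p_{i,j} - \mathbb{1}[j = y_i]\bigr)\, X_i $$
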
import numpy as np


def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    # number of training examples
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            # if this is the correct class, skip it
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                dW[:, y[i]] += -X[i]
                dW[:, j] += X[i]

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train

    # Add regularization to the loss.
    # reg is the lambda (regularization) coefficient
    loss += reg * np.sum(W * W)
    dW += 2 * reg * W
    #############################################################################
    # TODO:                                                                     #
    # Compute the gradient of the loss function and store it dW.                #
    # Rather than first computing the loss and then computing the derivative,   #
    # it may be simpler to compute the derivative at the same time that the     #
    # loss is being computed. As a result you may need to modify some of the    #
    # code above to compute the gradient.                                       #
    #############################################################################

    return loss, dW
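
The block above is the loop-based SVM loss. For the softmax side, a loop-based loss that follows the steps described earlier might look roughly like the sketch below (my own sketch under the same (W, X, y, reg) interface, not the assignment's reference solution):

import numpy as np

def softmax_loss_naive_sketch(W, X, y, reg):
    """Loop-based softmax loss; same interface as svm_loss_naive."""
    dW = np.zeros_like(W)
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        scores -= np.max(scores)              # shift for numerical stability
        exp_scores = np.exp(scores)
        probs = exp_scores / np.sum(exp_scores)
        loss += -np.log(probs[y[i]])          # -log of the correct class probability
        for j in range(num_classes):
            # gradient contribution: (p_j - 1[j == y_i]) * X[i]
            dW[:, j] += (probs[j] - (j == y[i])) * X[i]
    loss = loss / num_train + reg * np.sum(W * W)
    dW = dW / num_train + 2 * reg * W
    return loss, dW

The inner loop adds (probs[j] - 1) * X[i] for the correct class and probs[j] * X[i] for the other classes, which matches the gradient formula above.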

softmax_loss_vectorized: speeding up the computation

  • The computation follows the same idea as the SVM part, just done directly with matrix operations
  • When computing the gradient through the whole score matrix, the correct class's term should be subtracted, but so far it is being added like all the others, so a -1 has to be added at the correct-class positions
  • The part that took a long time to debug: when computing dW there is no log to take, because the values before the log are already the per-class proportions (the probabilities). The log is there to turn the loss into a convex function: before taking the log the loss is not convex, and a convex objective is much easier to optimize, which is why the log is used. But it plays no role when computing dW (the matrix form is written out after this list)
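
In matrix form (with $P$ the (N, C) matrix of probabilities and $Y$ the (N, C) one-hot matrix of labels, so subtracting $Y$ is exactly the "-1 at the correct class" above):

$$ dS = P - Y, \qquad \frac{\partial L}{\partial W} = \frac{1}{N}\, X^{\top} dS \;+\; 2\,\text{reg}\, W $$
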
def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized version.

    Inputs and outputs are the same as softmax_loss_naive.
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    num_class = W.shape[1]
    num_train = X.shape[0]

    #############################################################################
    # TODO: Compute the softmax loss and its gradient using no explicit loops.  #
    # Store the loss in loss and the gradient in dW. If you are not careful     #
    # here, it is easy to run into numeric instability. Don't forget the        #
    # regularization!                                                           #
    #############################################################################
    # scores has shape (N, C)
    scores = X.dot(W)
    scores -= np.max(scores, axis=1, keepdims=True)  # shift for numerical stability
    scores = np.exp(scores)
    # sum over each row (over the classes of one example)
    scores_sum = np.sum(scores, axis=1)
    scores_sum = np.repeat(scores_sum, num_class)
    scores_sum = scores_sum.reshape(num_train, num_class)
    # np.true_divide returns floats; shape (N, C)
    percent = np.true_divide(scores, scores_sum)

    # only the correct class enters the loss
    Li = -np.log(percent[np.arange(num_train), y])
    loss = np.sum(Li)

    # note: no log is needed here
    dS = percent.copy()
    dS[np.arange(num_train), y] += -1
    dW = (X.T).dot(dS)

    loss /= num_train
    loss += reg * np.sum(W * W)
    dW /= num_train
    dW += 2 * reg * W  # gradient of reg * sum(W * W)

    #############################################################################
    #                          END OF YOUR CODE                                 #
    #############################################################################

    return loss, dW
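
To sanity-check the vectorized version, one can compare it against a loop-based implementation on random data (assuming the softmax_loss_naive_sketch above, or the assignment's softmax_loss_naive, is available; the shapes mimic CIFAR-10 with a bias dimension):

import numpy as np

# Small random weights and a random dev batch (3072 pixels + 1 bias dimension, 10 classes)
W = np.random.randn(3073, 10) * 0.0001
X_dev = np.random.randn(500, 3073)
y_dev = np.random.randint(10, size=500)

loss_naive, grad_naive = softmax_loss_naive_sketch(W, X_dev, y_dev, 0.000005)
loss_vec, grad_vec = softmax_loss_vectorized(W, X_dev, y_dev, 0.000005)

# With tiny random W the probabilities are nearly uniform, so both losses
# should be close to -log(0.1) ~ 2.3, and the two results should match.
print('loss difference: %e' % abs(loss_naive - loss_vec))
print('gradient difference: %e' % np.linalg.norm(grad_naive - grad_vec, ord='fro'))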

Validation: choosing hyperparameters

Same as in the SVM part: randomly search over the hyperparameters, validate the results, and train for 500 iterations; the final accuracy comes out at around 36%.

# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = [1e-7, 5e-7]
regularization_strengths = [2.5e4, 5e4]

################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained softmax classifer in best_softmax.                          #
################################################################################
hyper_values = np.random.rand(50, 2)
hyper_values[:, 0] = (learning_rates[1] - learning_rates[0]) * hyper_values[:, 0] + learning_rates[0]
hyper_values[:, 1] = (regularization_strengths[1] - regularization_strengths[0]) * hyper_values[:, 1] + regularization_strengths[0]

for lr, rs in hyper_values:
    softmax = Softmax()
    softmax.train(X_train, y_train, lr, rs, num_iters=500, verbose=True)
    train_pred = softmax.predict(X_train)
    train_acc = np.mean(y_train == train_pred)
    val_pred = softmax.predict(X_val)
    val_acc = np.mean(y_val == val_pred)

    results[(lr, rs)] = (train_acc, val_acc)

    if val_acc > best_val:
        best_val = val_acc
        best_softmax = softmax

################################################################################
#                              END OF YOUR CODE                                #
################################################################################

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
        lr, reg, train_accuracy, val_accuracy))

print('best validation accuracy achieved during cross-validation: %f' % best_val)

From this it seems that softmax performs a bit better than the SVM?

Visualizing the final learned weights

# Visualize the learned weights for each class
w = best_softmax.W[:-1, :]  # strip out the bias
w = w.reshape(32, 32, 3, 10)

w_min, w_max = np.min(w), np.max(w)

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)

    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])

[Figure: the learned weight template for each of the 10 CIFAR-10 classes]