CS231n Assignment 3: Visualization

Network Visualization (PyTorch)

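  • This part uses a CNN that has already been pretrained on ImageNet.
  • We use this CNN to define a loss function, which measures how "unhappy" we currently are with the image.
  • During the backward pass, we compute the gradient of this loss with respect to every pixel.
  • The model's weights stay fixed. Instead, we perform gradient descent on the image pixels themselves, producing an image that minimizes the loss.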
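The core idea is to optimize the input instead of the weights. The toy sketch below illustrates it under one assumption: a fixed linear classifier `W` stands in for the pretrained CNN, purely for illustration. Only the "image" `x` is updated, and `W` is never touched.

```python
import numpy as np

# Toy stand-in (an assumption) for a frozen, pretrained network:
# a fixed linear classifier with C classes over D "pixels".
rng = np.random.default_rng(0)
C, D = 5, 12
W = rng.standard_normal((C, D))
W_before = W.copy()

def scores(x):
    """Unnormalized class scores for a flattened image x of shape (D,)."""
    return W @ x

def loss_and_grad(x, y):
    """Loss = negative score of class y; its gradient w.r.t. the pixels is -W[y]."""
    return -scores(x)[y], -W[y]

x = np.zeros(D)   # the image is the only thing being optimized
y = 3
lr = 0.1
losses = []
for _ in range(20):
    loss, g = loss_and_grad(x, y)
    losses.append(loss)
    x -= lr * g   # gradient descent on the image, not on W
```

Each step lowers the loss while `W` stays exactly as it was.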

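The assignment is divided into three parts:

  • Saliency maps: a quick way to show which parts of an image influenced the network's classification decision.
  • Fooling images: perturb an image so that it looks essentially unchanged to a human, yet the network misclassifies it.
  • Class visualization: synthesize an image that maximizes the score of a given class.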

Note: activate the conda environment before launching Jupyter; otherwise importing torch in the notebook fails.

Preprocessing

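  • A preprocess function is defined up front, because the images used for pretraining went through the same preprocessing.
  • We download a pretrained model. SqueezeNet is used here because it is small enough to generate images directly on the CPU.
  • We load a few ImageNet images to see what they look like.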
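The NumPy sketch below shows what such preprocessing typically does: scale to [0, 1], normalize each channel by a mean and standard deviation, and move channels first. The mean/std values are the standard ImageNet statistics and are an assumption here. `preprocess_np` and `deprocess_np` are hypothetical names, not the assignment's functions. The sketch also shows where the `lo`/`hi` clamp bounds used later in `create_class_visualization` come from: they are simply the normalized values of pixel intensities 0 and 1.

```python
import numpy as np

# Assumed per-channel ImageNet statistics (RGB); the assignment's own constants may differ.
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def preprocess_np(img_uint8):
    """Hypothetical: (H, W, 3) uint8 image -> (3, H, W) float, normalized per channel."""
    x = img_uint8.astype(np.float64) / 255.0
    x = (x - MEAN) / STD
    return x.transpose(2, 0, 1)

def deprocess_np(x):
    """Hypothetical inverse of preprocess_np: (3, H, W) -> (H, W, 3) uint8."""
    img = x.transpose(1, 2, 0) * STD + MEAN
    return np.clip(np.round(img * 255.0), 0, 255).astype(np.uint8)

# Valid pixel range [0, 1] mapped into normalized space, per channel.
lo = -MEAN / STD
hi = (1.0 - MEAN) / STD
```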

saliency maps

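  • A saliency map tells us how much each pixel affects the classification score.
  • To compute it, we take the gradient of the correct class's unnormalized score (before softmax) with respect to each pixel.
    • If the image has shape 3xHxW, the gradient also has shape 3xHxW.
    • Each entry tells how much the class score would change if that pixel changed slightly.
    • To turn this into a map, we take the absolute value of the gradient and then the maximum over the three channels, which gives an HxW map.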
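The reduction from a (N, 3, H, W) gradient to a (N, H, W) saliency map is just "absolute value, then max over the channel axis." The sketch below uses random numbers as a stand-in for real gradients, in NumPy for brevity. The PyTorch version in the function below does the same thing with `.abs()` and `torch.max(..., dim=1)`.

```python
import numpy as np

# Random stand-in for the gradient of the correct-class score, shape (N, 3, H, W).
rng = np.random.default_rng(0)
N, H, W = 2, 4, 5
grad = rng.standard_normal((N, 3, H, W))

# Saliency: absolute value, then max over the channel axis -> (N, H, W).
saliency = np.abs(grad).max(axis=1)
print(saliency.shape)  # (2, 4, 5)
```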

gather method

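  • Similar to the indexing trick used in assignment 1 to pick out one entry per row, s.gather(1, y.view(-1, 1)).squeeze() selects, from each row i of an N×C matrix s, the element in column y[i], producing a 1-D array of length N.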
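To make the semantics of `gather` concrete, the sketch below uses NumPy's `np.take_along_axis`, which performs the same per-row selection. The result matches plain fancy indexing `s[np.arange(N), y]`, which is the assignment-1 style.

```python
import numpy as np

s = np.array([[0.1, 2.0, -1.0],
              [3.0, 0.5, 0.2],
              [-0.3, 0.0, 4.0],
              [1.0, 1.5, 2.5]])   # scores, N=4, C=3
y = np.array([1, 0, 2, 1])       # one label per row

# Equivalent of s.gather(1, y.view(-1, 1)).squeeze() in PyTorch
picked = np.take_along_axis(s, y.reshape(-1, 1), axis=1).squeeze()
# picked holds s[i, y[i]] for each row: 2.0, 3.0, 4.0, 1.5
```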

compute_saliency_maps

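  • Inputs:
    • X: input images (N, 3, H, W)
    • y: labels (N,)
    • model: the pretrained model
  • Output:
    • saliency, of shape (N, H, W)
  • Note: because X.requires_grad_() has been called, autograd computes the gradient with respect to X for us. However, correct_scores is a vector rather than a scalar, so backward() must be given a gradient tensor of the same shape (here all ones, which is equivalent to backpropagating the sum of the scores).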
import torch

def compute_saliency_maps(X, y, model):
    """
    Compute a class saliency map using the model for images X and labels y.

    Input:
    - X: Input images; Tensor of shape (N, 3, H, W)
    - y: Labels for X; LongTensor of shape (N,)
    - model: A pretrained CNN that will be used to compute the saliency map.

    Returns:
    - saliency: A Tensor of shape (N, H, W) giving the saliency maps for the input
    images.
    """
    # Make sure the model is in "test" mode
    model.eval()

    # Make input tensor require gradient
    X.requires_grad_()

    saliency = None
    ##############################################################################
    # TODO: Implement this function. Perform a forward and backward pass through #
    # the model to compute the gradient of the correct class score with respect #
    # to each input image. You first want to compute the loss over the correct #
    # scores (we'll combine losses across a batch by summing), and then compute #
    # the gradients with a backward pass. #
    ##############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # Forward pass: unnormalized class scores, shape (N, C)
    scores = model(X)
    # Pick the correct-class score of each image, shape (N,)
    correct_scores = scores.gather(1, y.view(-1, 1)).squeeze(1)

    # Backward pass: correct_scores is a vector, so backward() needs a gradient
    # of the same shape; all ones is equivalent to backpropagating their sum.
    correct_scores.backward(torch.ones_like(correct_scores))

    # Absolute gradient, then max over the 3 color channels -> (N, H, W)
    saliency = X.grad.abs()
    saliency, _ = torch.max(saliency, dim=1)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    # END OF YOUR CODE #
    ##############################################################################
    return saliency
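Why does summing the correct scores over the batch (or passing a tensor of ones to `backward()`) still give each image its own gradient? Image i only influences its own score, so the derivative of the sum with respect to X[i] is just the derivative of score i. The check below demonstrates this with central finite differences in NumPy. A toy linear model `Wt` replaces the CNN here, which is an assumption made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, C = 3, 6, 4
X = rng.standard_normal((N, D))     # N flattened "images"
Wt = rng.standard_normal((D, C))    # toy linear model standing in for the CNN
y = np.array([2, 0, 3])

def summed_correct_scores(X):
    scores = X @ Wt                           # (N, C)
    return scores[np.arange(N), y].sum()      # scalar: sum of correct-class scores

# Analytic gradient: only row i depends on X[i], so d(sum)/dX[i] = Wt[:, y[i]].
analytic = Wt[:, y].T                          # (N, D)

# Central finite differences as an independent check
eps = 1e-6
numeric = np.zeros_like(X)
for i in range(N):
    for d in range(D):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, d] += eps
        Xm[i, d] -= eps
        numeric[i, d] = (summed_correct_scores(Xp) - summed_correct_scores(Xm)) / (2 * eps)
```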

fooling images

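  • To generate a fooling image, we take an image and a target class and perform gradient ascent on the image to maximize the target class's score, stopping as soon as the network classifies the image as the target class.
  • Inputs:
    • X: (1, 3, 224, 224)
    • target_y: an integer in the range [0, 1000)
    • model: the pretrained CNN
  • Output:
    • X_fooling
  • TODO:
    • When computing an update step, first normalize the gradient: dX = learning_rate * g / ||g||_2
    • You need to write the training loop yourself.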
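The loop described above can be sketched on a toy problem. A fixed linear classifier `W` stands in for the pretrained CNN (an assumption, for illustration only). The sketch repeats normalized gradient-ascent steps on the target score until the argmax equals the target class. For a linear model the gradient of score `target_y` with respect to the input is simply `W[target_y]`.

```python
import numpy as np

rng = np.random.default_rng(1)
C, D = 10, 50
W = rng.standard_normal((C, D))     # frozen toy "model" (assumption)
x = rng.standard_normal(D)          # original "image"
target_y = int(np.argmin(W @ x))    # the class the model currently likes least

x_fool = x.copy()
learning_rate = 1.0
for it in range(100):
    s = W @ x_fool
    if np.argmax(s) == target_y:    # stop once the model is fooled
        break
    g = W[target_y]                 # gradient of s[target_y] w.r.t. x_fool
    x_fool += learning_rate * g / np.linalg.norm(g)   # dX = lr * g / ||g||_2
```

Normalizing the gradient fixes the step length at `learning_rate`, so the step size does not depend on how large the raw gradient happens to be.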
import torch

def make_fooling_image(X, target_y, model):
    """
    Generate a fooling image that is close to X, but that the model classifies
    as target_y.

    Inputs:
    - X: Input image; Tensor of shape (1, 3, 224, 224)
    - target_y: An integer in the range [0, 1000)
    - model: A pretrained CNN

    Returns:
    - X_fooling: An image that is close to X, but that is classified as target_y
    by the model.
    """
    # Initialize our fooling image to the input image, and make it require gradient
    X_fooling = X.clone()
    X_fooling = X_fooling.requires_grad_()

    learning_rate = 1
    ##############################################################################
    # TODO: Generate a fooling image X_fooling that the model will classify as #
    # the class target_y. You should perform gradient ascent on the score of the #
    # target class, stopping when the model is fooled. #
    # When computing an update step, first normalize the gradient: #
    # dX = learning_rate * g / ||g||_2 #
    # #
    # You should write a training loop. #
    # #
    # HINT: For most examples, you should be able to generate a fooling image #
    # in fewer than 100 iterations of gradient ascent. #
    # You can print your progress over iterations to check your algorithm. #
    ##############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    for i in range(100):
        # Forward pass on the current fooling image
        scores = model(X_fooling)

        # Stop as soon as the model predicts the target class
        if scores.argmax(dim=1)[0].item() == target_y:
            break

        # Backpropagate the unnormalized target-class score to the pixels
        target_score = scores[0, target_y]
        target_score.backward()

        # Update the image outside of autograd
        with torch.no_grad():
            grad = X_fooling.grad
            # Normalized ascent step: dX = learning_rate * g / ||g||_2
            X_fooling += learning_rate * grad / grad.norm()
            X_fooling.grad.zero_()

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    # END OF YOUR CODE #
    ##############################################################################
    return X_fooling

class visualization

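  • Start from random noise and perform gradient ascent on the image to increase the score of the target class. With the L2 regularization used in the code below, the objective is I* = argmax_I s_y(I) - l2_reg * ||I||_2^2, where s_y is the unnormalized score of class y. Jitter, clamping, and periodic blurring act as additional implicit regularizers.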
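The sign of the L2 term is the easiest thing to get wrong. The gradient of s_y(I) - λ||I||² is ∇s_y(I) - 2λI. The toy sketch below checks this on a fixed linear model `W`, which stands in for the CNN as an assumption. It uses plain (unnormalized) ascent steps so that the objective provably never decreases; the assignment code normalizes the step instead. With a linear score the regularized maximizer is w / (2λ), so the iterate should converge there.

```python
import numpy as np

rng = np.random.default_rng(0)
C, D = 4, 8
W = rng.standard_normal((C, D))    # frozen toy linear "model" (assumption)
target_y, l2_reg, lr = 2, 0.1, 1.0

def objective(x):
    # Quantity we maximize: class score minus L2 penalty on the image.
    return W[target_y] @ x - l2_reg * (x @ x)

def objective_grad(x):
    # d/dx [w.x - lam * ||x||^2] = w - 2 * lam * x  (note the minus sign)
    return W[target_y] - 2 * l2_reg * x

x = rng.standard_normal(D)         # start from random noise
history = [objective(x)]
for _ in range(200):
    x = x + lr * objective_grad(x) # gradient ascent on the image
    history.append(objective(x))
```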
import random

import matplotlib.pyplot as plt
import torch

# jitter, blur_image, deprocess, class_names, SQUEEZENET_MEAN and SQUEEZENET_STD
# are helpers provided by the assignment notebook; they are not defined here.

def create_class_visualization(target_y, model, dtype, **kwargs):
    """
    Generate an image to maximize the score of target_y under a pretrained model.

    Inputs:
    - target_y: Integer in the range [0, 1000) giving the index of the class
    - model: A pretrained CNN that will be used to generate the image
    - dtype: Torch datatype to use for computations

    Keyword arguments:
    - l2_reg: Strength of L2 regularization on the image
    - learning_rate: How big of a step to take
    - num_iterations: How many iterations to use
    - blur_every: How often to blur the image as an implicit regularizer
    - max_jitter: How much to jitter the image as an implicit regularizer
    - show_every: How often to show the intermediate result
    """
    model.type(dtype)
    l2_reg = kwargs.pop('l2_reg', 1e-3)
    learning_rate = kwargs.pop('learning_rate', 25)
    num_iterations = kwargs.pop('num_iterations', 100)
    blur_every = kwargs.pop('blur_every', 10)
    max_jitter = kwargs.pop('max_jitter', 16)
    show_every = kwargs.pop('show_every', 25)

    # Randomly initialize the image as a PyTorch Tensor, and make it require gradient.
    img = torch.randn(1, 3, 224, 224).mul_(1.0).type(dtype).requires_grad_()

    for t in range(num_iterations):
        # Randomly jitter the image a bit; this gives slightly nicer results
        ox, oy = random.randint(0, max_jitter), random.randint(0, max_jitter)
        img.data.copy_(jitter(img.data, ox, oy))

        ########################################################################
        # TODO: Use the model to compute the gradient of the score for the #
        # class target_y with respect to the pixels of the image, and make a #
        # gradient step on the image using the learning rate. Don't forget the #
        # L2 regularization term! #
        # Be very careful about the signs of elements in your code. #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # Forward pass and backprop of the unnormalized target-class score
        scores = model(img)
        target_score = scores[0, target_y]
        target_score.backward()

        with torch.no_grad():
            # Gradient of s_y(I) - l2_reg * ||I||^2: the L2 term contributes
            # -2 * l2_reg * I (mind the sign)
            grad = img.grad - 2 * l2_reg * img
            # Normalized gradient-ascent step
            img += learning_rate * grad / grad.norm()
            img.grad.zero_()

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        # END OF YOUR CODE #
        ########################################################################

        # Undo the random jitter
        img.data.copy_(jitter(img.data, -ox, -oy))

        # As regularizer, clamp and periodically blur the image
        for c in range(3):
            lo = float(-SQUEEZENET_MEAN[c] / SQUEEZENET_STD[c])
            hi = float((1.0 - SQUEEZENET_MEAN[c]) / SQUEEZENET_STD[c])
            img.data[:, c].clamp_(min=lo, max=hi)
        if t % blur_every == 0:
            blur_image(img.data, sigma=0.5)

        # Periodically show the image
        if t == 0 or (t + 1) % show_every == 0 or t == num_iterations - 1:
            plt.imshow(deprocess(img.data.clone().cpu()))
            class_name = class_names[target_y]
            plt.title('%s\nIteration %d / %d' % (class_name, t + 1, num_iterations))
            plt.gcf().set_size_inches(4, 4)
            plt.axis('off')
            plt.show()

    return deprocess(img.data.cpu())
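The loop jitters the image by (ox, oy), takes a gradient step, and then jitters by (-ox, -oy) to undo the shift. This round trip only restores the original alignment if `jitter` is a circular shift. The sketch below assumes exactly that and models `jitter` with `np.roll`. `jitter_np` is a hypothetical stand-in, not the notebook's helper.

```python
import numpy as np

def jitter_np(x, ox, oy):
    """Hypothetical stand-in for jitter(): circularly shift an (N, C, H, W)
    array by ox pixels along W (axis 3) and oy pixels along H (axis 2)."""
    return np.roll(x, shift=(oy, ox), axis=(2, 3))

img = np.arange(2 * 3 * 5 * 6, dtype=float).reshape(2, 3, 5, 6)
shifted = jitter_np(img, 4, 2)          # pixel (0, 0) moves to (2, 4)
restored = jitter_np(shifted, -4, -2)   # shifting back undoes the jitter
```

Because each jitter is undone before clamping, blurring, and display, the accumulated image stays aligned. Only the gradient step sees the randomly shifted view.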