CS231n Assignment 3 Vis

Network Visualization (PyTorch)

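This part uses a CNN that has already been pretrained on ImageNet.
We use this CNN to define a loss function that measures our current "unhappiness" with an image.
During the backward pass we compute the gradient of this loss with respect to each pixel of the image.
The model is kept fixed; instead, gradient descent is performed on the image itself to synthesize an image that minimizes the loss.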
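
The idea of keeping the model fixed and optimizing the pixels can be sketched roughly as follows. This is only an illustration: `toy_model` is a hypothetical tiny linear classifier standing in for the pretrained CNN, and the image size, step size, and cross-entropy loss are assumptions chosen so it runs quickly on the CPU.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the pretrained CNN: a tiny linear classifier.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 5))
toy_model.eval()
for p in toy_model.parameters():
    p.requires_grad_(False)  # the model stays fixed

# The image is the only thing being optimized.
image = torch.randn(1, 3, 8, 8, requires_grad=True)
target = torch.tensor([2])
loss_fn = nn.CrossEntropyLoss()

losses = []
for _ in range(20):
    loss = loss_fn(toy_model(image), target)
    loss.backward()                  # gradient of the loss w.r.t. each pixel
    with torch.no_grad():
        image -= 0.5 * image.grad    # gradient descent on the image itself
    image.grad.zero_()
    losses.append(loss.item())

print(losses[0], losses[-1])  # the loss should go down
```

Only `image` receives a gradient here; the weights of `toy_model` never change.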

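The assignment is split into three parts:

saliency map: a quick way to show which parts of an image influenced the network's classification decision
fooling image: perturb an image so that it still looks the same to a human but is misclassified by the network
class visualization: synthesize an image that maximizes the classification score of a given class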

Note: activate the conda environment first, otherwise torch raises an error inside Jupyter.

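Preprocessing

The preprocess function is defined up front, because the pretrained model was also trained on preprocessed images.
The pretrained model needs to be downloaded; SqueezeNet is used here because with it the images can be generated directly on the CPU.
Then load a few ImageNet images to see what they look like.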
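
As a rough sketch of what the normalization part of preprocessing does, assuming the notebook's SQUEEZENET_MEAN and SQUEEZENET_STD are the standard ImageNet channel statistics. The names `preprocess_sketch` and `deprocess_sketch` are made up for this note; the notebook's own preprocess probably does more (for example resizing).

```python
import torch

# Standard ImageNet channel statistics (assumed to match the notebook's
# SQUEEZENET_MEAN / SQUEEZENET_STD constants).
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def preprocess_sketch(img01):
    """Normalize a (N, 3, H, W) float tensor with values in [0, 1]."""
    return (img01 - MEAN) / STD

def deprocess_sketch(x):
    """Invert preprocess_sketch and clamp back into [0, 1]."""
    return (x * STD + MEAN).clamp(0.0, 1.0)

img = torch.rand(2, 3, 4, 4)
restored = deprocess_sketch(preprocess_sketch(img))
print(torch.allclose(restored, img, atol=1e-5))
```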

saliency maps

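A saliency map tells us how much each pixel affects the classification score.
To compute it, we take the gradient of the unnormalized score of the correct class with respect to every pixel of the image.
- If the image has shape 3xHxW, the gradient also has shape 3xHxW
- Each entry tells how much the score changes when that pixel changes by a small amount
- To get the map, take the absolute value of the gradient and then the maximum over the three channels, which gives a result of shape HxW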
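
The last step (absolute value, then channel-wise max) can be checked on shapes alone. A small sketch with a random tensor playing the role of the gradient; the batch size and image size are arbitrary:

```python
import torch

# Pretend gradient for a batch of 2 images of size 3x4x5.
grad = torch.randn(2, 3, 4, 5)

# Absolute value, then max over the channel dimension (dim=1).
saliency, _ = grad.abs().max(dim=1)

print(saliency.shape)  # the channel dimension is gone: (2, 4, 5)
```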

gather method

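Just like selecting one element from each row of a matrix in assignment1, gather does the same in PyTorch: for a matrix s of shape (N, C), s.gather(1, y.view(-1, 1)).squeeze() picks the entry indexed by y in each row and returns a 1-D tensor of shape (N,).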
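
A tiny worked example of gather, with made-up scores for N=4 rows and C=3 classes. The advanced-indexing line at the end is an equivalent alternative, not something the assignment requires:

```python
import torch

s = torch.tensor([[0.1, 0.9, 0.3],
                  [0.7, 0.2, 0.5],
                  [0.4, 0.6, 0.8],
                  [1.0, 0.0, 0.2]])   # shape (N=4, C=3)
y = torch.tensor([1, 2, 1, 0])       # one class index per row

# Pick s[i, y[i]] for every row i; the result has shape (N,).
picked = s.gather(1, y.view(-1, 1)).squeeze(1)
print(picked)  # values 0.9, 0.5, 0.6, 1.0

# Same result with advanced indexing, mirroring NumPy's s[np.arange(N), y].
same = s[torch.arange(s.shape[0]), y]
```

Using squeeze(1) instead of squeeze() keeps the result 1-D even when N is 1.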

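compute_saliency_maps

Input:
- X: input images (N,3,H,W)
- y: labels (N,)
- model: the pretrained model
Output:
- saliency, of shape (N,H,W)
Note: torch tensors carry their own grad, so after the backward pass it can be read directly from X.grad. Because the correct-class scores are not a scalar, backward needs to be given a gradient tensor of the same shape.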
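
To see what the gradient argument of backward does, here is a minimal sketch on a toy function (x squared, not the model). Passing a tensor of ones gives the same gradient as summing the outputs first:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
per_example = x ** 2            # non-scalar output, shape (3,)

# Option 1: pass an upstream gradient of ones with the same shape.
per_example.backward(torch.ones_like(per_example))
grad_ones = x.grad.clone()

# Option 2: reduce to a scalar first, then call backward() with no argument.
x.grad.zero_()
(x ** 2).sum().backward()
grad_sum = x.grad.clone()

print(grad_ones, grad_sum)      # both are 2 * x
```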

import torch

def compute_saliency_maps(X, y, model):
    """
    Compute a class saliency map using the model for images X and labels y.

    Input:
    - X: Input images; Tensor of shape (N, 3, H, W)
    - y: Labels for X; LongTensor of shape (N,)
    - model: A pretrained CNN that will be used to compute the saliency map.

    Returns:
    - saliency: A Tensor of shape (N, H, W) giving the saliency maps for the input
    images.
    """
    # Make sure the model is in "test" mode
    model.eval()
    
    # Make input tensor require gradient
    X.requires_grad_()
    
    saliency = None
    ##############################################################################
    # TODO: Implement this function. Perform a forward and backward pass through #
    # the model to compute the gradient of the correct class score with respect  #
    # to each input image. You first want to compute the loss over the correct   #
    # scores (we'll combine losses across a batch by summing), and then compute  #
    # the gradients with a backward pass.                                        #
    ##############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

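    # Forward pass: class scores of shape (N, C)
    scores = model(X)
    # Select the score of the correct class for each image; shape (N,)
    correct_scores = scores.gather(1, y.view(-1, 1)).squeeze(1)

    # Backward pass: correct_scores is not a scalar, so pass an upstream
    # gradient of ones (equivalent to calling backward on the summed scores)
    correct_scores.backward(torch.ones_like(correct_scores))

    # Saliency: absolute gradient, then max over the 3 channels -> (N, H, W)
    saliency = X.grad.abs()
    saliency, _ = torch.max(saliency, dim=1)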

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                             END OF YOUR CODE                               #
    ##############################################################################
    return saliency

fooling images

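A fooling image is generated from an image and a target class: we run gradient ascent on the image to maximize the target class score, and stop once the model classifies the image as the target class.
Input:
- X (1,3,224,224)
- target_y: integer in the range [0, 1000)
- model: the pretrained CNN
Output:
- X_fooling
TODO
- When computing an update step, first normalize the gradient: dX = learning_rate * g / ||g||_2
- The training loop has to be written by hand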
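
The normalized update rule can be checked with a small worked example. The gradient values below are made up so that the norm is exactly 5:

```python
import torch

learning_rate = 1.0
g = torch.tensor([[3.0, 0.0], [0.0, 4.0]])   # toy gradient, ||g||_2 = 5

# Keep the direction of g but rescale its length to learning_rate.
dX = learning_rate * g / g.norm()

print(dX)         # non-zero entries are 0.6 and 0.8
print(dX.norm())  # equals learning_rate
```

So every step moves the image by the same L2 distance, no matter how large or small the raw gradient is.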

def make_fooling_image(X, target_y, model):
    """
    Generate a fooling image that is close to X, but that the model classifies
    as target_y.

    Inputs:
    - X: Input image; Tensor of shape (1, 3, 224, 224)
    - target_y: An integer in the range [0, 1000)
    - model: A pretrained CNN

    Returns:
    - X_fooling: An image that is close to X, but that is classified as target_y
    by the model.
    """
    # Initialize our fooling image to the input image, and make it require gradient
    X_fooling = X.clone()
    X_fooling = X_fooling.requires_grad_()
    
    learning_rate = 1
    ##############################################################################
    # TODO: Generate a fooling image X_fooling that the model will classify as   #
    # the class target_y. You should perform gradient ascent on the score of the #
    # target class, stopping when the model is fooled.                           #
    # When computing an update step, first normalize the gradient:               #
    #   dX = learning_rate * g / ||g||_2                                         #
    #                                                                            #
    # You should write a training loop.                                          #
    #                                                                            #
    # HINT: For most examples, you should be able to generate a fooling image    #
    # in fewer than 100 iterations of gradient ascent.                           #
    # You can print your progress over iterations to check your algorithm.       #
    ##############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

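    for i in range(100):
        scores = model(X_fooling)
        index = torch.argmax(scores, dim=1)

        # Stop as soon as the model predicts the target class
        if index[0] == target_y:
            break

        # Gradient of the target class score w.r.t. the image
        target_score = scores[0, target_y]
        target_score.backward()

        # Normalized gradient ascent step, applied outside of autograd
        with torch.no_grad():
            grad = X_fooling.grad
            X_fooling += learning_rate * grad / grad.norm()
        X_fooling.grad.zero_()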

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                             END OF YOUR CODE                               #
    ##############################################################################
    return X_fooling

class visualization

Start from an image of random noise and run gradient ascent on it toward the target class.
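
The function below calls a jitter helper that is not reproduced in these notes. As an assumption about what it does, a circular shift along width and height would fit how it is used (shift by ox, oy before the step and by -ox, -oy after it). `jitter_sketch` is a hypothetical stand-in, not necessarily the notebook's helper:

```python
import torch

def jitter_sketch(X, ox, oy):
    """Circularly shift a (N, C, H, W) tensor by ox pixels along W and oy along H."""
    return torch.roll(X, shifts=(oy, ox), dims=(2, 3))

img = torch.arange(2 * 3 * 4 * 5, dtype=torch.float32).view(2, 3, 4, 5)
shifted = jitter_sketch(img, 2, 1)
restored = jitter_sketch(shifted, -2, -1)   # undo the shift
print(torch.equal(restored, img))
```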

import random
import matplotlib.pyplot as plt

def create_class_visualization(target_y, model, dtype, **kwargs):
    """
    Generate an image to maximize the score of target_y under a pretrained model.
    
    Inputs:
    - target_y: Integer in the range [0, 1000) giving the index of the class
    - model: A pretrained CNN that will be used to generate the image
    - dtype: Torch datatype to use for computations
    
    Keyword arguments:
    - l2_reg: Strength of L2 regularization on the image
    - learning_rate: How big of a step to take
    - num_iterations: How many iterations to use
    - blur_every: How often to blur the image as an implicit regularizer
    - max_jitter: How much to jitter the image as an implicit regularizer
    - show_every: How often to show the intermediate result
    """
    model.type(dtype)
    l2_reg = kwargs.pop('l2_reg', 1e-3)
    learning_rate = kwargs.pop('learning_rate', 25)
    num_iterations = kwargs.pop('num_iterations', 100)
    blur_every = kwargs.pop('blur_every', 10)
    max_jitter = kwargs.pop('max_jitter', 16)
    show_every = kwargs.pop('show_every', 25)

    # Randomly initialize the image as a PyTorch Tensor, and make it require gradient.
    img = torch.randn(1, 3, 224, 224).mul_(1.0).type(dtype).requires_grad_()

    for t in range(num_iterations):
        # Randomly jitter the image a bit; this gives slightly nicer results
        ox, oy = random.randint(0, max_jitter), random.randint(0, max_jitter)
        img.data.copy_(jitter(img.data, ox, oy))

        ########################################################################
        # TODO: Use the model to compute the gradient of the score for the     #
        # class target_y with respect to the pixels of the image, and make a   #
        # gradient step on the image using the learning rate. Don't forget the #
        # L2 regularization term!                                              #
        # Be very careful about the signs of elements in your code.            #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

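        # Forward pass and gradient of the target class score w.r.t. the image
        scores = model(img)
        target_score = scores[0, target_y]
        target_score.backward()

        # Objective is score - l2_reg * ||img||^2, so subtract the gradient
        # of the L2 penalty (2 * l2_reg * img) from the score gradient
        grad = img.grad.data
        grad -= 2 * l2_reg * img.data

        # Normalized gradient ascent step
        img.data += learning_rate * (grad / grad.norm())

        img.grad.zero_()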

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                             END OF YOUR CODE                         #
        ########################################################################
        
        # Undo the random jitter
        img.data.copy_(jitter(img.data, -ox, -oy))

        # As regularizer, clamp and periodically blur the image
        for c in range(3):
            lo = float(-SQUEEZENET_MEAN[c] / SQUEEZENET_STD[c])
            hi = float((1.0 - SQUEEZENET_MEAN[c]) / SQUEEZENET_STD[c])
            img.data[:, c].clamp_(min=lo, max=hi)
        if t % blur_every == 0:
            blur_image(img.data, sigma=0.5)
        
        # Periodically show the image
        if t == 0 or (t + 1) % show_every == 0 or t == num_iterations - 1:
            plt.imshow(deprocess(img.data.clone().cpu()))
            class_name = class_names[target_y]
            plt.title('%s\nIteration %d / %d' % (class_name, t + 1, num_iterations))
            plt.gcf().set_size_inches(4, 4)
            plt.axis('off')
            plt.show()

    return deprocess(img.data.cpu())
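
The sign of the L2 term in the update step is easy to get wrong, so here is a quick sanity check. It assumes the objective is score - l2_reg * ||img||^2 and uses a hypothetical linear "score" (a weight tensor `w`) instead of the real model, then compares the manual gradient with autograd:

```python
import torch

torch.manual_seed(0)
l2_reg = 1e-3
w = torch.randn(3, 4, 4)                       # hypothetical linear "score" weights
img = torch.randn(3, 4, 4, requires_grad=True)

# Autograd on the full regularized objective.
objective = (w * img).sum() - l2_reg * img.pow(2).sum()
objective.backward()
auto_grad = img.grad.clone()

# Manual version: gradient of the score alone, then subtract 2 * l2_reg * img.
img.grad.zero_()
(w * img).sum().backward()
manual_grad = img.grad - 2 * l2_reg * img.detach()

print(torch.allclose(auto_grad, manual_grad, atol=1e-6))
```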