CS231n Assignment 3: Style Transfer

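Goal

  • Given two images, generate a new image that has the content of one and the style of the other.
  • First build a loss function that ties the generated image to both the content image and the style image, then run gradient descent on the pixels of the generated image.
  • SqueezeNet (pretrained on ImageNet) is used to extract the image features.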

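Predefined helper functions

  • This part works directly on JPEG images rather than CIFAR-10 data, so the images need to be preprocessed first.
  • A dtype must also be set to choose between CPU and GPU: dtype = torch.FloatTensor runs on the CPU, and the GPU variant has cuda in its name (torch.cuda.FloatTensor).
  • cnn = torchvision.models.squeezenet1_1(pretrained=True).features loads the SqueezeNet feature layers; the model's type is then set to the dtype chosen above.
    • No further training is needed, so gradient computation is switched off for every parameter in cnn.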
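The freezing step described above can be sketched as follows. To keep the snippet runnable offline, a small nn.Sequential stands in for the pretrained SqueezeNet feature stack; the stand-in and its layer sizes are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

# CPU tensor type; torch.cuda.FloatTensor would select the GPU instead.
dtype = torch.FloatTensor

# Hypothetical stand-in for
# torchvision.models.squeezenet1_1(pretrained=True).features
cnn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
)
cnn.type(dtype)

# The network is only a fixed feature extractor here,
# so none of its parameters needs a gradient.
for param in cnn.parameters():
    param.requires_grad = False

frozen = all(not p.requires_grad for p in cnn.parameters())
```

With the parameters frozen, the only tensor that can still receive gradients later is the generated image itself.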

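Feature extraction

  • Inputs
    • x: a tensor of shape (N, C, H, W) holding a minibatch of images
    • cnn: the model loaded above
  • Outputs
    • features: a list, where features[i] has shape (N, C_i, H_i, W_i)
      • Features from different layers have different channel counts and different H and W
  • Implementation:
    • The code iterates over cnn._modules.values() to visit the layers in order; the output of one layer is the input to the next, and every intermediate output is stored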
      def extract_features(x, cnn):
          features = []
          prev_feat = x
          for i, module in enumerate(cnn._modules.values()):
              next_feat = module(prev_feat)
              features.append(next_feat)
              prev_feat = next_feat
          return features
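A quick way to see what the function returns is to run it on a small network and inspect the shapes. The four-layer stack below is a hypothetical stand-in for SqueezeNet (an assumption, chosen so the snippet needs no downloaded weights), and the function is rewritten with cnn.children(), which walks the same modules in the same order.

```python
import torch
import torch.nn as nn

def extract_features(x, cnn):
    """Return the output of every layer of cnn, in order."""
    features = []
    prev_feat = x
    for module in cnn.children():
        next_feat = module(prev_feat)
        features.append(next_feat)
        prev_feat = next_feat
    return features

# Hypothetical stand-in network, not the real SqueezeNet
cnn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2),  # 32x32 -> 15x15
    nn.ReLU(),                                 # shape unchanged
    nn.MaxPool2d(kernel_size=2),               # 15x15 -> 7x7
    nn.Conv2d(8, 16, kernel_size=3),           # 7x7 -> 5x5
)

x = torch.randn(2, 3, 32, 32)  # minibatch of N=2 images
with torch.no_grad():
    feats = extract_features(x, cnn)

shapes = [tuple(f.shape) for f in feats]
```

Each entry of shapes has the form (N, C_i, H_i, W_i), with the channel count and spatial size changing from layer to layer.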

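Computing the loss

The loss has three parts: the content loss + the style loss + the total-variation loss

  • The aim is to combine the content of one image with the style of another
    • The loss penalizes the generated image when its content drifts from the content image and when its style drifts from the style image
    • To do this we use a hybrid loss, and instead of updating the network weights we update the pixels of the generated image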

content loss

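  • This function measures how far the feature map of the generated image deviates from that of the content image
  • Only the representation at a single layer of the network matters here; that layer has its own number of channels and its own feature-map height and width
  • In the definition, the feature map is reshaped so that all spatial positions are merged into one dimension
  • In the actual implementation no reshape is needed, because the two tensors have the same shape and can be subtracted element-wise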
def content_loss(content_weight, content_current, content_original):
    """
    Compute the content loss for style transfer.

    Inputs:
    - content_weight: Scalar giving the weighting for the content loss.
    - content_current: features of the current image; this is a PyTorch Tensor of shape
      (1, C_l, H_l, W_l).
    - content_original: features of the content image, Tensor with shape (1, C_l, H_l, W_l).

    Returns:
    - scalar content loss
    """
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # Weighted sum of squared differences over all channels and positions
    return content_weight * torch.sum((content_original - content_current)**2)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
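The claim that the reshape can be skipped is easy to check numerically. The sketch below, using random tensors as stand-in feature maps (an assumption for illustration), computes the loss directly and again after flattening each feature map to shape (C_l, H_l * W_l).

```python
import torch

def content_loss(content_weight, content_current, content_original):
    return content_weight * torch.sum((content_original - content_current) ** 2)

torch.manual_seed(0)
# Random stand-ins for the feature maps of one layer: (1, C_l, H_l, W_l)
current = torch.randn(1, 4, 5, 6)
original = torch.randn(1, 4, 5, 6)

# Loss computed directly on the 4-D tensors
direct = content_loss(0.5, current, original)

# Same loss after merging the spatial positions into one dimension
C = current.shape[1]
flat = 0.5 * torch.sum((original.view(C, -1) - current.view(C, -1)) ** 2)

# The loss of a feature map against itself is zero
self_loss = content_loss(1.0, current, current)
```

Both versions sum the same squared differences, only in a different layout, so they agree up to floating-point rounding.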

style loss

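For a given layer, the loss is defined as follows

  • Compute the Gram matrix G, which captures the correlations between the responses of different filters. It approximates a covariance matrix: we want the activation statistics of the generated image to match those of the style image, and matching these (approximate) covariances is one way to do that (and it works well in practice)
  • Given a feature map, G has shape (C_l, C_l), where C_l is the number of filters in that layer. Each element is the inner product of the flattened feature maps of two filters
  • Take the difference between the G of the generated image and the G of the style image; the sum of its squared entries (times the layer weight) is the loss for that layer
  • The losses of all chosen layers are added up to give the total style loss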
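Written out as formulas, the steps above look like this. The symbols are assumptions introduced here for illustration: F^l is the feature map of layer l flattened to shape (C_l, M_l) with M_l = H_l W_l, G^l and A^l are the Gram matrices of the generated image and the style image, w_l is the layer weight, and \mathcal{L} is the set of style layers.

```latex
G^{l}_{ij} = \sum_{k=1}^{M_l} F^{l}_{ik} \, F^{l}_{jk},
\qquad
L_s^{l} = w_l \sum_{i=1}^{C_l} \sum_{j=1}^{C_l} \left( G^{l}_{ij} - A^{l}_{ij} \right)^2,
\qquad
L_s = \sum_{l \in \mathcal{L}} L_s^{l}
```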

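  • Gram matrix implementation

    • view(): returns a tensor with the same data but a different shape
    • .matmul: multiplies two tensors (batched matrix product)
    • .permute: reorders the dimensions of a tensor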
      def gram_matrix(features, normalize=True):
          """
          Compute the Gram matrix from features.

          Inputs:
          - features: PyTorch Tensor of shape (N, C, H, W) giving features for
            a batch of N images.
          - normalize: optional, whether to normalize the Gram matrix
            If True, divide the Gram matrix by the number of neurons (H * W * C)

          Returns:
          - gram: PyTorch Tensor of shape (N, C, C) giving the
            (optionally normalized) Gram matrices for the N input images.
          """
          # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

          N, C, H, W = features.size()

          # Flatten the spatial dimensions: (N, C, M) with M = H * W
          features = features.view(N, C, H * W)

          # (N, C, M) x (N, M, C) -> (N, C, C)
          gram = features.matmul(features.permute(0, 2, 1))

          if normalize:
              gram /= (H * W * C)

          return gram

          # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
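  • Style loss implementation

    • Inputs
      • feats: the features of the current image at every layer, as returned by the feature extraction function above
      • style_layers: the indices of the layers (into feats) to include in the style loss
      • style_targets: a list of the same length as style_layers; style_targets[i] is the Gram matrix of the style image at layer style_layers[i]
      • style_weights: a list of the same length; style_weights[i] is the scalar weight for layer style_layers[i]
    • For each chosen layer, compute the Gram matrix of the current image (note that feats is indexed by style_layers[i], not by i), compare it with the Gram matrix of the style image, and accumulate the loss in the same way as the content loss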
# Now put it together in the style_loss function...
def style_loss(feats, style_layers, style_targets, style_weights):
    """
    Computes the style loss at a set of layers.

    Inputs:
    - feats: list of the features at every layer of the current image, as produced by
      the extract_features function.
    - style_layers: List of layer indices into feats giving the layers to include in the
      style loss.
    - style_targets: List of the same length as style_layers, where style_targets[i] is
      a PyTorch Tensor giving the Gram matrix of the source style image computed at
      layer style_layers[i].
    - style_weights: List of the same length as style_layers, where style_weights[i]
      is a scalar giving the weight for the style loss at layer style_layers[i].

    Returns:
    - style_loss: A PyTorch Tensor holding a scalar giving the style loss.
    """
    # Hint: you can do this with one for loop over the style layers, and should
    # not be very much code (~5 lines). You will need to use your gram_matrix function.
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # dtype is the global tensor type chosen in the setup (CPU or GPU)
    loss = torch.tensor(0.).type(dtype)

    for i in range(len(style_layers)):
        # feats is indexed by the layer index, the targets and weights by i
        G_Mat = gram_matrix(feats[style_layers[i]])
        loss_layer = style_weights[i] * torch.sum((style_targets[i] - G_Mat)**2)
        loss += loss_layer

    return loss

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
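A small worked example can make the Gram matrix and the style loss concrete. The sketch below re-declares both functions in compact form (using reshape and out-of-place arithmetic, a minor variation on the code above) and feeds them hand-picked and random tensors; all input values are assumptions chosen for illustration.

```python
import torch

def gram_matrix(features, normalize=True):
    N, C, H, W = features.size()
    flat = features.reshape(N, C, H * W)
    gram = flat.matmul(flat.permute(0, 2, 1))
    if normalize:
        gram = gram / (H * W * C)
    return gram

def style_loss(feats, style_layers, style_targets, style_weights):
    loss = torch.zeros(())
    for i, layer in enumerate(style_layers):
        gram = gram_matrix(feats[layer])
        loss = loss + style_weights[i] * torch.sum((style_targets[i] - gram) ** 2)
    return loss

# One image, two channels, 1x2 spatial grid:
# channel 0 = [1, 2], channel 1 = [3, 4]
tiny = torch.tensor([[[[1., 2.]], [[3., 4.]]]])
tiny_gram = gram_matrix(tiny, normalize=False)
# Entries are inner products of channels: 1*1+2*2, 1*3+2*4, 3*3+4*4

# Random stand-ins for the per-layer features of a style image
torch.manual_seed(0)
style_feats = [torch.randn(1, 3, 4, 4), torch.randn(1, 5, 2, 2), torch.randn(1, 6, 2, 2)]
style_layers = [0, 2]
style_targets = [gram_matrix(style_feats[idx]) for idx in style_layers]

# Comparing the style image with itself gives zero loss ...
same = style_loss(style_feats, style_layers, style_targets, [1.0, 2.0])
# ... while shifted features give a positive loss
other_feats = [f + 1.0 for f in style_feats]
different = style_loss(other_feats, style_layers, style_targets, [1.0, 2.0])
```

Note how style_targets and the weights are indexed by position in style_layers, while feats is indexed by the layer number itself.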

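Total-variation regularization

  • This penalty term is added to make the generated image smoother
  • It can be computed as the sum of squared differences between each pixel and its neighbours (both the vertical and the horizontal neighbour)
  • The computation should be vectorized: slicing with 1: and :-1 shifts the image by one pixel so the two copies can be subtracted directly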
def tv_loss(img, tv_weight):
    """
    Compute total variation loss.

    Inputs:
    - img: PyTorch Variable of shape (1, 3, H, W) holding an input image.
    - tv_weight: Scalar giving the weight w_t to use for the TV loss.

    Returns:
    - loss: PyTorch Variable holding a scalar giving the total variation loss
      for img weighted by tv_weight.
    """
    # Your implementation should be vectorized and not require any loops!
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    H_var = torch.sum((img[:,:,1:,:] - img[:,:,:-1,:])**2)
    W_var = torch.sum((img[:,:,:,1:] - img[:,:,:,:-1])**2)

    return (H_var + W_var) * tv_weight

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
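The shifted-slice trick can be checked by hand on a tiny image. The sketch below uses a single-channel 2x2 image (an assumption to keep the arithmetic short; the slicing works the same way for 3 channels).

```python
import torch

def tv_loss(img, tv_weight):
    # Differences between vertically adjacent pixels
    h_var = torch.sum((img[:, :, 1:, :] - img[:, :, :-1, :]) ** 2)
    # Differences between horizontally adjacent pixels
    w_var = torch.sum((img[:, :, :, 1:] - img[:, :, :, :-1]) ** 2)
    return (h_var + w_var) * tv_weight

# 1 image, 1 channel, 2x2 pixels:
#   1 2
#   4 8
img = torch.tensor([[[[1., 2.],
                      [4., 8.]]]])

# Vertical:   (4-1)^2 + (8-2)^2 = 45
# Horizontal: (2-1)^2 + (8-4)^2 = 17
loss = tv_loss(img, tv_weight=0.5)  # 0.5 * (45 + 17)

# A constant image has no variation at all
flat_loss = tv_loss(torch.ones(1, 3, 4, 4), tv_weight=0.5)
```

A perfectly flat image gets zero penalty, which is why this term pushes the result toward smoothness.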

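The provided style transfer function

  • First extract the features of the content image and the style image
  • Then initialize the image to be generated; gradients must be enabled on this image
  • Set the hyperparameters and the optimizer
  • Then, for a fixed number of iterations, extract the features of the current image with cnn
  • Compute the loss from the current features, then update the current image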
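The steps above can be sketched as a compact optimization loop. This is a simplified illustration rather than the assignment's own function: a single convolution stands in for the frozen SqueezeNet, only one layer is used for both content and style, the generated image starts from random noise, and the images, loss weights, learning rate and iteration count are all assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the frozen SqueezeNet feature stack
cnn = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU())
for param in cnn.parameters():
    param.requires_grad = False

def gram(feat):
    n, c, h, w = feat.shape
    flat = feat.reshape(n, c, h * w)
    return flat.matmul(flat.transpose(1, 2)) / (c * h * w)

# Random stand-ins for the preprocessed content and style images
content_img = torch.rand(1, 3, 16, 16)
style_img = torch.rand(1, 3, 16, 16)

# Step 1: targets come from the content and style images
content_target = cnn(content_img)
style_target = gram(cnn(style_img))

# Step 2: the generated image is the only tensor that requires gradients
img = torch.rand(1, 3, 16, 16).requires_grad_(True)

# Step 3: hyperparameters (assumed values) and an optimizer over the pixels
content_weight, style_weight, tv_weight = 1.0, 100.0, 0.01
optimizer = torch.optim.Adam([img], lr=0.02)

def total_loss(image):
    feat = cnn(image)
    c_loss = content_weight * torch.sum((feat - content_target) ** 2)
    s_loss = style_weight * torch.sum((gram(feat) - style_target) ** 2)
    tv = tv_weight * (
        torch.sum((image[:, :, 1:, :] - image[:, :, :-1, :]) ** 2)
        + torch.sum((image[:, :, :, 1:] - image[:, :, :, :-1]) ** 2)
    )
    return c_loss + s_loss + tv

# Steps 4 and 5: extract features, compute the loss, update the pixels
losses = []
for step in range(20):
    optimizer.zero_grad()
    loss = total_loss(img)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

The optimizer is handed [img] instead of cnn.parameters(), which is what makes this an optimization over pixels and not over network weights.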