CS231N Assignment 2: PyTorch

This part of the assignment lets you choose between two frameworks: PyTorch or TensorFlow.

PyTorch

What

  • Adds the Tensor object (similar to NumPy's ndarray); no more manual backprop

Why

  • Runs on the GPU: you can train NNs on your own GPU without writing CUDA yourself
  • A large library of built-in functions
  • Stand on the shoulders of giants!
  • This is the kind of deep-learning code you would actually write in practice

学习资料

  • Justin Johnson has made an excellent tutorial for PyTorch.
  • Detailed API docs
  • If you have other questions that are not addressed by the API docs, the PyTorch forum is a much better place to ask than StackOverflow.

整体结构

  • Part 1, Preparation: working with datasets
  • Part 2, Abstraction level 1: operating directly on the lowest-level Tensors
  • Part 3, Abstraction level 2: using nn.Module to define an arbitrary network architecture
  • Part 4, Abstraction level 3: using nn.Sequential to define a simple linear feed-forward network
  • Part 5, tune it yourself: push CIFAR-10 accuracy as high as possible

Part 1: Preparation

PyTorch has built-in utilities for downloading datasets, preprocessing them, and iterating over them in minibatches.

import torchvision.transforms as T

  • This package provides preprocessing and data-augmentation tools; here we subtract the mean RGB value and divide by the standard deviation
  • We then build a Dataset object for each split (train / val / test); a Dataset loads one training example at a time, and the DataLoader wraps it to form minibatches
import torch
import torchvision.datasets as dset
import torchvision.transforms as T
from torch.utils.data import DataLoader, sampler

NUM_TRAIN = 49000

# The torchvision.transforms package provides tools for preprocessing data
# and for performing data augmentation; here we set up a transform to
# preprocess the data by subtracting the mean RGB value and dividing by the
# standard deviation of each RGB value; we've hardcoded the mean and std.
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])

# We set up a Dataset object for each split (train / val / test); Datasets load
# training examples one at a time, so we wrap each Dataset in a DataLoader which
# iterates through the Dataset and forms minibatches. We divide the CIFAR-10
# training set into train and val sets by passing a Sampler object to the
# DataLoader telling how it should sample from the underlying Dataset.
cifar10_train = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                             transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=64,
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

cifar10_val = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                           transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=64,
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, 50000)))

cifar10_test = dset.CIFAR10('./cs231n/datasets', train=False, download=True,
                            transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=64)
  • We set up a flag for whether to use the GPU and set it to True. Using a GPU is not required for this assignment, and if CUDA cannot be enabled on your machine the code automatically falls back to CPU mode
  • We also create two global variables: dtype, which is float32, and device, which selects where to run
  • Since Macs don't support CUDA (and newer macOS versions apparently can't use NVIDIA cards anyway), I'm running on CPU
USE_GPU = True

dtype = torch.float32  # we will be using float throughout this tutorial

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# Constant to control how frequently we print train loss
print_every = 100

print('using device:', device)

Part 2: Barebones PyTorch

  • The high-level APIs already provide a lot of functionality, but this part works at a fairly low level
  • Build a simple fc-relu net with two hidden layers and no biases
  • Use Tensor methods to compute the forward pass, and the built-in autograd to compute the backward pass
  • If you set requires_grad=True, operations on the Tensor not only compute values but also build the graph needed for the backward pass
  • If x is a Tensor with x.requires_grad == True, then after backpropagation x.grad will be another Tensor holding the gradient of x with respect to the scalar loss at the end
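A minimal sketch of this autograd behavior:

```python
import torch

# With requires_grad=True, PyTorch records a computational graph during the
# forward pass; backward() then fills in x.grad with d(loss)/dx.
x = torch.tensor([2.0, 3.0], requires_grad=True)
loss = (x * x).sum()  # scalar loss: x1^2 + x2^2
loss.backward()       # gradient of sum(x^2) is 2x
print(x.grad)         # tensor([4., 6.])
```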

PyTorch Tensors: Flatten Function

  • A Tensor is very similar to an ndarray and defines many handy operations, e.g. flatten for reshaping image data
  • In a Tensor, an image batch has shape N x C x H x W:
    • N, the number of datapoints
    • C, the number of channels
    • H and W, the height and width of the feature map
  • For an affine layer, though, we want each datapoint represented as a single vector, rather than split across channel, height, and width
  • So we use flatten to read in the N x C x H x W data and return a view of it (like reshape for an ndarray), changing it to shape N x ??, where ?? can be whatever it needs to be


def flatten(x):
    N = x.shape[0]  # read in N, C, H, W
    # "flatten" the C * H * W values into a single vector per image
    return x.view(N, -1)

Barebones PyTorch: Two-Layer Network

When defining two_layer_fc, the forward pass is two fully-connected layers with a ReLU in between. After writing the forward pass, check that the output shape is correct and nothing is off (lately these shapes have stopped being confusing for me).

import torch.nn.functional as F  # useful stateless functions


def two_layer_fc(x, params):
    """
    A fully-connected neural network; the architecture is:
    fully-connected layer -> ReLU -> fully-connected layer.
    Note that this function only defines the forward pass;
    PyTorch will take care of the backward pass for us.

    The input to the network will be a minibatch of data, of shape
    (N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
    and the output layer will produce scores for C classes.

    Inputs:
    - x: A PyTorch Tensor of shape (N, d1, ..., dM) giving a minibatch of
      input data.
    - params: A list [w1, w2] of PyTorch Tensors giving weights for the network;
      w1 has shape (D, H) and w2 has shape (H, C).

    Returns:
    - scores: A PyTorch Tensor of shape (N, C) giving classification scores for
      the input data x.
    """
    # first we flatten the image
    x = flatten(x)  # shape: [batch_size, C x H x W]

    w1, w2 = params

    # Forward pass: compute predicted y using operations on Tensors. Since w1 and
    # w2 have requires_grad=True, operations involving these Tensors will cause
    # PyTorch to build a computational graph, allowing automatic computation of
    # gradients. Since we are no longer implementing the backward pass by hand we
    # don't need to keep references to intermediate values.
    # you can also use `.clamp(min=0)`, equivalent to F.relu()
    x = F.relu(x.mm(w1))
    x = x.mm(w2)
    return x


def two_layer_fc_test():
    hidden_layer_size = 42
    # minibatch size 64, feature dimension 50
    x = torch.zeros((64, 50), dtype=dtype)
    w1 = torch.zeros((50, hidden_layer_size), dtype=dtype)
    w2 = torch.zeros((hidden_layer_size, 10), dtype=dtype)
    scores = two_layer_fc(x, [w1, w2])
    print(scores.size())  # you should see [64, 10]


two_layer_fc_test()

Barebones PyTorch: Three-Layer ConvNet

  • For both this network and the one above, you can test output shapes by simply passing in a tensor of zeros
  • Network architecture:
    • conv with bias, channel_1 filters of size KH1 x KW1, zero-padding of 2
    • ReLU
    • conv with bias, channel_2 filters of size KH2 x KW2, zero-padding of 1
    • ReLU
    • fc with bias, producing scores for C classes
  • Note: there is no softmax activation after the fc layer, because the loss function we compute later applies the softmax itself, which is more efficient
  • Note 2: no flatten is needed before conv2d; flatten only before the fc layer


def three_layer_convnet(x, params):
    """
    Performs the forward pass of a three-layer convolutional network with the
    architecture defined above.

    Inputs:
    - x: A PyTorch Tensor of shape (N, 3, H, W) giving a minibatch of images
    - params: A list of PyTorch Tensors giving the weights and biases for the
      network; should contain the following:
      - conv_w1: PyTorch Tensor of shape (channel_1, 3, KH1, KW1) giving weights
        for the first convolutional layer
      - conv_b1: PyTorch Tensor of shape (channel_1,) giving biases for the first
        convolutional layer
      - conv_w2: PyTorch Tensor of shape (channel_2, channel_1, KH2, KW2) giving
        weights for the second convolutional layer
      - conv_b2: PyTorch Tensor of shape (channel_2,) giving biases for the second
        convolutional layer
      - fc_w: PyTorch Tensor giving weights for the fully-connected layer. Can you
        figure out what the shape should be? (channel_2 * H * W, C)
      - fc_b: PyTorch Tensor giving biases for the fully-connected layer. Can you
        figure out what the shape should be? (C,)

    Returns:
    - scores: PyTorch Tensor of shape (N, C) giving classification scores for x
    """
    conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
    scores = None
    ################################################################################
    # TODO: Implement the forward pass for the three-layer ConvNet.                #
    ################################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    x = F.conv2d(x, conv_w1, bias=conv_b1, padding=2)
    x = F.conv2d(F.relu(x), conv_w2, bias=conv_b2, padding=1)
    x = flatten(x)
    x = x.mm(fc_w) + fc_b
    scores = x

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ################################################################################
    #                              END OF YOUR CODE                                #
    ################################################################################
    return scores

Barebones PyTorch: Initialization

  • random_weight(shape) initializes a weight tensor with the Kaiming normalization method.
  • zero_weight(shape) initializes a weight tensor with all zeros. Useful for instantiating bias parameters.


def random_weight(shape):
    """
    Create random Tensors for weights; setting requires_grad=True means that we
    want to compute gradients for these Tensors during the backward pass.
    We use Kaiming normalization: sqrt(2 / fan_in)
    """
    if len(shape) == 2:  # FC weight
        fan_in = shape[0]
    else:
        # conv weight [out_channel, in_channel, kH, kW]
        fan_in = np.prod(shape[1:])
    # randn is standard normal distribution generator.
    w = torch.randn(shape, device=device, dtype=dtype) * np.sqrt(2. / fan_in)
    w.requires_grad = True
    return w


def zero_weight(shape):
    return torch.zeros(shape, device=device, dtype=dtype, requires_grad=True)


# create a weight of shape [3 x 5]
# you should see the type `torch.cuda.FloatTensor` if you use GPU.
# Otherwise it should be `torch.FloatTensor`
random_weight((3, 5))

Barebones PyTorch: Check Accuracy

  • No gradients are needed here, so wrap everything in torch.no_grad() to avoid wasted computation
  • Inputs:
    • a DataLoader that gives us minibatches of the data split we want to check
    • a model_fn describing what the model looks like, used to compute the predicted scores
    • the parameters the model needs
  • Returns nothing, but prints the accuracy


def check_accuracy_part2(loader, model_fn, params):
    """
    Check the accuracy of a classification model.

    Inputs:
    - loader: A DataLoader for the data split we want to check
    - model_fn: A function that performs the forward pass of the model,
      with the signature scores = model_fn(x, params)
    - params: List of PyTorch Tensors giving parameters of the model

    Returns: Nothing, but prints the accuracy of the model
    """
    split = 'val' if loader.dataset.train else 'test'
    print('Checking accuracy on the %s set' % split)
    num_correct, num_samples = 0, 0
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.int64)
            scores = model_fn(x, params)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f%%)' %
              (num_correct, num_samples, 100 * acc))

BareBones PyTorch: Training Loop

  • Train with stochastic gradient descent without momentum, using torch.nn.functional.cross_entropy to compute the loss
  • Inputs:
    • model_fn
    • params
    • learning_rate
  • Returns nothing
  • Steps performed:
    • move the data to GPU or CPU
    • compute scores and loss
    • loss.backward()
    • update params; this step must not track gradients
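The steps above can be sketched as follows (this mirrors the notebook's train_part2; note that the notebook version reads the loader, device, dtype, and print_every from globals, whereas here they are passed explicitly so the sketch stands alone):

```python
import torch
import torch.nn.functional as F


def train_part2(model_fn, params, learning_rate, loader,
                device=torch.device('cpu'), dtype=torch.float32, print_every=100):
    """Train with vanilla SGD (no momentum), following the steps above."""
    for t, (x, y) in enumerate(loader):
        # Move the data to the chosen device (GPU or CPU)
        x = x.to(device=device, dtype=dtype)
        y = y.to(device=device, dtype=torch.long)

        # Forward pass: compute class scores and the softmax loss
        scores = model_fn(x, params)
        loss = F.cross_entropy(scores, y)

        # Backward pass: autograd fills in w.grad for every Tensor
        # that has requires_grad=True
        loss.backward()

        # The parameter update itself must not be tracked by autograd
        with torch.no_grad():
            for w in params:
                w -= learning_rate * w.grad
                w.grad.zero_()  # zero the gradients for the next iteration

        if t % print_every == 0:
            print('Iteration %d, loss = %.4f' % (t, loss.item()))
```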

BareBones PyTorch: Training a ConvNet

  • Required network:
    1. Convolutional layer (with bias) with 32 5x5 filters, with zero-padding of 2
    2. ReLU
    3. Convolutional layer (with bias) with 16 3x3 filters, with zero-padding of 1
    4. ReLU
    5. Fully-connected layer (with bias) to compute scores for 10 classes
  • Initialize the parameters yourself; no need to tune hyperparameters
    • Note 1: the fc weight has shape (D, C); D is independent of the data and is computed from the previous layer's output
    • Note 2: with these filter sizes and paddings the spatial size stays 32x32 after both convs, so the fc input is channel_2 * 32 * 32

learning_rate = 3e-3

channel_1 = 32
channel_2 = 16

conv_w1 = None
conv_b1 = None
conv_w2 = None
conv_b2 = None
fc_w = None
fc_b = None

################################################################################
# TODO: Initialize the parameters of a three-layer ConvNet.                    #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

conv_w1 = random_weight((channel_1, 3, 5, 5))
conv_b1 = zero_weight(channel_1)
conv_w2 = random_weight((channel_2, channel_1, 3, 3))
conv_b2 = zero_weight(channel_2)
fc_w = random_weight((channel_2 * 32 * 32, 10))
fc_b = zero_weight(10)

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

params = [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b]
train_part2(three_layer_convnet, params, learning_rate)

Part 3: PyTorch Module API

  • Everything above tracked the whole training process by hand, which doesn't scale to bigger nets
  • nn.Module lets you define the network and also pick an optimization method
  1. Subclass nn.Module. Give your network class an intuitive name like TwoLayerFC.

  2. In __init__(), define all the layers you need. nn.Linear and nn.Conv2d are provided as built-in modules, and nn.Module will track these internal parameters for you. Refer to the doc to learn more about the dozens of builtin layers. Warning: don't forget to call super().__init__() first! (it calls the parent class constructor)

  3. In the forward() method, define the connectivity of your network. Use the layers set up in __init__ to compute the forward pass; don't create new layers inside forward()
Use the approach above to write a three-layer net

  • Remember to initialize the weights and biases, using the Kaiming method
class ThreeLayerConvNet(nn.Module):
    def __init__(self, in_channel, channel_1, channel_2, num_classes):
        super().__init__()
        ########################################################################
        # TODO: Set up the layers you need for a three-layer ConvNet with the  #
        # architecture defined above.                                          #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        self.conv_1 = nn.Conv2d(in_channel, channel_1, 5, stride=1, padding=2, bias=True)
        nn.init.kaiming_normal_(self.conv_1.weight)
        nn.init.constant_(self.conv_1.bias, 0)
        self.conv_2 = nn.Conv2d(channel_1, channel_2, 3, stride=1, padding=1, bias=True)
        nn.init.kaiming_normal_(self.conv_2.weight)
        nn.init.constant_(self.conv_2.bias, 0)
        self.fc_3 = nn.Linear(channel_2 * 32 * 32, num_classes)
        nn.init.kaiming_normal_(self.fc_3.weight)
        nn.init.constant_(self.fc_3.bias, 0)

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                          END OF YOUR CODE                            #
        ########################################################################

    def forward(self, x):
        scores = None
        ########################################################################
        # TODO: Implement the forward function for a 3-layer ConvNet. you      #
        # should use the layers you defined in __init__ and specify the        #
        # connectivity of those layers in forward()                            #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.conv_1(x)
        x = self.conv_2(F.relu(x))
        x = flatten(F.relu(x))
        x = self.fc_3(x)
        scores = x

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                          END OF YOUR CODE                            #
        ########################################################################
        return scores


def test_ThreeLayerConvNet():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch size 64, image size [3, 32, 32]
    model = ThreeLayerConvNet(in_channel=3, channel_1=12, channel_2=8, num_classes=10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]


test_ThreeLayerConvNet()

Module API: Check Accuracy

  • No need to pass parameters around by hand anymore; the model object carries them, so we can get the accuracy of the whole net directly
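As a sketch, following the same pattern as check_accuracy_part2 above (the explicit device/dtype arguments and the return value are my additions so the example stands alone; the notebook version also prints which split it is checking):

```python
import torch


def check_accuracy_part34(loader, model, device=torch.device('cpu'),
                          dtype=torch.float32):
    """Module-API accuracy check: the model object carries its own
    parameters, so only the loader and the model are needed."""
    num_correct, num_samples = 0, 0
    model.eval()  # set model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)
            y = y.to(device=device, dtype=torch.long)
            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
    acc = float(num_correct) / num_samples
    print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))
    return acc
```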

Module API: Training Loop

  • Use an optimizer object to update the weights
  • Inputs:
    • model
    • optimizer
    • epochs (optional)
  • No return value, but prints the accuracy during training

In short, all you have to do is set up the model and the optimizer
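A hedged sketch of that loop (the notebook's train_part34 reads the loader, device, and dtype from globals; here they are passed explicitly so the example stands alone):

```python
import torch
import torch.nn.functional as F


def train_part34(model, optimizer, loader, epochs=1,
                 device=torch.device('cpu'), dtype=torch.float32, print_every=100):
    """Train a model given as an nn.Module, letting an optimizer object
    handle the weight updates."""
    model = model.to(device=device)
    for e in range(epochs):
        for t, (x, y) in enumerate(loader):
            model.train()  # put model in training mode
            x = x.to(device=device, dtype=dtype)
            y = y.to(device=device, dtype=torch.long)

            scores = model(x)
            loss = F.cross_entropy(scores, y)

            # Zero the gradients, backprop, then let the optimizer
            # update the weights
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            if t % print_every == 0:
                print('Epoch %d, Iteration %d, loss = %.4f' % (e, t, loss.item()))
```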

Part 4: PyTorch Sequential API

  • nn.Sequential is less flexible than the approach above, but it can chain that whole stack of functionality together
  • We first need to define a flatten module that can be used inside the forward pass
# We need to wrap `flatten` function in a module in order to stack it
# in nn.Sequential
class Flatten(nn.Module):
    def forward(self, x):
        return flatten(x)


hidden_layer_size = 4000
learning_rate = 1e-2

model = nn.Sequential(
    Flatten(),
    nn.Linear(3 * 32 * 32, hidden_layer_size),
    nn.ReLU(),
    nn.Linear(hidden_layer_size, 10),
)

# you can use Nesterov momentum in optim.SGD
optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                      momentum=0.9, nesterov=True)

train_part34(model, optimizer)

Implement the three-layer net; remember to initialize the parameters

  • One problem I hit: initializing with random_weight gave very low accuracy
  • The fix was to define another weight initializer that uses a different formula (Xavier)
  • This also shows how to apply a custom function to a module's layers
def xavier_normal(shape):
    """
    Create random Tensors for weights; setting requires_grad=True means that we
    want to compute gradients for these Tensors during the backward pass.
    We use Xavier normalization: sqrt(2 / (fan_in + fan_out))
    """
    if len(shape) == 2:  # FC weight
        fan_in = shape[1]
        fan_out = shape[0]
    else:
        fan_in = np.prod(shape[1:])  # conv weight [out_channel, in_channel, kH, kW]
        fan_out = shape[0] * shape[2] * shape[3]
    # randn is standard normal distribution generator.
    w = torch.randn(shape, device=device, dtype=dtype) * np.sqrt(2. / (fan_in + fan_out))
    w.requires_grad = True
    return w


channel_1 = 32
channel_2 = 16
learning_rate = 1e-2

model = None
optimizer = None

################################################################################
# TODO: Rewrite the 2-layer ConvNet with bias from Part III with the           #
# Sequential API.                                                              #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

model = nn.Sequential(
    nn.Conv2d(3, channel_1, 5, stride=1, padding=2),
    nn.ReLU(),
    nn.Conv2d(channel_1, channel_2, 3, stride=1, padding=1),
    nn.ReLU(),
    Flatten(),
    nn.Linear(32 * 32 * channel_2, 10),
)


def init_weights(m):
    print(m)
    if type(m) == nn.Linear:
        m.weight.data = xavier_normal(m.weight.size())
        m.bias.data = zero_weight(m.bias.size())


model.apply(init_weights)

optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                      momentum=0.9, nesterov=True)

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

train_part34(model, optimizer)

Part 5: Train on CIFAR-10!

Design your own architecture, hyperparameters, loss function, and optimizer to push CIFAR-10 validation accuracy above 70% within 10 epochs!

Some things you could try:

  • Filter size: Above we used 5x5; would smaller filters be more efficient?
  • Number of filters: Above we used 32 filters. Do more or fewer do better?
  • Pooling vs Strided Convolution: Do you use max pooling or just strided convolutions?
  • Batch normalization: Try adding spatial batch normalization after convolution layers and vanilla batch normalization after affine layers. Do your networks train faster?
  • Network architecture: The network above has two layers of trainable parameters. Can you do better with a deep network? Good architectures to try include:
    • [conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    • [conv-relu-conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    • [batchnorm-relu-conv]xN -> [affine]xM -> [softmax or SVM]
  • Global Average Pooling: Instead of flattening and then having multiple affine layers, perform convolutions until your image gets small (7x7 or so) and then perform an average pooling operation to get a 1x1 image (1, 1, Filter#), which is then reshaped into a (Filter#) vector. This is used in Google's Inception Network (See Table 1 for their architecture).
  • Regularization: Add l2 weight regularization, or perhaps use Dropout.

Some tips:

  • If the parameters are working well, you should see improvement within a few hundred iterations
  • When tuning hyperparameters, start with a wide range and short training runs; once you find settings that work better, search around them with longer runs
  • Use the val set for hyperparameter search
model = None
optimizer = None

# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

channel_1 = 16
channel_2 = 32
channel_3 = 64
channel_4 = 64

fc_1 = 1024
num_classes = 10

model = nn.Sequential(
    nn.Conv2d(3, channel_1, 3, stride=1, padding=1),
    nn.BatchNorm2d(channel_1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(channel_1, channel_2, 3, stride=1, padding=1),
    nn.BatchNorm2d(channel_2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(channel_2, channel_3, 3, stride=1, padding=1),
    nn.BatchNorm2d(channel_3),
    nn.ReLU(),
    nn.Conv2d(channel_3, channel_4, 3, stride=1, padding=1),
    nn.BatchNorm2d(channel_4),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    Flatten(),
    nn.Linear(4 * 4 * channel_4, num_classes)
    # nn.Linear(fc_1, num_classes)
)

learning_rate = 1e-3
optimizer = optim.Adam(model.parameters(), lr=learning_rate,
                       betas=(0.9, 0.999), eps=1e-08, weight_decay=0)

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

# You should get at least 70% accuracy
train_part34(model, optimizer, epochs=10)
  • Tried kernel size 1 for the fourth conv layer; it didn't work well
  • BatchNorm seems to help a lot
  • Using more max-pooling layers lowers the compute cost, and the results seem good too
  • Final val_acc landed around 77-79%, with test_acc = 76.22%