CS231n Assignment 2: PyTorch

Posted on 2019-05-07 | In Image Processing , Deep Learning , CS231n Assignments

For this part you pick one of two frameworks: PyTorch or TensorFlow.

PyTorch

What

PyTorch adds a Tensor object (similar to a NumPy ndarray) together with autograd, so backprop no longer has to be written by hand

Why

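Code runs on the GPU: you can train your own network architectures on your GPU without writing CUDA code directly
Lots of ready-made functions
Standing on the shoulders of giants!
This is how deep learning code is written in practice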

Learning resources

Justin Johnson has made an excellent tutorial for PyTorch.
Detailed API doc
If you have other questions that are not addressed by the API docs, the PyTorch forum is a much better place to ask than StackOverflow.

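Overall structure

Part 1: preparation, using the dataset
Part 2: abstraction level 1, working directly on raw Tensors at the lowest level
Part 3: abstraction level 2, using nn.Module to define an arbitrary network architecture
Part 4: abstraction level 3, using nn.Sequential to define a simple linear feed-forward network
Part 5: tune things yourself and push the CIFAR-10 accuracy as high as possible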

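Part 1. Preparation

PyTorch has built-in tools for downloading datasets, preprocessing them, and iterating over them in minibatches.

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as dset
import torchvision.transforms as T
from torch.utils.data import DataLoader, sampler

The torchvision.transforms package handles preprocessing and data augmentation; here it subtracts the mean RGB value and divides by the standard deviation.
A Dataset object is then built for each split (train, val, test). A Dataset loads training examples one at a time, and the DataLoader wrapping it forms the minibatches.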

NUM_TRAIN = 49000

# The torchvision.transforms package provides tools for preprocessing data
# and for performing data augmentation; here we set up a transform to
# preprocess the data by subtracting the mean RGB value and dividing by the
# standard deviation of each RGB value; we've hardcoded the mean and std.
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])

# We set up a Dataset object for each split (train / val / test); Datasets load
# training examples one at a time, so we wrap each Dataset in a DataLoader which
# iterates through the Dataset and forms minibatches. We divide the CIFAR-10
# training set into train and val sets by passing a Sampler object to the
# DataLoader telling how it should sample from the underlying Dataset.
cifar10_train = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                             transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=64,
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

cifar10_val = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                           transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=64,
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, 50000)))

cifar10_test = dset.CIFAR10('./cs231n/datasets', train=False, download=True,
                            transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=64)

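A flag controls whether to use the GPU, and it is set to True. A GPU is not required for this assignment: if CUDA cannot be enabled on the machine, the code falls back to CPU mode automatically.
Two global variables are also defined: dtype is float32, and device says which device to run on.
Macs do not support CUDA, and newer macOS versions apparently cannot install the NVIDIA drivers, so I am running on the CPU for now.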

USE_GPU = True

dtype = torch.float32  # we will be using float throughout this tutorial

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# Constant to control how frequently we print train loss
print_every = 100

print('using device:', device)

Part2 Barebones PyTorch

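The high-level APIs already provide a lot of functionality, but this part works at a fairly low level.
Build a simple two-layer fully-connected ReLU network (one hidden layer), without biases.
The forward pass is computed with Tensor methods, and the built-in autograd computes the backward pass.
When a Tensor has requires_grad=True, operations on it compute not only values but also the graph needed for the backward pass.
If x is a Tensor with x.requires_grad == True, then after backpropagation x.grad will be another Tensor holding the gradient of the scalar loss at the end with respect to x.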
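A minimal sketch of that behaviour on a toy example. The tensors `x` and `w` and the squared-sum loss are made up for illustration and are not part of the assignment:

```python
import torch

# Hypothetical toy tensors; only w is tracked by autograd.
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)

# Forward pass: PyTorch records the graph because w.requires_grad is True.
loss = ((w * x) ** 2).sum()

# Backward pass: fills w.grad with d(loss)/dw = 2 * w * x**2.
loss.backward()

expected = 2 * w.detach() * x ** 2
print(w.grad)  # gradient of the scalar loss with respect to w
print(x.grad)  # None, since x does not require gradients
```

Only tensors created with requires_grad=True get a .grad after backward(), which is why the weights, not the input images, are the ones marked this way.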

PyTorch Tensors: Flatten Function

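A Tensor is very similar to a NumPy ndarray and comes with many convenient operations, for example a flatten that reshapes image data.
A batch of images is stored as a Tensor of shape N x C x H x W:
- N: number of datapoints
- C: number of channels
- H, W: height and width of the feature map
For an affine (fully-connected) layer, though, each datapoint should be a single vector rather than being split into channels, height, and width.
So flatten reads N from the N x C x H x W data and returns a view of it (like reshape on an array) with shape N x ??, where ?? is inferred from the remaining dimensions (here C * H * W).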



def flatten(x):
    N = x.shape[0]  # read in N, C, H, W
    # "flatten" the C * H * W values into a single vector per image
    return x.view(N, -1)

Barebones PyTorch: Two-Layer Network

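two_layer_fc defines the forward pass of a two-layer network with a ReLU in the middle. After writing the forward pass, check that the output has the right shape and that nothing goes wrong (lately I have stopped having doubts about these sizes).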

import torch.nn.functional as F  # useful stateless functions


def two_layer_fc(x, params):
    """
    A fully-connected neural network; the architecture is:
    fully connected layer -> ReLU -> fully connected layer.
    Note that this function only defines the forward pass; 
    PyTorch will take care of the backward pass for us.

    The input to the network will be a minibatch of data, of shape
    (N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
    and the output layer will produce scores for C classes.

    Inputs:
    - x: A PyTorch Tensor of shape (N, d1, ..., dM) giving a minibatch of
      input data.
    - params: A list [w1, w2] of PyTorch Tensors giving weights for the network;
      w1 has shape (D, H) and w2 has shape (H, C).

    Returns:
    - scores: A PyTorch Tensor of shape (N, C) giving classification scores for
      the input data x.
    """
    # first we flatten the image
    x = flatten(x)  # shape: [batch_size, C x H x W]

    w1, w2 = params

    # Forward pass: compute predicted y using operations on Tensors. Since w1 and
    # w2 have requires_grad=True, operations involving these Tensors will cause
    # PyTorch to build a computational graph, allowing automatic computation of
    # gradients. Since we are no longer implementing the backward pass by hand we
    # don't need to keep references to intermediate values.
    # you can also use `.clamp(min=0)`, equivalent to F.relu()
    x = F.relu(x.mm(w1))
    x = x.mm(w2)
    return x


def two_layer_fc_test():
    hidden_layer_size = 42
    # minibatch size 64, feature dimension 50
    x = torch.zeros((64, 50), dtype=dtype)
    w1 = torch.zeros((50, hidden_layer_size), dtype=dtype)
    w2 = torch.zeros((hidden_layer_size, 10), dtype=dtype)
    scores = two_layer_fc(x, [w1, w2])
    print(scores.size())  # you should see [64, 10]


two_layer_fc_test()

Barebones PyTorch: Three-Layer ConvNet

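For both this function and the previous one, you can pass in zeros at test time to check that the tensor shapes come out right.
Network architecture:
- conv with bias, channel_1 filters, each KW1 x KH1, zero-padding of 2
- ReLU
- conv with bias, channel_2 filters, each KW2 x KH2, zero-padding of 1
- ReLU
- fc with bias, producing scores for C classes
Note 1: there is no softmax activation after the fc layer, because the loss computed later applies softmax itself, which is more efficient.
Note 2: no flatten is needed before conv2d; flatten only before the fc layer.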
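The shape of the fc weight follows from the standard convolution output-size formula, out = (in - K + 2P) / S + 1. A small sketch, assuming 32x32 CIFAR-10 inputs, a 5x5 first conv with padding 2, a 3x3 second conv with padding 1, and 16 output channels; the helper name `conv_out_size` is made up here:

```python
def conv_out_size(in_size, kernel, padding, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (in_size - kernel + 2 * padding) // stride + 1


h = conv_out_size(32, kernel=5, padding=2)  # after the first conv
h = conv_out_size(h, kernel=3, padding=1)   # after the second conv
channel_2 = 16
fc_in_features = channel_2 * h * h          # number of rows of fc_w after flatten
print(h, fc_in_features)
```

With these settings both convolutions preserve the spatial size, so the fc layer sees channel_2 * 32 * 32 features per image.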



def three_layer_convnet(x, params):
    """
    Performs the forward pass of a three-layer convolutional network with the
    architecture defined above.

    Inputs:
    - x: A PyTorch Tensor of shape (N, 3, H, W) giving a minibatch of images
    - params: A list of PyTorch Tensors giving the weights and biases for the
      network; should contain the following:
      - conv_w1: PyTorch Tensor of shape (channel_1, 3, KH1, KW1) giving weights
        for the first convolutional layer
      - conv_b1: PyTorch Tensor of shape (channel_1,) giving biases for the first
        convolutional layer
      - conv_w2: PyTorch Tensor of shape (channel_2, channel_1, KH2, KW2) giving
        weights for the second convolutional layer
      - conv_b2: PyTorch Tensor of shape (channel_2,) giving biases for the second
        convolutional layer
      - fc_w: PyTorch Tensor giving weights for the fully-connected layer. Can you
        figure out what the shape should be? (channel_2 * H * W, C)
      - fc_b: PyTorch Tensor giving biases for the fully-connected layer. Can you
        figure out what the shape should be? (C,)

    Returns:
    - scores: PyTorch Tensor of shape (N, C) giving classification scores for x
    """
    conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
    scores = None
    ################################################################################
    # TODO: Implement the forward pass for the three-layer ConvNet.                #
    ################################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # conv -> ReLU -> conv -> ReLU -> flatten -> fc, as in the architecture above
    x = F.relu(F.conv2d(x, conv_w1, bias=conv_b1, padding=2))
    x = F.relu(F.conv2d(x, conv_w2, bias=conv_b2, padding=1))
    x = flatten(x)
    scores = x.mm(fc_w) + fc_b

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ################################################################################
    #                                 END OF YOUR CODE                             #
    ################################################################################
    return scores

Barebones PyTorch: Initialization

random_weight(shape) initializes a weight tensor with the Kaiming normal initialization method.
zero_weight(shape) initializes a weight tensor with all zeros. Useful for instantiating bias parameters.



def random_weight(shape):
    """
    Create random Tensors for weights; setting requires_grad=True means that we
    want to compute gradients for these Tensors during the backward pass.
    We use Kaiming normalization: sqrt(2 / fan_in)
    """
    if len(shape) == 2:  # FC weight
        fan_in = shape[0]
    else:
        # conv weight [out_channel, in_channel, kH, kW]
        fan_in = np.prod(shape[1:])
    # randn is standard normal distribution generator.
    w = torch.randn(shape, device=device, dtype=dtype) * np.sqrt(2. / fan_in)
    w.requires_grad = True
    return w


def zero_weight(shape):
    return torch.zeros(shape, device=device, dtype=dtype, requires_grad=True)


# create a weight of shape [3 x 5]
# you should see the type `torch.cuda.FloatTensor` if you use GPU.
# Otherwise it should be `torch.FloatTensor`
random_weight((3, 5))

Barebones PyTorch: Check Accuracy

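No gradients are needed in this part, so the computation is wrapped in torch.no_grad() to avoid building graphs for nothing.
Inputs
- a DataLoader that serves the data split we want to check in batches
- a model_fn that describes what the model looks like and computes the predicted scores
- the parameters the model needs
Nothing is returned, but the accuracy is printed.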



def check_accuracy_part2(loader, model_fn, params):
    """
    Check the accuracy of a classification model.

    Inputs:
    - loader: A DataLoader for the data split we want to check
    - model_fn: A function that performs the forward pass of the model,
      with the signature scores = model_fn(x, params)
    - params: List of PyTorch Tensors giving parameters of the model

    Returns: Nothing, but prints the accuracy of the model
    """
    split = 'val' if loader.dataset.train else 'test'
    print('Checking accuracy on the %s set' % split)
    num_correct, num_samples = 0, 0
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.int64)
            scores = model_fn(x, params)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f%%)' %
              (num_correct, num_samples, 100 * acc))

BareBones PyTorch: Training Loop

Train with stochastic gradient descent without momentum, and compute the loss with torch.nn.functional.cross_entropy.
Inputs
- model_fn
- params
- learning_rate
No return value.
Steps performed
- move the data to the GPU or CPU
- compute the scores and the loss
- loss.backward()
- update params; this step must not track gradients
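The steps above can be sketched as follows. This is not the assignment's own train_part2: the function name, the explicit `loader` and `device` arguments, and the returned loss are assumptions made so that the sketch is self-contained.

```python
import torch
import torch.nn.functional as F


def train_part2_sketch(model_fn, params, learning_rate, loader, device="cpu"):
    """Plain SGD over one pass of `loader`; returns the last loss value."""
    last_loss = None
    for x, y in loader:
        # Move the minibatch to the chosen device.
        x = x.to(device=device, dtype=torch.float32)
        y = y.to(device=device, dtype=torch.long)

        # Forward pass: scores and loss.
        scores = model_fn(x, params)
        loss = F.cross_entropy(scores, y)

        # Backward pass: populates w.grad for every w with requires_grad=True.
        loss.backward()

        # Parameter update; no graph should be built for this step.
        with torch.no_grad():
            for w in params:
                w -= learning_rate * w.grad
                w.grad.zero_()  # reset, because gradients accumulate
        last_loss = loss.item()
    return last_loss
```

Zeroing each gradient after the update matters because backward() adds to .grad instead of overwriting it.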

BareBones PyTorch: Training a ConvNet

Required network
1. Convolutional layer (with bias) with 32 5x5 filters, with zero-padding of 2
2. ReLU
3. Convolutional layer (with bias) with 16 3x3 filters, with zero-padding of 1
4. ReLU
5. Fully-connected layer (with bias) to compute scores for 10 classes
You have to initialize the parameters yourself; no hyperparameter tuning is needed.
- Note 1: fc_w has shape (D, C); D does not come from the raw data but has to be derived from the output of the previous layer.
- Note 2: both convolutions preserve the spatial size (5x5 with padding 2, then 3x3 with padding 1), so the feature map stays 32x32 and D = channel_2 * 32 * 32. A 5x5 second convolution with padding 1 would instead shrink it from 32 to 30.

learning_rate = 3e-3

channel_1 = 32
channel_2 = 16

conv_w1 = None
conv_b1 = None
conv_w2 = None
conv_b2 = None
fc_w = None
fc_b = None

################################################################################
# TODO: Initialize the parameters of a three-layer ConvNet.                    #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

conv_w1 = random_weight((channel_1, 3, 5, 5))
conv_b1 = zero_weight((channel_1,))
conv_w2 = random_weight((channel_2, channel_1, 3, 3))
conv_b2 = zero_weight((channel_2,))
fc_w = random_weight((channel_2 * 32 * 32, 10))
fc_b = zero_weight((10,))

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             #
################################################################################

params = [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b]
train_part2(three_layer_convnet, params, learning_rate)

Part3 PyTorch Module API

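Everything above tracks the whole process by hand, which stops being practical for larger networks.
nn.Module is used to define the network, and the torch.optim package lets you pick the optimization method.

Subclass nn.Module. Give your network class an intuitive name like TwoLayerFC.
In __init__(), define all the layers you need. nn.Linear and nn.Conv2d are provided by the module out of the box. nn.Module will track these internal parameters for you. Refer to the doc to learn more about the dozens of builtin layers. Warning: don’t forget to call super().__init__() first! (this calls the parent class constructor)
In the forward() method, define the connectivity of your network. Use the layers initialized in __init__ directly for the forward pass; do not create new layers inside forward().

Write a three-layer network using the approach above.

Remember to initialize the w and b parameters, using the Kaiming method.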

class ThreeLayerConvNet(nn.Module):
    def __init__(self, in_channel, channel_1, channel_2, num_classes):
        super().__init__()
        ########################################################################
        # TODO: Set up the layers you need for a three-layer ConvNet with the  #
        # architecture defined above.                                          #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        self.conv_1 = nn.Conv2d(in_channel,channel_1,5,stride=1, padding=2,bias=True)
        nn.init.kaiming_normal_(self.conv_1.weight)
        nn.init.constant_(self.conv_1.bias, 0)
        self.conv_2 = nn.Conv2d(channel_1,channel_2,3,stride=1, padding=1,bias=True)
        nn.init.kaiming_normal_(self.conv_2.weight)
        nn.init.constant_(self.conv_2.bias, 0)
        self.fc_3 = nn.Linear(channel_2 * 32 * 32 , num_classes)
        nn.init.kaiming_normal_(self.fc_3.weight)
        nn.init.constant_(self.fc_3.bias, 0)

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                          END OF YOUR CODE                            #       
        ########################################################################

    def forward(self, x):
        scores = None
        ########################################################################
        # TODO: Implement the forward function for a 3-layer ConvNet. you      #
        # should use the layers you defined in __init__ and specify the        #
        # connectivity of those layers in forward()                            #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.conv_1(x)
        x = self.conv_2(F.relu(x))
        x = flatten(F.relu(x))
        x = self.fc_3(x)
        scores = x

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                             END OF YOUR CODE                         #
        ########################################################################
        return scores


def test_ThreeLayerConvNet():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch size 64, image size [3, 32, 32]
    model = ThreeLayerConvNet(in_channel=3, channel_1=12, channel_2=8, num_classes=10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]
test_ThreeLayerConvNet()

Module API: Check Accuracy

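There is no need to pass the parameters by hand anymore; the accuracy of the whole net is obtained directly from the model.

Module API: Training Loop

An optimizer object is used to update the weights.
Inputs
- model
- optimizer
- epochs, optional
Nothing is returned, but the accuracy during training is printed.

In practice it comes down to setting up the model and the optimizer.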
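A rough sketch of what such a loop can look like. The assignment's train_part34 is not reproduced in this post, so the names `train_part34_sketch` and `check_accuracy_sketch`, the explicit `loader` argument, and the returned accuracy are all assumptions for illustration:

```python
import torch
import torch.nn.functional as F


def check_accuracy_sketch(loader, model, device="cpu"):
    """Fraction of correctly classified examples in `loader`."""
    num_correct, num_samples = 0, 0
    model.eval()  # evaluation mode (matters for dropout / batchnorm)
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            preds = model(x).argmax(dim=1)
            num_correct += (preds == y).sum().item()
            num_samples += y.size(0)
    return num_correct / num_samples


def train_part34_sketch(model, optimizer, loader, epochs=1, device="cpu"):
    """Generic training loop driven by an optimizer object."""
    model = model.to(device)
    for _ in range(epochs):
        for x, y in loader:
            model.train()  # training mode
            x, y = x.to(device), y.to(device)
            loss = F.cross_entropy(model(x), y)
            optimizer.zero_grad()  # clear gradients from the previous step
            loss.backward()        # compute gradients of the loss
            optimizer.step()       # let the optimizer update the weights
    return check_accuracy_sketch(loader, model, device)
```

Compared with the barebones loop, the manual weight update and gradient reset are replaced by optimizer.step() and optimizer.zero_grad().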

Part4 PyTorch Sequential API

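nn.Sequential is less flexible than nn.Module, but it chains a sequence of layers into a single object.
A flatten that can be used inside the forward pass has to be defined beforehand as a module.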

# We need to wrap `flatten` function in a module in order to stack it
# in nn.Sequential
class Flatten(nn.Module):
    def forward(self, x):
        return flatten(x)

hidden_layer_size = 4000
learning_rate = 1e-2

model = nn.Sequential(
    Flatten(),
    nn.Linear(3 * 32 * 32, hidden_layer_size),
    nn.ReLU(),
    nn.Linear(hidden_layer_size, 10),
)

# you can use Nesterov momentum in optim.SGD
optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                     momentum=0.9, nesterov=True)

train_part34(model, optimizer)

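Implement the three-layer network; remember to initialize the parameters.

One problem I ran into: when the weights were initialized with random_weight, the accuracy was extremely low.
From one reference I found that you can define another weight initializer that computes the scale differently.
From another I learned how to apply a new initialization function to a module.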

def xavier_normal(shape):
    """
    Create random Tensors for weights; setting requires_grad=True means that we
    want to compute gradients for these Tensors during the backward pass.
    We use Xavier normalization: sqrt(2 / (fan_in + fan_out))
    """
    if len(shape) == 2:  # FC weight
        fan_in = shape[1]
        fan_out = shape[0]
    else:
        fan_in = np.prod(shape[1:]) # conv weight [out_channel, in_channel, kH, kW]
        fan_out = shape[0] * shape[2] * shape[3]
    # randn is standard normal distribution generator. 
    w = torch.randn(shape, device=device, dtype=dtype) * np.sqrt(2. / (fan_in + fan_out))
    w.requires_grad = True
    return w

channel_1 = 32
channel_2 = 16
learning_rate = 1e-2

model = None
optimizer = None

################################################################################
# TODO: Rewrite the 2-layer ConvNet with bias from Part III with the           #
# Sequential API.                                                              #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****


model = nn.Sequential(
    nn.Conv2d(3, channel_1,5,stride = 1,padding = 2),
    nn.ReLU(),
    nn.Conv2d(channel_1, channel_2,3,stride = 1,padding = 1),
    nn.ReLU(),
    Flatten(),
    nn.Linear(32*32*channel_2, 10),
)

def init_weights(m):
    print(m)
    if type(m) == nn.Linear:
        m.weight.data = xavier_normal(m.weight.size())
        m.bias.data = zero_weight(m.bias.size())

model.apply(init_weights)

optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                     momentum=0.9, nesterov=True)

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             
################################################################################

train_part34(model, optimizer)

Part5 Let's train on CIFAR-10!

Pick your own network architecture, hyperparameters, loss, and optimizer to get the CIFAR-10 val_acc above 70% within 10 epochs!

Layers in torch.nn package: http://pytorch.org/docs/stable/nn.html
Activations: http://pytorch.org/docs/stable/nn.html#non-linear-activations
Loss functions: http://pytorch.org/docs/stable/nn.html#loss-functions
Optimizers: http://pytorch.org/docs/stable/optim.html

Some things to try:

Filter size: Above we used 5x5; would smaller filters be more efficient?
Number of filters: Above we used 32 filters. Do more or fewer do better?
Pooling vs Strided Convolution: Do you use max pooling or just stride convolutions?
Batch normalization: Try adding spatial batch normalization after convolution layers and vanilla batch normalization after affine layers. Do your networks train faster?
Network architecture: The network above has two layers of trainable parameters. Can you do better with a deep network? Good architectures to try include:
- [conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
- [conv-relu-conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
- [batchnorm-relu-conv]xN -> [affine]xM -> [softmax or SVM]
Global Average Pooling: Instead of flattening and then having multiple affine layers, perform convolutions until your image gets small (7x7 or so) and then perform an average pooling operation to get to a 1x1 image picture (1, 1 , Filter#), which is then reshaped into a (Filter#) vector. This is used in Google’s Inception Network (See Table 1 for their architecture).
Regularization: Add l2 weight regularization, or perhaps use Dropout.

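Some tips:

If the parameters are working well, you should see improvement within a few hundred iterations.
When tuning hyperparameters, start with a wide range and short training runs; once you find something promising, search more finely around it (training for longer).
Use the val set for the hyperparameter search.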
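One way to sketch the coarse-to-fine search from the tips is to sample learning rates on a log scale. The ranges below and the idea that the coarse pass favoured values near 1e-3 are made-up assumptions, not results from this post:

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose base-10 logarithm is uniform between low and high."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


rng = random.Random(0)

# Coarse pass: wide range, each candidate trained only briefly.
coarse = [sample_log_uniform(1e-5, 1e-1, rng) for _ in range(8)]

# Fine pass: narrower range around the hypothetical best coarse value (~1e-3),
# each candidate trained for longer and compared on the val set.
fine = [sample_log_uniform(3e-4, 3e-3, rng) for _ in range(8)]
print(coarse, fine)
```

Sampling in log space spreads the candidates evenly across orders of magnitude, which is usually what matters for a learning rate.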

model = None
optimizer = None

# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

channel_1 = 16
channel_2 = 32
channel_3 = 64
channel_4 = 64

fc_1 = 1024
num_classes = 10


model = nn.Sequential(
    nn.Conv2d(3, channel_1,3,stride = 1,padding = 1),
    nn.BatchNorm2d(channel_1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size = 2),
    nn.Conv2d(channel_1, channel_2,3,stride = 1,padding = 1),
    nn.BatchNorm2d(channel_2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size = 2),
    nn.Conv2d(channel_2, channel_3,3,stride = 1,padding = 1),
    nn.BatchNorm2d(channel_3),
    nn.ReLU(),
    nn.Conv2d(channel_3, channel_4,3,stride = 1,padding = 1),
    nn.BatchNorm2d(channel_4),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size = 2),
    Flatten(),
    nn.Linear(4*4*channel_4, num_classes)
#     nn.Linear(fc_1,num_classes)
    )


learning_rate = 1e-3
optimizer = optim.Adam(model.parameters(), lr=learning_rate,
                     betas=(0.9, 0.999), eps=1e-08, weight_decay=0)

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             
################################################################################

# You should get at least 70% accuracy
train_part34(model, optimizer, epochs=10)

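I tried a kernel size of 1 for the fourth conv layer; it did not work very well.
BN seems to help a lot.
More max pooling reduces the compute cost and also seemed to give better results.
The final val_acc is around 77-79%, with test_acc = 76.22%.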

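Generating dynamic variable names in Python

Posted on 2019-05-07 | In Programming Languages , Python

Generating variable names dynamically

If you want a series of variable names such as a0, a1, ..., a20, writing them out by hand is too tedious.

locals

locals() returns all local variables at the current location as a dictionary. Writing to that dictionary only creates variables reliably at module level, where locals() is the same dictionary as globals(); inside a function such writes are not guaranteed to take effect.

arrange_list = locals()

for i in range(10):
    arrange_list['list_' + str(i)] = []

To read the dynamic variables back, use the dictionary's get method:

arrange_list = locals()

for i in range(10):
    print(arrange_list.get('list_' + str(i)), end=" ")

Assigning with exec

for i in range(5):
    exec('var{} = {}'.format(i, i))

Reading the dynamic variables

for i in range(5):
    exec('print(var{}, end=" ")'.format(i))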
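As an alternative sketch that is not from the original post: a plain dictionary gives the same name-to-value lookup without touching the real namespace, and exec accepts an explicit namespace dictionary when generated code is really needed. The names `arrays` and `namespace` are made up here:

```python
# Option 1: keep the "dynamic names" as dictionary keys.
arrays = {'list_' + str(i): [] for i in range(10)}
arrays['list_3'].append(42)

# Option 2: give exec its own namespace dict instead of the caller's scope.
namespace = {}
for i in range(5):
    exec('var{} = {}'.format(i, i), namespace)

values = [namespace['var' + str(i)] for i in range(5)]
print(arrays['list_3'], values)
```

Both options also work inside functions, where writing to locals() does not.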

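Transposing multi-dimensional arrays and adding new dimensions

Posted on 2019-04-25 | Edited on 2019-04-26 | In Programming Languages , Python , Numpy

For a 2-D transpose, b[i][j] = a[j][i].
To transpose a multi-dimensional array, you permute its axes.
For example, if the original array has shape (X, Y, Z) and the transposed one should have shape (Z, X, Y),
use np.transpose(A, (2, 0, 1)).

np.newaxis -> adds a new dimension
An array of shape (6,) becomes a 2-D array of shape (1, 6) when the new axis is added as the row dimension, and of shape (6, 1) when it is added as the column dimension.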
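A short check of both operations, using small arrays chosen only for illustration:

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)  # shape (X, Y, Z) = (2, 3, 4)
B = np.transpose(A, (2, 0, 1))      # shape (Z, X, Y) = (4, 2, 3)

# The element at A[x, y, z] ends up at B[z, x, y].
same = A[1, 2, 3] == B[3, 1, 2]

v = np.arange(6)                    # shape (6,)
row = v[np.newaxis, :]              # shape (1, 6)
col = v[:, np.newaxis]              # shape (6, 1)
print(B.shape, row.shape, col.shape, same)
```

The tuple passed to np.transpose lists, for each output axis, which input axis it takes its data from.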

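Learning OpenCV, Chapter 18: Camera models & calibration

Posted on 2019-04-22 | Edited on 2019-04-25 | In Image Processing , OpenCV , Calibration

camera models & calibration

An object absorbs part of the light that hits it and reflects the rest; the reflected light is what we see as its color. That light is received by our eye (or a camera) and projected onto the retina (or the camera's image). The geometry of this process is very important in CV.

One very simple model is the pinhole camera model: light passes through a small aperture in a wall. This chapter starts from that model, but a real pinhole is not a good camera because it cannot gather enough light for a fast exposure. Eyes and cameras do better by using a lens, but the lens in turn distorts the image.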
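The notes give no formulas, so the following is only a sketch of the usual idealized pinhole projection, u = f * X / Z + cx and v = f * Y / Z + cy, with no lens distortion. The function name and the intrinsics (f = 800, cx = 320, cy = 240) are made-up values for illustration:

```python
import numpy as np


def project_pinhole(points_3d, f, cx, cy):
    """Project Nx3 camera-frame points (X, Y, Z with Z > 0) to Nx2 pixels."""
    points_3d = np.asarray(points_3d, dtype=float)
    X, Y, Z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    u = f * X / Z + cx
    v = f * Y / Z + cy
    return np.stack([u, v], axis=1)


pts = np.array([[0.0, 0.0, 2.0],
                [0.5, -0.25, 2.0],
                [0.5, -0.25, 4.0]])
pixels = project_pinhole(pts, f=800.0, cx=320.0, cy=240.0)
print(pixels)
```

The last two points differ only in depth: doubling Z halves the offset from the image center, which is the perspective effect the model captures.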

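Goals of this chapter:

How to do camera calibration
Correct the deviations that the lens introduces relative to the plain pinhole model
Calibration is also the main way to relate measurements to the 3-D world: a scene is not just three-dimensional, it also has physical extent and volume, so the relation between pixels and 3-D world coordinates matters
Chapter 18 corrects lens distortion; chapter 19 reconstructs the whole 3-D structure

homography transform -> a very important ingredient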

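CS231n Assignment 2: CNN

Posted on 2019-04-18 | Edited on 2019-04-25 | In Image Processing , Deep Learning , CS231n Assignments

target

The earlier parts practiced fully-connected networks, but in practice what everyone actually uses is CNNs.
So this part starts working with CNNs.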

CS231n Assignment 2: Dropout

Posted on 2019-04-18 | In Image Processing , Deep Learning , CS231n Assignments

Target

Regularize the neural network
by randomly setting some features to 0 during the forward pass

Geoffrey E. Hinton et al, “Improving neural networks by preventing co-adaptation of feature detectors”, arXiv 2012
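A minimal NumPy sketch of the forward pass, assuming the common "inverted dropout" variant in which the kept features are rescaled at training time so that nothing changes at test time. The function name and the keep probability of 0.8 are illustrative choices, not taken from the post:

```python
import numpy as np


def dropout_forward(x, p_keep, train, rng):
    """Inverted dropout: keep each feature with probability p_keep."""
    if not train:
        return x, None  # test time: identity
    # Zero out dropped features and rescale the kept ones by 1 / p_keep.
    mask = (rng.random(x.shape) < p_keep) / p_keep
    return x * mask, mask


rng = np.random.default_rng(0)
x = np.ones((1000, 50))
out_train, mask = dropout_forward(x, p_keep=0.8, train=True, rng=rng)
out_test, _ = dropout_forward(x, p_keep=0.8, train=False, rng=rng)
print(out_train.mean(), out_test.mean())
```

Because of the rescaling, the expected value of each feature is the same in both modes.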

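OpenGL notes

Posted on 2019-04-16 | Edited on 2019-05-31 | In OpenGl

Learn OpenGL

on Modern OpenGL. -> starts from the basics of graphics programming

Getting Started

OpenGL

A tool for manipulating graphics and images
It can be thought of as an API, but it is really a specification
- it states exactly what each function's inputs and outputs should be and how it should perform
- users program against this specification; since it does not prescribe how things are implemented, any implementation is fine as long as the results follow the rules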

Core-profile vs Immediate mode

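Older versions used immediate mode
- easy to use for drawing
- the actual implementation is hidden inside the library, so developers cannot easily see how things are computed
- became increasingly inefficient
Core-profile
- introduced from version 3.2 onwards
- forces modern practices; using a deprecated function raises an error right away
- more efficient, more flexible, harder to learn

extensions

Extensions are supported; checking whether the graphics card supports an extension tells you whether you can use it
This lets you use newer features directly, without waiting for OpenGL itself to add them
Before using an extension you need to check that it is available; if not, fall back to the older way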

State Machine

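OpenGL is itself a state machine: a collection of variables that determine how it should currently operate
state -> context
- changing state: set some options, manipulate some buffers, then render using the current context
Example:
- if I want to draw triangles instead of lines, I change the draw state
- once that change has been made, the next draw command draws triangles instead of lines
State-changing functions modify the context; state-using functions perform operations based on the current state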

Objects

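An object is a collection of options that represents a subset of OpenGL's state
- for example, an object could hold the settings of the drawing window: its size, how many colors it supports, and so on

// The State of OpenGL
struct OpenGL_Context {
    ...
    object_name* object_Window_Target;
    ...
};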

// create object
unsigned int objectId = 0;
glGenObject(1, &objectId);
// bind object to context
glBindObject(GL_WINDOW_TARGET, objectId);
// set options of object currently bound to GL_WINDOW_TARGET
glSetObjectOption(GL_WINDOW_TARGET, GL_OPTION_WINDOW_WIDTH, 800);
glSetObjectOption(GL_WINDOW_TARGET, GL_OPTION_WINDOW_HEIGHT, 600);
// set context target back to default
glBindObject(GL_WINDOW_TARGET, 0);

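Workflow
- first create an object and store a reference to it as an id
- then bind the object to the target location of the context
- set the window options
- finally un-bind the object by setting the window target back to its default
This way we can create many objects, set their options up front, and simply bind one whenever it is needed
- say we have a bunch of objects holding a little person, a little horse, a little deer
- bind whichever one we want to draw, and it gets drawn directly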

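Creating a window

Window creation is operating-system specific, so it differs from one OS to another. There are libraries that already provide this functionality; here I use GLFW

GLFW

A library written in C whose main purpose is to provide what is needed to render things to the screen
It can create a context, define the window parameters, and handle user input

I set all of this up in one go by following this guide!
https://www.jianshu.com/p/25d5fbf792a2
Remember to add the OpenGL framework under the linked libraries!

GLAD

OpenGL also depends on driver support that differs between versions, so something has to take care of that part
Unlike most other libraries, GLAD is configured through a web service
- on that web page choose the language and the version number, make sure the profile is core, then generate
- download the resulting zip, put the include folders into your include directory and add the .c file to the project
Strangely, I did not need this step; maybe I had already pulled it in through my includes!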

Hello Window

Initialization

int main()
{
    glfwInit();
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    //glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE);
  
    return 0;
}

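First GLFW is initialized
Then GLFW is configured with glfwWindowHint; the first argument is the option to set, chosen from a large enum of possible options prefixed with GLFW_. Here the major and minor hints request OpenGL version 3.3 (this is the OpenGL context version, not the GLFW version)
The profile hint then tells GLFW that we want the core profile

Next, glfwCreateWindow creates the GLFWwindow* window variable
- the function takes the window's width and height
- and the window title

Once the window is created, glfwMakeContextCurrent(window) makes its context the current context on the calling thread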

GLFWwindow* window = glfwCreateWindow(800, 600, "LearnOpenGL", NULL, NULL);
if (window == NULL)
{
    std::cout << "Failed to create GLFW window" << std::endl;
    glfwTerminate();
    return -1;
}
glfwMakeContextCurrent(window);

GLAD

GLAD manages the function pointers for OpenGL, so GLAD has to be initialized before any OpenGL function is called

if (!gladLoadGLLoader((GLADloadproc)glfwGetProcAddress))
{
    std::cout << "Failed to initialize GLAD" << std::endl;
    return -1;
}

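Viewport

Before rendering we also need to tell OpenGL the size of the rendering window, using glViewport
- the first two parameters set the coordinates of the lower-left corner of the window
- the last two parameters set the width and height of the area to render into
Every time the window is resized, the viewport has to be adjusted as well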
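What the viewport does to coordinates can be sketched as plain arithmetic. The snippet is in Python purely for illustration (the surrounding code is C++), the function name is made up, and it assumes a viewport set as if by glViewport(0, 0, 800, 600):

```python
def ndc_to_window(x_ndc, y_ndc, x0, y0, width, height):
    """Map normalized device coordinates in [-1, 1] to window coordinates."""
    x_win = (x_ndc + 1.0) * width / 2.0 + x0
    y_win = (y_ndc + 1.0) * height / 2.0 + y0
    return x_win, y_win


# Viewport with lower-left corner (0, 0) and size 800 x 600.
center = ndc_to_window(0.0, 0.0, 0, 0, 800, 600)
point = ndc_to_window(-0.5, 0.5, 0, 0, 800, 600)
print(center, point)
```

This is why the viewport has to follow the window size: the same normalized coordinates land on different pixels once width and height change.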

engines

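We want the engine to keep drawing until we tell the window to close, so we set up a loop

while(!glfwWindowShouldClose(window))
{
    glfwSwapBuffers(window);
    glfwPollEvents();
}

Inside this loop, glfwPollEvents checks whether any events have been triggered (such as keyboard input), updates the window state, and calls the corresponding callback functions
glfwSwapBuffers swaps the color buffer (the buffer holding the color of every pixel) and shows it in the window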

last thing

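After the loop exits, the allocated resources need to be cleaned up; call glfwTerminate at the very end of main

input

For keyboard control, write a processInput function
For example, the function below checks whether esc has been pressed and, if so, closes the window

Once written, call this function inside the while loop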

void processInput(GLFWwindow *window)
{
    if(glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS)
        glfwSetWindowShouldClose(window, true);
}

rendering

All rendering commands go inside the loop; the whole loop should look like this

// render loop
while(!glfwWindowShouldClose(window))
{
    // input
    processInput(window);

    // rendering commands here
    ...

    // check and call events and swap the buffers
    glfwPollEvents();
    glfwSwapBuffers(window);
}

hello triangle

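Everything in OpenGL lives in 3D space, but what is shown on the screen is 2D. The whole conversion is called the graphics pipeline and can be split into two steps: the first turns the 3D coordinates of objects into 2D coordinates, and the second turns the 2D coordinates into actual pixel values

pipeline

All of the conversion steps can run in parallel; modern graphics cards have many small cores that run small programs for each step -> shaders
At the very start a list of 3D coordinates is passed in (Vertex Data)
Step one: vertex shader
- transforms 3D coordinates into different 3D coordinates (roughly, turning the data into points?)
primitive assembly
- takes all the vertices from the previous step as input
- and assembles them into a primitive shape
geometry shader
- can emit new vertices to form new or different shapes; in the example it generates a new line
rasterization stage
- maps the resulting primitives to the corresponding pixels on the final screen
Clipping
- discards all fragments outside the view, which improves performance
fragment shader
- computes the final color of a pixel; lighting, shadows, the color of the light, and so on are computed here
Once all the colors have been determined, the result goes through alpha test and blending
- this stage checks the depth value to decide whether a fragment is in front of or behind other objects
- it also takes transparency (alpha) into account
Although all of this looks complicated, in practice you usually only need to deal with the vertex and fragment shaders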

vertex input

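OpenGL is a 3D library, so every input point needs three coordinates: x, y, z
Coordinates are only processed when they lie between -1 and 1; values in this range are called normalized device coordinates and are expressed relative to the screen rather than in pixels
In this example we want to render a triangle, so we need the coordinates of its three vertices. Note that depth is ignored entirely here (z = 0) and the triangle is drawn flat

float vertices[] = {
    -0.5f, -0.5f, 0.0f,
     0.5f, -0.5f, 0.0f,
     0.0f,  0.5f, 0.0f
};

With the coordinates defined, they have to be sent to the vertex shader. This is done by allocating memory on the GPU to store the data; a large batch of vertices can be stored there at once, so it does not have to be sent one vertex at a time
Each buffer object has its own buffer id, which can be generated as shown below. An array of data can then be bound to this id

unsigned int VBO;
glGenBuffers(1, &VBO);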

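CS231n Assignment 2: Batch Normalization

Posted on 2019-04-15 | Edited on 2019-04-18 | In Image Processing , Deep Learning , CS231n Assignments

target

The earlier material covered optimization methods such as Adam; another approach is to change the network architecture to make it easier to train -> batch normalization
Learning tends to work better on uncorrelated features with zero mean and unit variance. The training data can be preprocessed to be zero-centered before training, which takes care of the first layer, but the later layers still run into the problem
So normalization is built into the deep network by inserting a BN layer, which estimates the mean and standard deviation of each feature over the minibatch and uses them to re-center and normalize the features
It also has learnable shift and scale parameters for each feature dimension
Core idea: bluntly use BN to make the network less sensitive to weight initialization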
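A minimal NumPy sketch of the training-time forward pass described above, assuming inputs of shape (N, D). The running averages used at test time are left out, and the function name and eps value are illustrative choices:

```python
import numpy as np


def batchnorm_forward_train(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the minibatch, then scale and shift."""
    mean = x.mean(axis=0)                    # per-feature mean
    var = x.var(axis=0)                      # per-feature variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # re-centered, unit variance
    return gamma * x_hat + beta              # learnable scale and shift


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
out = batchnorm_forward_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))
```

With gamma = 1 and beta = 0 the output has roughly zero mean and unit standard deviation per feature, whatever the scale of the input was.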

ref：https://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html