CS231n Assignment 2: PyTorch

Posted on 2019-05-07 | In Image Processing, Deep Learning, CS231n Assignments

This part requires choosing one of two frameworks: PyTorch or TensorFlow.

PyTorch

What

  • Adds the Tensor object (similar to ndarray); no more manual backprop

Why

  • Runs on GPUs: you can run NNs on your own GPU without writing CUDA code
  • Lots of built-in functions
  • Stand on the shoulders of giants!
  • The kind of deep learning code you should actually write in practice

Learning resources

  • Justin Johnson has made an excellent tutorial for PyTorch.
  • Detailed API docs
  • If you have other questions that are not addressed by the API docs, the PyTorch forum is a much better place to ask than StackOverflow.

Overall structure

  • Part 1: preparation, working with the dataset
  • Part 2: abstraction level 1, operating directly on the lowest-level Tensors
  • Part 3: abstraction level 2, using nn.Module to define an arbitrary NN architecture
  • Part 4: abstraction level 3, using nn.Sequential to define a simple linear feed-forward network
  • Part 5: tuning things yourself to push CIFAR-10 accuracy as high as possible

Part 1: Preparation

PyTorch has built-in functionality for downloading datasets, preprocessing them, and iterating over them in minibatches.

import torchvision.transforms as T

  • This package contains tools for preprocessing and augmenting data; here we subtract the mean RGB value and divide by the standard deviation
  • Then a Dataset object is built for each split (train, val, test); the Dataset loads one training example at a time, and a DataLoader wraps it to form minibatches
# Setup (imports used throughout this assignment)
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as dset
import torchvision.transforms as T
from torch.utils.data import DataLoader, sampler

NUM_TRAIN = 49000

# The torchvision.transforms package provides tools for preprocessing data
# and for performing data augmentation; here we set up a transform to
# preprocess the data by subtracting the mean RGB value and dividing by the
# standard deviation of each RGB value; we've hardcoded the mean and std.
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])

# We set up a Dataset object for each split (train / val / test); Datasets load
# training examples one at a time, so we wrap each Dataset in a DataLoader which
# iterates through the Dataset and forms minibatches. We divide the CIFAR-10
# training set into train and val sets by passing a Sampler object to the
# DataLoader telling how it should sample from the underlying Dataset.
cifar10_train = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                             transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=64,
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

cifar10_val = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                           transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=64,
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, 50000)))

cifar10_test = dset.CIFAR10('./cs231n/datasets', train=False, download=True,
                            transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=64)
  • We need a USE_GPU flag, set to True here. A GPU is not required for this assignment, and if your machine cannot enable CUDA the code falls back to CPU mode automatically.
  • Besides that, two global variables are created: dtype, which is float32, and device, which records which device to use
  • Since Macs don't support CUDA (and newer versions of macOS apparently can't install NVIDIA drivers anymore), I'm running on the CPU
USE_GPU = True

dtype = torch.float32  # we will be using float throughout this tutorial

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# Constant to control how frequently we print train loss
print_every = 100

print('using device:', device)

Part 2: Barebones PyTorch

  • Although the higher-level APIs already provide a lot of this functionality, this part works at a fairly low level
  • Build a simple fc-ReLU net with two hidden layers and no biases
  • Compute the forward pass with Tensor methods, and use the built-in autograd to compute the backward pass
  • If requires_grad=True is set, computations involving the Tensor will not only produce values but also build the graph used for the backward pass
  • if x is a Tensor with x.requires_grad == True then after backpropagation x.grad will be another Tensor holding the gradient of x with respect to the scalar loss at the end

PyTorch Tensors: Flatten Function

  • Tensors are much like ndarrays and define many convenient methods, for example flatten to reshape image data
  • In a Tensor, a batch of images has shape N x C x H x W:
    • N, the number of datapoints
    • C, the number of channels
    • H and W, the height and width of the feature map
  • But in an affine layer we want each datapoint represented as a single vector, not split into channels, height, and width
  • So we use flatten to read the NCHW data and return a view of it (the equivalent of reshape for arrays), turning it into N x ??, where ?? collects everything else


def flatten(x):
    N = x.shape[0]  # read in N, C, H, W
    # "flatten" the C * H * W values into a single vector per image
    return x.view(N, -1)

Barebones PyTorch: Two-Layer Network

When defining two_layer_fc, the forward pass is two fully-connected layers with a ReLU in between; after writing the forward pass, check that the output shape is correct (by now the shape bookkeeping hardly raises any questions).

import torch.nn.functional as F  # useful stateless functions


def two_layer_fc(x, params):
    """
    A fully-connected neural network; the architecture is:
    fully connected -> ReLU -> fully connected layer.
    Note that this function only defines the forward pass;
    PyTorch will take care of the backward pass for us.

    The input to the network will be a minibatch of data, of shape
    (N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
    and the output layer will produce scores for C classes.

    Inputs:
    - x: A PyTorch Tensor of shape (N, d1, ..., dM) giving a minibatch of
      input data.
    - params: A list [w1, w2] of PyTorch Tensors giving weights for the network;
      w1 has shape (D, H) and w2 has shape (H, C).

    Returns:
    - scores: A PyTorch Tensor of shape (N, C) giving classification scores for
      the input data x.
    """
    # first we flatten the image
    x = flatten(x)  # shape: [batch_size, C x H x W]

    w1, w2 = params

    # Forward pass: compute predicted y using operations on Tensors. Since w1 and
    # w2 have requires_grad=True, operations involving these Tensors will cause
    # PyTorch to build a computational graph, allowing automatic computation of
    # gradients. Since we are no longer implementing the backward pass by hand we
    # don't need to keep references to intermediate values.
    # you can also use `.clamp(min=0)`, equivalent to F.relu()
    x = F.relu(x.mm(w1))
    x = x.mm(w2)
    return x


def two_layer_fc_test():
    hidden_layer_size = 42
    # minibatch size 64, feature dimension 50
    x = torch.zeros((64, 50), dtype=dtype)
    w1 = torch.zeros((50, hidden_layer_size), dtype=dtype)
    w2 = torch.zeros((hidden_layer_size, 10), dtype=dtype)
    scores = two_layer_fc(x, [w1, w2])
    print(scores.size())  # you should see [64, 10]


two_layer_fc_test()

Barebones PyTorch: Three-Layer ConvNet

  • For this network (and the previous one), you can pass zeros through the model to test that the output tensor size is right
  • Network architecture:
    • conv with bias, channel_1 filters of size KW1 x KH1, zero-padding of 2
    • ReLU
    • conv with bias, channel_2 filters of size KW2 x KH2, zero-padding of 1
    • ReLU
    • fc with bias, producing scores for C classes
  • Note: there is no softmax activation after the fc layer, because the loss function applied later includes the softmax, which is more efficient to compute
  • Note 2: no flatten is needed before conv2d; only flatten before the fc layer


def three_layer_convnet(x, params):
    """
    Performs the forward pass of a three-layer convolutional network with the
    architecture defined above.

    Inputs:
    - x: A PyTorch Tensor of shape (N, 3, H, W) giving a minibatch of images
    - params: A list of PyTorch Tensors giving the weights and biases for the
      network; should contain the following:
      - conv_w1: PyTorch Tensor of shape (channel_1, 3, KH1, KW1) giving weights
        for the first convolutional layer
      - conv_b1: PyTorch Tensor of shape (channel_1,) giving biases for the first
        convolutional layer
      - conv_w2: PyTorch Tensor of shape (channel_2, channel_1, KH2, KW2) giving
        weights for the second convolutional layer
      - conv_b2: PyTorch Tensor of shape (channel_2,) giving biases for the second
        convolutional layer
      - fc_w: PyTorch Tensor giving weights for the fully-connected layer. Can you
        figure out what the shape should be? (channel_2 * H * W, C)
      - fc_b: PyTorch Tensor giving biases for the fully-connected layer. Can you
        figure out what the shape should be? (C,)

    Returns:
    - scores: PyTorch Tensor of shape (N, C) giving classification scores for x
    """
    conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
    scores = None
    ############################################################################
    # TODO: Implement the forward pass for the three-layer ConvNet.            #
    ############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    x = nn.functional.conv2d(x, conv_w1, bias=conv_b1, padding=2)
    x = nn.functional.conv2d(F.relu(x), conv_w2, bias=conv_b2, padding=1)
    x = flatten(x)
    x = x.mm(fc_w) + fc_b
    scores = x

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ############################################################################
    #                              END OF YOUR CODE                            #
    ############################################################################
    return scores

Barebones PyTorch: Initialization

  • random_weight(shape) initializes a weight tensor with the Kaiming normalization method (Kaiming normal).
  • zero_weight(shape) initializes a weight tensor with all zeros. Useful for instantiating bias parameters.


def random_weight(shape):
    """
    Create random Tensors for weights; setting requires_grad=True means that we
    want to compute gradients for these Tensors during the backward pass.
    We use Kaiming normalization: sqrt(2 / fan_in)
    """
    if len(shape) == 2:  # FC weight
        fan_in = shape[0]
    else:
        # conv weight [out_channel, in_channel, kH, kW]
        fan_in = np.prod(shape[1:])
    # randn is standard normal distribution generator.
    w = torch.randn(shape, device=device, dtype=dtype) * np.sqrt(2. / fan_in)
    w.requires_grad = True
    return w


def zero_weight(shape):
    return torch.zeros(shape, device=device, dtype=dtype, requires_grad=True)


# create a weight of shape [3 x 5]
# you should see the type `torch.cuda.FloatTensor` if you use GPU.
# Otherwise it should be `torch.FloatTensor`
random_weight((3, 5))

Barebones PyTorch: Check Accuracy

  • No gradients are needed in this part, so wrap it in torch.no_grad() to avoid wasted computation
  • Inputs
    • a DataLoader that splits the data we want to check into batches
    • a model_fn describing what the model actually is, used to compute the predicted scores
    • the params this model needs
  • No return value, but it prints the accuracy


def check_accuracy_part2(loader, model_fn, params):
    """
    Check the accuracy of a classification model.

    Inputs:
    - loader: A DataLoader for the data split we want to check
    - model_fn: A function that performs the forward pass of the model,
      with the signature scores = model_fn(x, params)
    - params: List of PyTorch Tensors giving parameters of the model

    Returns: Nothing, but prints the accuracy of the model
    """
    split = 'val' if loader.dataset.train else 'test'
    print('Checking accuracy on the %s set' % split)
    num_correct, num_samples = 0, 0
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.int64)
            scores = model_fn(x, params)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
    acc = float(num_correct) / num_samples
    print('Got %d / %d correct (%.2f%%)' %
          (num_correct, num_samples, 100 * acc))

Barebones PyTorch: Training Loop

  • Train with stochastic gradient descent without momentum, and compute the loss with torch.nn.functional.cross_entropy
  • Inputs
    • model_fn
    • params
    • learning_rate
  • No output
  • Steps performed (see the sketch below)
    • move the data to the GPU or CPU
    • compute the scores and the loss
    • call loss.backward()
    • update the params; no gradient tracking is needed for this step
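
The post doesn't reproduce the loop itself, so here is a minimal sketch of what such a train_part2 can look like (names follow the assignment's conventions; treat it as an illustration rather than the notebook's exact code):

def train_part2(model_fn, params, learning_rate):
    for t, (x, y) in enumerate(loader_train):
        # Move the data to the proper device (GPU or CPU)
        x = x.to(device=device, dtype=dtype)
        y = y.to(device=device, dtype=torch.long)

        # Forward pass: compute scores and loss
        scores = model_fn(x, params)
        loss = F.cross_entropy(scores, y)

        # Backward pass: PyTorch propagates gradients to every Tensor
        # in the graph that has requires_grad=True
        loss.backward()

        # Update parameters; we don't want to backprop through the
        # updates themselves, so wrap them in torch.no_grad()
        with torch.no_grad():
            for w in params:
                w -= learning_rate * w.grad
                # manually zero the gradients after the update
                w.grad.zero_()

        if t % print_every == 0:
            print('Iteration %d, loss = %.4f' % (t, loss.item()))
            check_accuracy_part2(loader_val, model_fn, params)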

Barebones PyTorch: Training a ConvNet

  • Required network:
    1. Convolutional layer (with bias) with 32 5x5 filters, with zero-padding of 2
    2. ReLU
    3. Convolutional layer (with bias) with 16 3x3 filters, with zero-padding of 1
    4. ReLU
    5. Fully-connected layer (with bias) to compute scores for 10 classes
  • You initialize the parameters yourself; no need to tune hyperparameters
    • Note 1: the fc weight has shape (D, C); D doesn't come from the data directly, it has to be derived from the previous layer's output
    • Note 2: with these filter sizes and paddings the spatial size stays 32x32 through both convs, so D = channel_2 * 32 * 32
learning_rate = 3e-3

channel_1 = 32
channel_2 = 16

conv_w1 = None
conv_b1 = None
conv_w2 = None
conv_b2 = None
fc_w = None
fc_b = None

################################################################################
# TODO: Initialize the parameters of a three-layer ConvNet.                    #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

conv_w1 = random_weight((channel_1, 3, 5, 5))
conv_b1 = zero_weight(channel_1)
conv_w2 = random_weight((channel_2, channel_1, 3, 3))
conv_b2 = zero_weight(channel_2)
fc_w = random_weight((channel_2 * 32 * 32, 10))
fc_b = zero_weight(10)

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

params = [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b]
train_part2(three_layer_convnet, params, learning_rate)

Part 3: PyTorch Module API

  • Everything above tracked the whole computation by hand, which stops being practical for bigger nets
  • Use nn.Module to define the network, and choose an optimization method from optim
  1. Subclass nn.Module. Give your network class an intuitive name like TwoLayerFC.

  2. In __init__(), define all the layers you need. nn.Linear and nn.Conv2d are provided as modules, and nn.Module will track these internal parameters for you. Refer to the doc to learn more about the dozens of builtin layers. Warning: don't forget to call super().__init__() first! (invoking the parent class)

  3. In the forward() method, define the connectivity of your network. Use the layers instantiated in __init__ to do the forward pass; don't create new layers inside forward()

Use the approach above to write a three-layer network

  • Note that the w and b parameters need to be initialized, using the Kaiming method
class ThreeLayerConvNet(nn.Module):
    def __init__(self, in_channel, channel_1, channel_2, num_classes):
        super().__init__()
        ########################################################################
        # TODO: Set up the layers you need for a three-layer ConvNet with the  #
        # architecture defined above.                                          #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        self.conv_1 = nn.Conv2d(in_channel, channel_1, 5, stride=1, padding=2, bias=True)
        nn.init.kaiming_normal_(self.conv_1.weight)
        nn.init.constant_(self.conv_1.bias, 0)
        self.conv_2 = nn.Conv2d(channel_1, channel_2, 3, stride=1, padding=1, bias=True)
        nn.init.kaiming_normal_(self.conv_2.weight)
        nn.init.constant_(self.conv_2.bias, 0)
        self.fc_3 = nn.Linear(channel_2 * 32 * 32, num_classes)
        nn.init.kaiming_normal_(self.fc_3.weight)
        nn.init.constant_(self.fc_3.bias, 0)

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                           END OF YOUR CODE                           #
        ########################################################################

    def forward(self, x):
        scores = None
        ########################################################################
        # TODO: Implement the forward function for a 3-layer ConvNet. you      #
        # should use the layers you defined in __init__ and specify the        #
        # connectivity of those layers in forward()                            #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = self.conv_1(x)
        x = self.conv_2(F.relu(x))
        x = flatten(F.relu(x))
        x = self.fc_3(x)
        scores = x

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                           END OF YOUR CODE                           #
        ########################################################################
        return scores


def test_ThreeLayerConvNet():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch size 64, image size [3, 32, 32]
    model = ThreeLayerConvNet(in_channel=3, channel_1=12, channel_2=8, num_classes=10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]


test_ThreeLayerConvNet()

Module API: Check Accuracy

  • No need to pass the parameters around by hand anymore; we can get the accuracy of the whole net directly (a sketch follows)
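
The post doesn't repeat the code; a minimal sketch of what this check_accuracy_part34 can look like, assuming the usual structure from the assignment:

def check_accuracy_part34(loader, model):
    split = 'val' if loader.dataset.train else 'test'
    print('Checking accuracy on the %s set' % split)
    num_correct, num_samples = 0, 0
    model.eval()  # set the model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)
            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
    acc = float(num_correct) / num_samples
    print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))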

Module API: Training Loop

  • Use an optimizer object to update the weights
  • Inputs
    • model
    • optimizer
    • epochs, optional
  • No return value, but it prints the accuracy during training

In practice you just set up the model and the optimizer; a sketch of the loop is below
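
Again, a minimal sketch of what train_part34 can look like under the same assumptions:

def train_part34(model, optimizer, epochs=1):
    model = model.to(device=device)  # move the model parameters to CPU/GPU
    for e in range(epochs):
        for t, (x, y) in enumerate(loader_train):
            model.train()  # put the model in training mode
            x = x.to(device=device, dtype=dtype)
            y = y.to(device=device, dtype=torch.long)

            scores = model(x)
            loss = F.cross_entropy(scores, y)

            # Zero out the gradients of the variables the optimizer will update
            optimizer.zero_grad()

            # Backward pass: compute the gradient of the loss with respect to
            # each parameter of the model
            loss.backward()

            # Update the parameters with the gradients computed above
            optimizer.step()

            if t % print_every == 0:
                print('Iteration %d, loss = %.4f' % (t, loss.item()))
                check_accuracy_part34(loader_val, model)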

Part 4: PyTorch Sequential API

  • nn.Sequential isn't as flexible as the above, but it chains the whole stack of functionality together
  • A flatten usable inside forward needs to be defined in advance (wrapped as a module)
# We need to wrap `flatten` function in a module in order to stack it
# in nn.Sequential
class Flatten(nn.Module):
    def forward(self, x):
        return flatten(x)


hidden_layer_size = 4000
learning_rate = 1e-2

model = nn.Sequential(
    Flatten(),
    nn.Linear(3 * 32 * 32, hidden_layer_size),
    nn.ReLU(),
    nn.Linear(hidden_layer_size, 10),
)

# you can use Nesterov momentum in optim.SGD
optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                      momentum=0.9, nesterov=True)

train_part34(model, optimizer)

Implement the three-layer net; note that the parameters need to be initialized

  • One problem I ran into here: when initializing with random_weight, the accuracy came out extremely low
  • This led me to define another weight initializer that uses a different formula
  • It also shows how to apply a new function across a module (model.apply)
def xavier_normal(shape):
    """
    Create random Tensors for weights; setting requires_grad=True means that we
    want to compute gradients for these Tensors during the backward pass.
    We use Xavier normalization: sqrt(2 / (fan_in + fan_out))
    """
    if len(shape) == 2:  # FC weight
        fan_in = shape[1]
        fan_out = shape[0]
    else:
        fan_in = np.prod(shape[1:])  # conv weight [out_channel, in_channel, kH, kW]
        fan_out = shape[0] * shape[2] * shape[3]
    # randn is standard normal distribution generator.
    w = torch.randn(shape, device=device, dtype=dtype) * np.sqrt(2. / (fan_in + fan_out))
    w.requires_grad = True
    return w


channel_1 = 32
channel_2 = 16
learning_rate = 1e-2

model = None
optimizer = None

################################################################################
# TODO: Rewrite the 2-layer ConvNet with bias from Part III with the           #
# Sequential API.                                                              #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

model = nn.Sequential(
    nn.Conv2d(3, channel_1, 5, stride=1, padding=2),
    nn.ReLU(),
    nn.Conv2d(channel_1, channel_2, 3, stride=1, padding=1),
    nn.ReLU(),
    Flatten(),
    nn.Linear(32 * 32 * channel_2, 10),
)


def init_weights(m):
    print(m)
    if type(m) == nn.Linear:
        m.weight.data = xavier_normal(m.weight.size())
        m.bias.data = zero_weight(m.bias.size())


model.apply(init_weights)

optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                      momentum=0.9, nesterov=True)

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

train_part34(model, optimizer)

Part 5: Let's train on CIFAR-10!

Pick your own network architecture, hyperparameters, loss function, and optimizer to push CIFAR-10 validation accuracy above 70% within 10 epochs!

  • Layers in torch.nn package: http://pytorch.org/docs/stable/nn.html
  • Activations: http://pytorch.org/docs/stable/nn.html#non-linear-activations
  • Loss functions: http://pytorch.org/docs/stable/nn.html#loss-functions
  • Optimizers: http://pytorch.org/docs/stable/optim.html

Some possible things to try:

  • Filter size: Above we used 5x5; would smaller filters be more efficient?
  • Number of filters: Above we used 32 filters. Do more or fewer do better?
  • Pooling vs Strided Convolution: Do you use max pooling or just strided convolutions?
  • Batch normalization: Try adding spatial batch normalization after convolution layers and vanilla batch normalization after affine layers. Do your networks train faster?
  • Network architecture: The network above has two layers of trainable parameters. Can you do better with a deep network? Good architectures to try include:
    • [conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    • [conv-relu-conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    • [batchnorm-relu-conv]xN -> [affine]xM -> [softmax or SVM]
  • Global Average Pooling: Instead of flattening and then having multiple affine layers, perform convolutions until your image gets small (7x7 or so) and then perform an average pooling operation to get to a 1x1 image (1, 1, Filter#), which is then reshaped into a (Filter#) vector. This is used in Google's Inception Network (See Table 1 for their architecture).
  • Regularization: Add L2 weight regularization, or perhaps use Dropout.

Some tips:

  • If the parameters work well, you should see improvement within a few hundred iterations
  • When tuning hyperparameters, start with a coarse range and short training runs; once you find promising settings, search around them (and train longer)
  • Use the validation set for hyperparameter search
model = None
optimizer = None

# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

channel_1 = 16
channel_2 = 32
channel_3 = 64
channel_4 = 64

fc_1 = 1024
num_classes = 10


model = nn.Sequential(
    nn.Conv2d(3, channel_1, 3, stride=1, padding=1),
    nn.BatchNorm2d(channel_1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(channel_1, channel_2, 3, stride=1, padding=1),
    nn.BatchNorm2d(channel_2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(channel_2, channel_3, 3, stride=1, padding=1),
    nn.BatchNorm2d(channel_3),
    nn.ReLU(),
    nn.Conv2d(channel_3, channel_4, 3, stride=1, padding=1),
    nn.BatchNorm2d(channel_4),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    Flatten(),
    nn.Linear(4 * 4 * channel_4, num_classes)
    # nn.Linear(fc_1, num_classes)
)


learning_rate = 1e-3
optimizer = optim.Adam(model.parameters(), lr=learning_rate,
                       betas=(0.9, 0.999), eps=1e-08, weight_decay=0)

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

# You should get at least 70% accuracy
train_part34(model, optimizer, epochs=10)
  • I tried ksize=1 for the fourth conv layer; it didn't work very well
  • BN seems to help a lot
  • Using more max pooling reduces the computational load, and the results seem a bit better
  • Final val_acc was around 77-79%, test_acc = 76.22%

On generating dynamic variable names in Python

Posted on 2019-05-07 | In Programming Languages, Python

Dynamically generating variable names

If you want a series of variable names like a0, a1, …, a20, writing them out by hand is too tedious

locals

locals() returns all local variables at the current position as a dict

arrange_list = locals()

for i in range(10):
    arrange_list['list_' + str(i)] = []

To read a dynamic variable, use the dict's get method to retrieve its value

arrange_list = locals()

for i in range(10):
    print(arrange_list.get('var' + str(i)), end=" ")

Assigning with exec

for i in range(5):
    exec('var{} = {}'.format(i, i))

Reading the dynamic variables

for i in range(5):
    exec('print(var{}, end=" ")'.format(i))

On transposing multi-dimensional arrays and adding new dimensions

Posted on 2019-04-25 | Edited on 2019-04-26 | In Programming Languages, Python, Numpy

For a 2-D transpose, a[i][j] = a[j][i].
For a multi-dimensional array, transposing means permuting the axis indices.
For example, to go from shape (X, Y, Z) to (Z, X, Y),
you should use np.transpose(A, (2, 0, 1)).

np.newaxis -> adds a new dimension.
An array of shape (6,) becomes a (1, 6) 2-D array when a dimension is added along the rows, or a (6, 1) 2-D array when one is added along the columns. Both are demonstrated below.
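
A quick check of both operations:

import numpy as np

A = np.zeros((4, 5, 6))           # shape (X, Y, Z)
B = np.transpose(A, (2, 0, 1))    # axes permuted to (Z, X, Y)
print(B.shape)                    # (6, 4, 5)

v = np.arange(6)                  # shape (6,)
print(v[np.newaxis, :].shape)     # (1, 6): new axis along the rows
print(v[:, np.newaxis].shape)     # (6, 1): new axis along the columns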

Learning OpenCV, Chapter 18: Camera Models & Calibration

Posted on 2019-04-22 | Edited on 2019-04-25 | In Image Processing, OpenCV, Calibration

camera models & calibration

Objects absorb some light and reflect the rest; the reflected light is the object's color. That light is received by our eyes (or a camera) and projected onto our retina (or the camera's image). The geometry of this projection is very important in CV.

One very simple model is the pinhole camera model: light passes through a small aperture in a wall. It's the starting model for this chapter, but a real pinhole isn't very practical because it can't expose quickly (it gathers too little light) -> an eye with a lens does better, but the lens also distorts the image.

Goals of this chapter:

  • How to do camera calibration
  • Correct the lens distortion that the plain pinhole model doesn't account for
  • Calibration is also a primary way of recovering the 3D world: a scene isn't just an image, it has physical space and volume, so the relationship between pixels and 3D world coordinates matters too
  • Chapter 18 corrects lens distortion; Chapter 19 builds the full 3D structure

homography transform -> a very important element


CS231n Assignment 2: CNN

Posted on 2019-04-18 | Edited on 2019-04-25 | In Image Processing, Deep Learning, CS231n Assignments

target

  • We already implemented the fully-connected pieces earlier, but in practice everyone uses CNNs
  • So this part puts CNNs into practice

CS231n Assignment 2: Dropout

Posted on 2019-04-18 | In Image Processing, Deep Learning, CS231n Assignments

Target

  • regularizes the NN
  • randomly setting some features to 0 during the forward pass

Geoffrey E. Hinton et al, “Improving neural networks by preventing co-adaptation of feature detectors”, arXiv 2012


OpenGL Notes

Posted on 2019-04-16 | Edited on 2019-05-31 | In OpenGL

Learn OpenGL

On Modern OpenGL -> it starts from graphics programming fundamentals

Getting Started

OpenGL

  • A tool for doing graphics
  • Can be thought of as an API, but is actually a specification
    • It states exactly what inputs and outputs each function should have, and how it should perform
    • Implementers work against this specification; since no concrete implementation is prescribed, any implementation is acceptable as long as its results follow the rules

Core-profile vs Immediate mode

  • Older versions used immediate mode
    • Easy to use for drawing
    • The concrete implementation lives inside the library; developers can't really see how things are computed
    • Grew less and less efficient
  • Core-profile
    • Adopted from version 3.2 onwards
    • Forces modern practices; calling a function that has been removed raises an error immediately
    • More efficient, more flexible, harder to learn

extensions

  • Extensions are supported; just check whether the graphics card supports one to know whether it can be used
  • New features can be used directly without waiting for OpenGL to ship them
  • Check whether an extension is available before using it; if not, fall back to the old approach

State Machine

  • OpenGL itself is a state machine: a collection of variables that determine how it should currently operate
  • state -> context
    • Changing state: set some options, manipulate some buffers, render with the current context
  • Example:
    • If I want to draw triangles instead of lines, I change the draw state
    • Once that change goes through, the next draw produces triangles
  • State-changing functions modify the context; state-using functions perform operations based on the current state

Objects

  • A collection representing a subset of OpenGL's state
    • For example, an object can represent the settings of a window: its size, the supported colors, and so on
      // The State of OpenGL
      struct OpenGL_Context {
          ...
          object_name* object_Window_Target;
          ...
      };
// create object
unsigned int objectId = 0;
glGenObject(1, &objectId);
// bind object to context
glBindObject(GL_WINDOW_TARGET, objectId);
// set options of object currently bound to GL_WINDOW_TARGET
glSetObjectOption(GL_WINDOW_TARGET, GL_OPTION_WINDOW_WIDTH, 800);
glSetObjectOption(GL_WINDOW_TARGET, GL_OPTION_WINDOW_HEIGHT, 600);
// set context target back to default
glBindObject(GL_WINDOW_TARGET, 0);
  • The flow
    • First create an object and keep a ref to it, its id
    • Then bind the object to the target location of the context
    • Set the window options
    • Finally un-bind the two, setting the window target back to its default
  • This way we can create many objects, configure them ahead of time, and simply bind whichever one we need when it's time to use it
    • For example, say we have a bunch of objects containing a person, a horse, and a deer
    • Bind whichever one you want to draw, and it gets drawn directly

Creating a window

Because of operating-system differences, this doesn't work quite the same on every OS, but libraries exist that provide the functionality; here we use GLFW.

GLFW

  • A library written in C whose main purpose is to provide what's needed to render things to the screen
  • It can create a context, define window parameters, and handle user input

I configured all of this in one go!
https://www.jianshu.com/p/25d5fbf792a2
Remember to add the OpenGL framework to the linked libraries!!!!!

GLAD

  • OpenGL also depends on driver support across versions, so something has to handle loading that part
  • Unlike the others, GLAD uses a web service
    • On its web page, choose the language and version, make sure the profile is core, then generate
    • Download the resulting zip, put the include folders into your include path, and the .c file into the project
  • Mysteriously, I didn't need this step at all. Strange; maybe I had already pulled it into my includes!!

Hello Window

Initialization

int main()
{
    glfwInit();
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    //glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE);

    return 0;
}
  • First initialize GLFW
  • Then configure GLFW, setting options from a large enum of possible options prefixed with GLFW_ (the glfwWindowHint calls set the major and minor context version) -> roughly, this picks which OpenGL version to use
  • And tell it we want the core profile

  • Next, use the glfwCreateWindow function to create the GLFWwindow* window variable
    • The creation function takes the window's width and height
    • and the window title
  • Once created, call glfwMakeContextCurrent(window); that is, make this window's context current on the present thread
    GLFWwindow* window = glfwCreateWindow(800, 600, "LearnOpenGL", NULL, NULL);
    if (window == NULL)
    {
        std::cout << "Failed to create GLFW window" << std::endl;
        glfwTerminate();
        return -1;
    }
    glfwMakeContextCurrent(window);

GLAD

  • GLAD manages these function pointers for OpenGL, so GLAD must be initialized before any of those functions are used
    if (!gladLoadGLLoader((GLADloadproc)glfwGetProcAddress))
    {
        std::cout << "Failed to initialize GLAD" << std::endl;
        return -1;
    }

Viewport

  • Before we start rendering we still have to tell GL the size of the rendering window, using the glViewport function
    • The first two parameters set the coordinates of the window's lower-left corner
    • The last two set the width and height of the rendering window
  • Every time the window is resized, the viewport needs to be adjusted as well

engines

  • We want the engine to keep drawing until we finally tell the window to close, so we set up a loop

    while(!glfwWindowShouldClose(window))
    {
        glfwSwapBuffers(window);
        glfwPollEvents();
    }
  • Inside this loop, glfwPollEvents checks whether any events have been triggered (such as keyboard input), updates the window state, and calls the corresponding callback functions

  • glfwSwapBuffers swaps the color buffer (the buffer holding the color of every pixel) and shows it in the window

last thing

  • After exiting the loop we need to clean up the allocated resources; call glfwTerminate at the very end

input

  • When keyboard controls are needed, write a processInput function
  • For example, the function below checks whether ESC has been pressed and, if so, closes the window
  • Then call this function inside the while loop
    void processInput(GLFWwindow *window)
    {
        if(glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS)
            glfwSetWindowShouldClose(window, true);
    }

rendering

  • We want all the rendering commands inside one loop; the whole loop should look like this:
    // render loop
    while(!glfwWindowShouldClose(window))
    {
        // input
        processInput(window);

        // rendering commands here
        ...

        // check and call events and swap the buffers
        glfwPollEvents();
        glfwSwapBuffers(window);
    }

hello triangle


In OpenGL everything lives in 3D space, but the screen displays 2D. The whole conversion process is called the graphics pipeline, and it splits into two stages: first, converting objects' 3D coordinates into 2D coordinates; second, turning those 2D coordinates into concrete pixel values.

pipeline

  • All the transformation steps can run in parallel; modern graphics cards have many small cores for this -> shaders
  • At the very beginning, a list of 3D coordinates is passed in (Vertex Data)
  • Step 1: vertex shader
    • Transforms 3D coordinates into different 3D coordinates (roughly, turns the data into vertices?)
  • primitive assembly
    • Takes all the points produced by the previous step as input
    • and assembles them into a basic shape
  • geometry shader
    • Can form new, different shapes from the given vertices; in the example it forms an extra line
  • rasterization stage
    • Maps the resulting primitives to the corresponding pixels on the final screen
  • Clipping
    • Discards all fragments outside the view, improving performance
  • fragment shader
    • Computes the final color of each pixel; lighting and shadows, the color of the light, and so on are all computed in this step
  • Once every pixel's color is determined, the object goes through alpha testing and blending
    • This step tests depth, deciding whether a fragment sits in front of or behind an object
    • It also handles transparency
  • Although all of the above is complicated, in practice you only need to deal with the vertex and fragment shaders

vertex input

  • OpenGL is 3D, so every input point needs three coordinates: x, y, z
  • Coordinates are only processed when they fall between -1 and 1; values in this range are normalized device coordinates, mapped to the screen proportionally
  • In this example we render a triangle, so we need the coordinates of its three vertices. Note that depth is ignored entirely here; the triangle is drawn flat on the plane

    float vertices[] = {
        -0.5f, -0.5f, 0.0f,
         0.5f, -0.5f, 0.0f,
         0.0f,  0.5f, 0.0f
    };
  • After defining the coordinates, the data has to be handed to the vertex shader: allocate some memory on the GPU to store it, and store a large amount of data there at once (so it doesn't have to be re-sent every time)

  • Every object has its own buffer id; one can be generated as follows, and an array can then be bound to that id
    unsigned int VBO;
    glGenBuffers(1, &VBO);

CS231n Assignment 2: Batch Normalization

Posted on 2019-04-15 | Edited on 2019-04-18 | In Image Processing, Deep Learning, CS231n Assignments

target

  • Earlier material covered optimizers for learning, such as Adam; another approach is to change the network's structure to make it easy to train -> batch normalization
  • We'd like to remove some uncorrelated features; we can preprocess the training data into a zero-centered distribution, which takes care of the first layer, but the deeper layers still run into the problem
  • So the normalization is moved inside the deep network: a BN layer is added that estimates the mean and standard deviation of each feature and then re-centers and re-normalizes it
  • learnable shift and scale parameters for each feature dimension
  • Core idea: bluntly use BN to sidestep the weight-initialization problem

ref:https://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html


C++ Primer Notes

Posted on 2019-04-15 | Edited on 2019-04-16 | In Programming Languages, Cpp

Part 1: Basics

  • C++ checks types at compile time
  • allow programmers to define types that include operations as well as data

On the difference between numpy's random.rand and randn

Posted on 2019-04-11 | Edited on 2019-04-18 | In Programming Languages, Python, Numpy

Python has two commonly used functions for generating random numbers, and they behave differently.

np.random.rand() produces uniform random numbers in [0, 1). Lately the place I use it most is generating numbers in a range [a, b): first generate a big random matrix of [0, 1) values, then multiply by the difference between a and b (and shift by a).

np.random.randn() produces standard normal values; multiplying by a std outside gives the normal distribution you need, which can be used to initialize the weights of a deep network. For both functions, the arguments are the dimensions of the generated output.

  • One more gotcha: randn's arguments must be integers, so given an array's shape you pick one of the two calls shown in the demo below
    • np.random.randn(x0.shape[0], x0.shape[1])
    • np.random.randn(*x0.shape)
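
A short demo of both functions:

import numpy as np

a, b = -3.0, 5.0
u = a + (b - a) * np.random.rand(3, 4)   # uniform samples in [a, b)

sigma = 1e-2
w = sigma * np.random.randn(3, 4)        # normal samples with std sigma

x0 = np.zeros((3, 4))
n1 = np.random.randn(x0.shape[0], x0.shape[1])  # pass the dims separately...
n2 = np.random.randn(*x0.shape)                 # ...or unpack the shape tuple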