GoogLeNet
The GoogLeNet we introduce here is the improved version, also known as Inception v1. Yes, Inception, as in the movie. The film features dreams within dreams; GoogLeNet features networks within networks. The dream levels run deep, and so do GoogLeNet's layers. A fitting name.
P.S. For the naive version of the Inception block, a quick web search will turn it up.
Compared with earlier networks, GoogLeNet has more layers, goes deeper, and has a more complex structure. Yet despite this complexity, its use of 1x1 convolutions to reduce channel counts means the parameter count actually goes down rather than up, and this is one of the important reasons GoogLeNet performs so well.
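To make the savings concrete, here is a back-of-the-envelope comparison for the 5x5 path of the inception(3a) block defined later in this post (192 input channels, 32 output channels, with a 16-channel 1x1 reduction in between); biases and batch-norm parameters are ignored:

```python
# Weight count for a direct 5x5 convolution, 192 channels in, 32 out:
direct = 192 * 32 * 5 * 5                     # 153600 weights

# Weight count with a 1x1 reduction to 16 channels first, then the 5x5 conv:
reduced = 192 * 16 * 1 * 1 + 16 * 32 * 5 * 5  # 3072 + 12800 = 15872 weights

print(direct, reduced)  # 153600 15872
```

The reduced path costs roughly a tenth of the direct one, which is exactly the trick that keeps GoogLeNet's parameter count small.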
GoogLeNet's network-architecture table is as follows. Its columns are:
type: the layer type
patch size/stride: the kernel (or pooling window) size / the convolution (or pooling) stride
output size: the shape of the feature map output by that layer
GoogLeNet reuses the inception module throughout. Like the basic block of NiN, it is a standalone block, so let's call it the inception block; the blue box in the table marks the parameters an inception block needs.
The inception block is structured as follows:
The inception block runs 4 parallel branches and concatenates their results along the channel dimension. The network becomes wider, and we no longer have to decide whether a convolution or a pooling layer is better here, or whether a 3x3 or a 5x5 kernel works best; all of that is handed to the model to learn on its own.
Meaning of the inception parameters in the architecture table:
#1x1: output channels of the 1x1 convolution (branch 1)
#3x3 reduce: output channels of the 1x1 convolution placed before the 3x3 convolution (branch 2)
#3x3: output channels of the 3x3 convolution (branch 2)
#5x5 reduce: output channels of the 1x1 convolution placed before the 5x5 convolution (branch 3)
#5x5: output channels of the 5x5 convolution (branch 3)
pool proj: output channels of the 1x1 convolution after the pooling layer (branch 4)
Note that an inception block's output feature map has the same spatial size as its input; only the channel count changes. That is why the outputs of the 4 branches can be concatenated along the channel dimension.
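For example, taking the inception(3a) row of the table, the block's output channel count is simply the sum of the four branches' output channels:

```python
# inception(3a) branch outputs: 64 (#1x1), 128 (#3x3), 32 (#5x5), 32 (pool proj)
out_channels = 64 + 128 + 32 + 32
print(out_channels)  # 256, which matches the in_channels of inception(3b)
```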
With the architecture table in hand, we can implement GoogLeNet. Note that the implementation below does not follow the original paper exactly; it adds a few operations that are common today, such as batch normalization after each convolution.
Implementing GoogLeNet in PyTorch
Since the network repeatedly uses the convolution -> batch norm -> activation pattern, we can package it into a single module, conv_block:
import torch
import torch.nn as nn

class conv_block(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        self.relu = nn.ReLU()
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.batchnorm = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        # convolution -> batch norm -> activation
        return self.relu(self.batchnorm(self.conv(x)))
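As a quick sanity check of the pattern, here is a minimal sketch using an equivalent nn.Sequential: a 3x3 convolution with padding 1 changes the channel count but preserves the spatial size:

```python
import torch
import torch.nn as nn

# conv -> batch norm -> ReLU, written inline for a quick shape check
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
)
y = block(torch.randn(4, 3, 32, 32))
print(y.shape)  # torch.Size([4, 64, 32, 32])
```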
Then implement the inception block according to its structure diagram:
class Inception_block(nn.Module):
    # Inception parameter order: in_channels, out_1x1, red_3x3, out_3x3, red_5x5, out_5x5, out_1x1pool
    def __init__(
        self, in_channels, out_1x1, red_3x3, out_3x3, red_5x5, out_5x5, out_1x1pool
    ):
        """
        in_channels: number of input channels of the block
        out_1x1: `#1x1`, output channels of the 1x1 convolution (branch 1)
        red_3x3: `#3x3 reduce`, output channels of the 1x1 convolution before the 3x3 convolution (branch 2)
        out_3x3: `#3x3`, output channels of the 3x3 convolution (branch 2)
        red_5x5: `#5x5 reduce`, output channels of the 1x1 convolution before the 5x5 convolution (branch 3)
        out_5x5: `#5x5`, output channels of the 5x5 convolution (branch 3)
        out_1x1pool: `pool proj`, output channels of the 1x1 convolution after the pooling layer (branch 4)
        Note that branch 4's pooling layer needs no channel argument, because pooling does not change the channel count.
        """
        super().__init__()
        self.branch1 = conv_block(in_channels, out_1x1, kernel_size=(1, 1))
        self.branch2 = nn.Sequential(
            conv_block(in_channels, red_3x3, kernel_size=(1, 1)),
            conv_block(red_3x3, out_3x3, kernel_size=(3, 3), padding=(1, 1)),
        )
        self.branch3 = nn.Sequential(
            conv_block(in_channels, red_5x5, kernel_size=(1, 1)),
            conv_block(red_5x5, out_5x5, kernel_size=(5, 5), padding=(2, 2)),
        )
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            conv_block(in_channels, out_1x1pool, kernel_size=(1, 1)),
        )

    def forward(self, x):
        # concatenate the 4 branch outputs along the channel dimension
        return torch.cat(
            [self.branch1(x), self.branch2(x), self.branch3(x), self.branch4(x)], 1
        )
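To see the four-branch idea in isolation, here is a quick shape check using inception(3a)'s channel numbers from the table (192 in; branches out 64/128/32/32) at the 28x28 spatial size it sees in the full network. Batch norm and ReLU are omitted for brevity, since they do not affect shapes:

```python
import torch
import torch.nn as nn

# Four parallel branches on a 192-channel, 28x28 input (inception(3a) numbers)
x = torch.randn(2, 192, 28, 28)
b1 = nn.Conv2d(192, 64, kernel_size=1)(x)
b2 = nn.Conv2d(96, 128, kernel_size=3, padding=1)(nn.Conv2d(192, 96, kernel_size=1)(x))
b3 = nn.Conv2d(16, 32, kernel_size=5, padding=2)(nn.Conv2d(192, 16, kernel_size=1)(x))
b4 = nn.Conv2d(192, 32, kernel_size=1)(nn.MaxPool2d(3, stride=1, padding=1)(x))

# All spatial sizes match, so concatenation along dim=1 works
out = torch.cat([b1, b2, b3, b4], dim=1)
print(out.shape)  # torch.Size([2, 256, 28, 28])
```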
Following the architecture table, we can now assemble GoogLeNet:
class GoogLeNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.conv1 = conv_block(
            in_channels=3,
            out_channels=64,
            kernel_size=(7, 7),
            stride=(2, 2),
            padding=(3, 3),
        )
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.conv2 = conv_block(64, 192, kernel_size=3, stride=1, padding=1)
        self.maxpool2 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # Inception parameter order: in_channels, out_1x1, red_3x3, out_3x3, red_5x5, out_5x5, out_1x1pool
        self.inception3a = Inception_block(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = Inception_block(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(kernel_size=(3, 3), stride=2, padding=1)
        self.inception4a = Inception_block(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = Inception_block(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = Inception_block(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = Inception_block(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = Inception_block(528, 256, 160, 320, 32, 128, 128)
        self.maxpool4 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.inception5a = Inception_block(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = Inception_block(832, 384, 192, 384, 48, 128, 128)
        self.avgpool = nn.AvgPool2d(kernel_size=7, stride=1)
        self.dropout = nn.Dropout(p=0.4)
        self.fc1 = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = self.maxpool2(x)
        x = self.inception3a(x)
        x = self.inception3b(x)
        x = self.maxpool3(x)
        x = self.inception4a(x)
        x = self.inception4b(x)
        x = self.inception4c(x)
        x = self.inception4d(x)
        x = self.inception4e(x)
        x = self.maxpool4(x)
        x = self.inception5a(x)
        x = self.inception5b(x)
        x = self.avgpool(x)
        x = x.reshape(x.shape[0], -1)
        x = self.dropout(x)
        x = self.fc1(x)
        return x
Let's test it with a batch of 4 RGB (3-channel) images of size 224x224:
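Running `GoogLeNet()(torch.randn(4, 3, 224, 224))` should produce an output of shape `(4, 1000)`. A lightweight, framework-free way to cross-check the table's output sizes is to trace the spatial dimension of a 224x224 input through the downsampling layers, using the standard formula floor((s + 2p - k) / stride) + 1:

```python
# Spatial size after a conv/pool layer with kernel k, given stride and padding p
def down(s, k, stride, p):
    return (s + 2 * p - k) // stride + 1

s = 224
s = down(s, 7, 2, 3)   # conv1     -> 112
s = down(s, 3, 2, 1)   # maxpool1  -> 56
                       # conv2 (3x3, stride 1, padding 1) keeps 56
s = down(s, 3, 2, 1)   # maxpool2  -> 28   (inception 3a/3b)
s = down(s, 3, 2, 1)   # maxpool3  -> 14   (inception 4a-4e)
s = down(s, 3, 2, 1)   # maxpool4  -> 7    (inception 5a/5b)
s = down(s, 7, 1, 0)   # avgpool   -> 1
print(s)  # 1
```

After the 7x7 average pool, the 1024-channel feature map is 1x1, which is why the classifier is a single `nn.Linear(1024, num_classes)`.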
References:
[1] https://www.geeksforgeeks.org/understanding-googlenet-model-cnn-architecture/
[2] https://www.youtube.com/watch?v=uQc4Fs7yx5I&list=PLhhyoLH6IjfxeoooqP9rhU3HJIAVAJ3Vz&index=18