PyTorch CIFAR10 Image Classification: MobileNet v1

4. Defining the Network (MobileNet v1)

AlexNet, VGG, GoogLeNet, and ResNet, covered in earlier posts, are all traditional convolutional neural networks built from standard convolution layers. Their drawback is that their memory footprint and compute cost are too large to run on mobile or embedded devices. MobileNet, the subject of this post, was designed specifically for those platforms.

I have also read the paper; if you want to study MobileNet in depth, see my other post 【论文泛读】轻量化之MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications[0].

MobileNet was proposed by a Google team in 2017 as a lightweight CNN for mobile and embedded devices. Compared with traditional CNNs, it drastically reduces parameters and computation at a small cost in accuracy: relative to VGG16, accuracy drops by 0.9% while the model has only about 1/32 of the parameters.

MobileNet's key advantage is its depthwise (DW) convolution structure, which sharply cuts both computation and parameter count. The figure below contrasts it with standard convolution. In a standard convolution, each kernel has as many channels as the input feature map (every kernel convolves over every input channel). In a DW convolution, each kernel has exactly one channel: each kernel handles a single input channel, so the number of kernels must equal the number of input channels, and the output feature map therefore has the same number of channels as the input.

Since DW convolution leaves the channel count unchanged, to change or customize the number of output channels we simply follow the DW convolution with a pointwise (PW) convolution, as shown below. A PW convolution is just an ordinary convolution with a 1×1 kernel. DW and PW convolutions are normally used together, and the pair is called a depthwise separable convolution.

How much computation does a depthwise separable convolution actually save compared with a standard one? In the comparison below, D_F is the input feature map's width and height (assumed equal), D_K is the kernel size, M the number of input channels, and N the number of output channels. With stride 1, a standard convolution costs roughly D_K × D_K × M × N × D_F × D_F operations, while the depthwise separable version costs D_K × D_K × M × D_F × D_F + M × N × D_F × D_F, for a cost ratio of 1/N + 1/D_K². Since MobileNet's DW convolutions all use 3×3 kernels, standard convolution is in theory 8 to 9 times more expensive than DW + PW (formula from the original paper):
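To make the saving concrete, here is a quick back-of-the-envelope check of that formula in Python (the channel and feature-map sizes below are illustrative, not from the paper):

# Cost of a standard conv vs. a depthwise separable conv (stride 1):
#   standard:  Dk*Dk * M * N * Df*Df
#   separable: Dk*Dk * M * Df*Df + M * N * Df*Df
Dk, Df, M, N = 3, 32, 128, 256   # kernel size, feature-map size, in/out channels (illustrative)

standard = Dk * Dk * M * N * Df * Df
separable = Dk * Dk * M * Df * Df + M * N * Df * Df
print(standard / separable)      # ~8.69, the 8-9x saving mentioned above
print(1 / N + 1 / Dk ** 2)       # closed-form ratio separable/standard = 1/N + 1/Dk^2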

With depthwise separable convolutions understood, let's look at the MobileNet v1 architecture. In the table on the left, Conv denotes a standard convolution, Conv dw the DW convolution just described, and s the stride; from this table the network is straightforward to build. The paper also introduces two hyperparameters: α, a width multiplier that scales the number of kernels in each layer, and β, a resolution multiplier that scales the input image size (the paper calls it ρ). The table on the right reports classification accuracy, computation, and parameter counts for different α and β settings:

First, as usual, we check whether a GPU is available, since a GPU can be roughly 20-50x faster than a CPU, and the speedup is especially pronounced for convolutional networks.

device = 'cuda' if torch.cuda.is_available() else 'cpu'

Next we define the network. In PyTorch, building a depthwise separable convolution only requires setting the `groups` parameter of `nn.Conv2d`.

class Block(nn.Module):
    '''Depthwise conv + Pointwise conv'''
    def __init__(self, in_channels, out_channels, stride=1):
        super(Block, self).__init__()
        # groups=in_channels is what makes this a depthwise convolution:
        # each 3x3 filter sees exactly one input channel
        self.conv1 = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride,
                               padding=1, groups=in_channels, bias=False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.relu1 = nn.ReLU()
        # the 1x1 pointwise convolution mixes channels and sets the output width
        self.conv2 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu2 = nn.ReLU()

    def forward(self, x):
        x = self.relu1(self.bn1(self.conv1(x)))
        x = self.relu2(self.bn2(self.conv2(x)))
        return x
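Before stacking these blocks into the full network, a quick sanity check (my own addition, not part of the original post's code) confirms the block's output shape and shows how few weights the depthwise convolution carries compared with a full 3×3 convolution:

import torch
import torch.nn as nn

blk = Block(32, 64, stride=2)
x = torch.randn(1, 32, 32, 32)
print(blk(x).shape)   # torch.Size([1, 64, 16, 16]): stride 2 halves the spatial size

# depthwise 3x3 conv: 32 * 3 * 3 = 288 weights, vs. a full 3x3 conv's 32*32*3*3 = 9216
full = nn.Conv2d(32, 32, kernel_size=3, padding=1, bias=False)
print(sum(p.numel() for p in blk.conv1.parameters()),
      sum(p.numel() for p in full.parameters()))   # 288 9216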
        
# MobileNet v1: a stack of depthwise separable convolution blocks
class MobileNetV1(nn.Module):
    # (128,2) means out_channels=128 with stride=2; a bare int means stride=1
    cfg = [64,(128,2),128,(256,2),256,(512,2),512,512,512,512,512,(1024,2),1024]

    def __init__(self, num_classes=10, alpha=1.0, beta=1.0):
        super(MobileNetV1, self).__init__()
        # alpha (width multiplier) and beta (resolution multiplier) are kept in the
        # signature for completeness but are not applied in this CIFAR10 version
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=1, bias=False),  # no padding, so 32x32 -> 30x30
            nn.BatchNorm2d(32),
            nn.ReLU()
        )
        self.avg = nn.AvgPool2d(kernel_size=2)
        self.layers = self._make_layers(in_channels=32)
        self.linear = nn.Linear(1024, num_classes)
    
    def _make_layers(self, in_channels):
        layers = []
        for x in self.cfg:
            out_channels = x if isinstance(x,int) else x[0]
            stride = 1 if isinstance(x,int) else x[1]
            layers.append(Block(in_channels,out_channels,stride))
            in_channels = out_channels
        return nn.Sequential(*layers)
    
    def forward(self,x):
        x = self.conv1(x)
        x = self.layers(x)
        x = self.avg(x)
        x = x.view(x.size()[0],-1)
        x = self.linear(x)
        return x
from torchinfo import summary  # the layer table below is produced by torchinfo

net = MobileNetV1(num_classes=10).to(device)
summary(net, (2, 3, 32, 32))
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
MobileNetV1                              --                        --
├─Sequential: 1-1                        [2, 32, 30, 30]           --
│    └─Conv2d: 2-1                       [2, 32, 30, 30]           864
│    └─BatchNorm2d: 2-2                  [2, 32, 30, 30]           64
│    └─ReLU: 2-3                         [2, 32, 30, 30]           --
├─Sequential: 1-2                        [2, 1024, 2, 2]           --
│    └─Block: 2-4                        [2, 64, 30, 30]           --
│    │    └─Conv2d: 3-1                  [2, 32, 30, 30]           288
│    │    └─BatchNorm2d: 3-2             [2, 32, 30, 30]           64
│    │    └─ReLU: 3-3                    [2, 32, 30, 30]           --
│    │    └─Conv2d: 3-4                  [2, 64, 30, 30]           2,048
│    │    └─BatchNorm2d: 3-5             [2, 64, 30, 30]           128
│    │    └─ReLU: 3-6                    [2, 64, 30, 30]           --
│    └─Block: 2-5                        [2, 128, 15, 15]          --
│    │    └─Conv2d: 3-7                  [2, 64, 15, 15]           576
│    │    └─BatchNorm2d: 3-8             [2, 64, 15, 15]           128
│    │    └─ReLU: 3-9                    [2, 64, 15, 15]           --
│    │    └─Conv2d: 3-10                 [2, 128, 15, 15]          8,192
│    │    └─BatchNorm2d: 3-11            [2, 128, 15, 15]          256
│    │    └─ReLU: 3-12                   [2, 128, 15, 15]          --
│    └─Block: 2-6                        [2, 128, 15, 15]          --
│    │    └─Conv2d: 3-13                 [2, 128, 15, 15]          1,152
│    │    └─BatchNorm2d: 3-14            [2, 128, 15, 15]          256
│    │    └─ReLU: 3-15                   [2, 128, 15, 15]          --
│    │    └─Conv2d: 3-16                 [2, 128, 15, 15]          16,384
│    │    └─BatchNorm2d: 3-17            [2, 128, 15, 15]          256
│    │    └─ReLU: 3-18                   [2, 128, 15, 15]          --
│    └─Block: 2-7                        [2, 256, 8, 8]            --
│    │    └─Conv2d: 3-19                 [2, 128, 8, 8]            1,152
│    │    └─BatchNorm2d: 3-20            [2, 128, 8, 8]            256
│    │    └─ReLU: 3-21                   [2, 128, 8, 8]            --
│    │    └─Conv2d: 3-22                 [2, 256, 8, 8]            32,768
│    │    └─BatchNorm2d: 3-23            [2, 256, 8, 8]            512
│    │    └─ReLU: 3-24                   [2, 256, 8, 8]            --
│    └─Block: 2-8                        [2, 256, 8, 8]            --
│    │    └─Conv2d: 3-25                 [2, 256, 8, 8]            2,304
│    │    └─BatchNorm2d: 3-26            [2, 256, 8, 8]            512
│    │    └─ReLU: 3-27                   [2, 256, 8, 8]            --
│    │    └─Conv2d: 3-28                 [2, 256, 8, 8]            65,536
│    │    └─BatchNorm2d: 3-29            [2, 256, 8, 8]            512
│    │    └─ReLU: 3-30                   [2, 256, 8, 8]            --
│    └─Block: 2-9                        [2, 512, 4, 4]            --
│    │    └─Conv2d: 3-31                 [2, 256, 4, 4]            2,304
│    │    └─BatchNorm2d: 3-32            [2, 256, 4, 4]            512
│    │    └─ReLU: 3-33                   [2, 256, 4, 4]            --
│    │    └─Conv2d: 3-34                 [2, 512, 4, 4]            131,072
│    │    └─BatchNorm2d: 3-35            [2, 512, 4, 4]            1,024
│    │    └─ReLU: 3-36                   [2, 512, 4, 4]            --
│    └─Block: 2-10                       [2, 512, 4, 4]            --
│    │    └─Conv2d: 3-37                 [2, 512, 4, 4]            4,608
│    │    └─BatchNorm2d: 3-38            [2, 512, 4, 4]            1,024
│    │    └─ReLU: 3-39                   [2, 512, 4, 4]            --
│    │    └─Conv2d: 3-40                 [2, 512, 4, 4]            262,144
│    │    └─BatchNorm2d: 3-41            [2, 512, 4, 4]            1,024
│    │    └─ReLU: 3-42                   [2, 512, 4, 4]            --
│    └─Block: 2-11                       [2, 512, 4, 4]            --
│    │    └─Conv2d: 3-43                 [2, 512, 4, 4]            4,608
│    │    └─BatchNorm2d: 3-44            [2, 512, 4, 4]            1,024
│    │    └─ReLU: 3-45                   [2, 512, 4, 4]            --
│    │    └─Conv2d: 3-46                 [2, 512, 4, 4]            262,144
│    │    └─BatchNorm2d: 3-47            [2, 512, 4, 4]            1,024
│    │    └─ReLU: 3-48                   [2, 512, 4, 4]            --
│    └─Block: 2-12                       [2, 512, 4, 4]            --
│    │    └─Conv2d: 3-49                 [2, 512, 4, 4]            4,608
│    │    └─BatchNorm2d: 3-50            [2, 512, 4, 4]            1,024
│    │    └─ReLU: 3-51                   [2, 512, 4, 4]            --
│    │    └─Conv2d: 3-52                 [2, 512, 4, 4]            262,144
│    │    └─BatchNorm2d: 3-53            [2, 512, 4, 4]            1,024
│    │    └─ReLU: 3-54                   [2, 512, 4, 4]            --
│    └─Block: 2-13                       [2, 512, 4, 4]            --
│    │    └─Conv2d: 3-55                 [2, 512, 4, 4]            4,608
│    │    └─BatchNorm2d: 3-56            [2, 512, 4, 4]            1,024
│    │    └─ReLU: 3-57                   [2, 512, 4, 4]            --
│    │    └─Conv2d: 3-58                 [2, 512, 4, 4]            262,144
│    │    └─BatchNorm2d: 3-59            [2, 512, 4, 4]            1,024
│    │    └─ReLU: 3-60                   [2, 512, 4, 4]            --
│    └─Block: 2-14                       [2, 512, 4, 4]            --
│    │    └─Conv2d: 3-61                 [2, 512, 4, 4]            4,608
│    │    └─BatchNorm2d: 3-62            [2, 512, 4, 4]            1,024
│    │    └─ReLU: 3-63                   [2, 512, 4, 4]            --
│    │    └─Conv2d: 3-64                 [2, 512, 4, 4]            262,144
│    │    └─BatchNorm2d: 3-65            [2, 512, 4, 4]            1,024
│    │    └─ReLU: 3-66                   [2, 512, 4, 4]            --
│    └─Block: 2-15                       [2, 1024, 2, 2]           --
│    │    └─Conv2d: 3-67                 [2, 512, 2, 2]            4,608
│    │    └─BatchNorm2d: 3-68            [2, 512, 2, 2]            1,024
│    │    └─ReLU: 3-69                   [2, 512, 2, 2]            --
│    │    └─Conv2d: 3-70                 [2, 1024, 2, 2]           524,288
│    │    └─BatchNorm2d: 3-71            [2, 1024, 2, 2]           2,048
│    │    └─ReLU: 3-72                   [2, 1024, 2, 2]           --
│    └─Block: 2-16                       [2, 1024, 2, 2]           --
│    │    └─Conv2d: 3-73                 [2, 1024, 2, 2]           9,216
│    │    └─BatchNorm2d: 3-74            [2, 1024, 2, 2]           2,048
│    │    └─ReLU: 3-75                   [2, 1024, 2, 2]           --
│    │    └─Conv2d: 3-76                 [2, 1024, 2, 2]           1,048,576
│    │    └─BatchNorm2d: 3-77            [2, 1024, 2, 2]           2,048
│    │    └─ReLU: 3-78                   [2, 1024, 2, 2]           --
├─AvgPool2d: 1-3                         [2, 1024, 1, 1]           --
├─Linear: 1-4                            [2, 10]                   10,250
==========================================================================================
Total params: 3,217,226
Trainable params: 3,217,226
Non-trainable params: 0
Total mult-adds (M): 90.33
==========================================================================================
Input size (MB): 0.02
Forward/backward pass size (MB): 12.22
Params size (MB): 12.87
Estimated Total Size (MB): 25.11
==========================================================================================

The summary shows that the model's parameter count and computation are both quite small, which is one of MobileNet's major advantages. The input is a (batch, 3, 32, 32) tensor, and the table traces how the feature-map size changes through each layer. The network finally outputs 10 logits, which a softmax turns into per-class probabilities.
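As a quick cross-check, the "Total params" figure can be recomputed directly from the model:

n_params = sum(p.numel() for p in net.parameters())
print(n_params)   # 3217226, matching the torchinfo total above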

We can also print the model itself to inspect it:

MobileNetV1(
  (conv1): Sequential(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (avg): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (layers): Sequential(
    (0): Block(
      (conv1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu1): ReLU()
      (conv2): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu2): ReLU()
    )
    (1): Block(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=64, bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu1): ReLU()
      (conv2): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu2): ReLU()
    )
    (2): Block(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu1): ReLU()
      (conv2): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu2): ReLU()
    )
    (3): Block(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=128, bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu1): ReLU()
      (conv2): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu2): ReLU()
    )
    (4): Block(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu1): ReLU()
      (conv2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu2): ReLU()
    )
    (5): Block(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=256, bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu1): ReLU()
      (conv2): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu2): ReLU()
    )
    (6): Block(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu1): ReLU()
      (conv2): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu2): ReLU()
    )
    (7): Block(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu1): ReLU()
      (conv2): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu2): ReLU()
    )
    (8): Block(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu1): ReLU()
      (conv2): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu2): ReLU()
    )
    (9): Block(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu1): ReLU()
      (conv2): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu2): ReLU()
    )
    (10): Block(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu1): ReLU()
      (conv2): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu2): ReLU()
    )
    (11): Block(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=512, bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu1): ReLU()
      (conv2): Conv2d(512, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu2): ReLU()
    )
    (12): Block(
      (conv1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1024, bias=False)
      (bn1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu1): ReLU()
      (conv2): Conv2d(1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu2): ReLU()
    )
  )
  (linear): Linear(in_features=1024, out_features=10, bias=True)
)

If your machine has multiple GPUs, this snippet lets the model run on them in parallel to speed up computation:

net = MobileNetV1(num_classes=10).to(device)
if device == 'cuda':
    net = nn.DataParallel(net)
    # cudnn.benchmark improves throughput when the compute graph is static
    # (fixed input shapes, unchanging model); otherwise it can hurt performance
    torch.backends.cudnn.benchmark = True
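One caveat worth knowing: nn.DataParallel nests the real model under net.module, so every state_dict key gains a "module." prefix. If you save checkpoints yourself, unwrapping first keeps them loadable without the wrapper (the path below is just an example):

# unwrap DataParallel (if present) before saving a state_dict
model_to_save = net.module if isinstance(net, nn.DataParallel) else net
torch.save(model_to_save.state_dict(), './model/checkpoint_state.pth')  # example path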

5. Defining the Loss Function and Optimizer

PyTorch packages the common deep-learning optimizers in torch.optim, and every optimizer inherits from the base class optim.Optimizer.
Loss functions live in the neural-network toolbox nn, which provides many of them.

Here I use SGD with momentum, with cross-entropy as the loss function. The learning-rate policy is dynamic: if the training loss has not improved after 5 epochs, the learning rate is multiplied by 0.5, down to a floor of 1e-6 (see the sketch after the code below for how the scheduler is stepped).

For more background on optimizers and learning-rate schedules, see:

  • Pytorch Note15 优化算法1 梯度下降(Gradient descent varients)[0]
  • Pytorch Note16 优化算法2 动量法(Momentum)[0]
  • Pytorch Note34 学习率衰减[0]

We will train for 20 epochs.

import torch.optim as optim
optimizer = optim.SGD(net.parameters(), lr=1e-1, momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()
# dynamic LR: halve it after 5 epochs without improvement, floor at 1e-6
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.5, patience=5, min_lr=1e-6)
# scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[75, 150], gamma=0.5)
import time
epoch = 20
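The actual loop lives in the train helper from utils (used in the next section), but as a rough sketch of how ReduceLROnPlateau is typically driven: unlike step-count schedulers, it watches a metric, so step() is called once per epoch with the loss (evaluate below is a hypothetical helper, not from the original code):

for ep in range(epoch):
    # ... run one training pass over trainloader ...
    val_loss = evaluate(net, testloader, criterion)  # hypothetical: returns mean test loss
    scheduler.step(val_loss)   # halves the LR after 5 epochs without improvement
    print(optimizer.param_groups[0]['lr'])  # current learning rate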

6. Training

First, set up a directory where the model will be saved:

import os
if not os.path.exists('./model'):
    os.makedirs('./model')
else:
    print('Folder already exists')
save_path = './model/MobileNetV1.pth'

I defined a train function (in utils) that runs the training loop and saves the trained model:

from utils import plot_history
from utils import train
Acc, Loss, Lr = train(net, trainloader, testloader, epoch, optimizer, criterion, scheduler, save_path, verbose = True)
Epoch [  1/ 20]  Train Loss:1.755525  Train Acc:35.18% Test Loss:1.482050  Test Acc:45.63%  Learning Rate:0.100000	Time 01:02
Epoch [  2/ 20]  Train Loss:1.359267  Train Acc:50.75% Test Loss:1.313756  Test Acc:53.10%  Learning Rate:0.100000	Time 00:58
Epoch [  3/ 20]  Train Loss:1.123205  Train Acc:60.09% Test Loss:1.083381  Test Acc:61.47%  Learning Rate:0.100000	Time 00:57
Epoch [  4/ 20]  Train Loss:0.946333  Train Acc:67.18% Test Loss:0.966875  Test Acc:66.20%  Learning Rate:0.100000	Time 00:57
Epoch [  5/ 20]  Train Loss:0.842723  Train Acc:70.96% Test Loss:0.965495  Test Acc:67.11%  Learning Rate:0.100000	Time 00:55
Epoch [  6/ 20]  Train Loss:0.781058  Train Acc:73.30% Test Loss:0.895774  Test Acc:69.28%  Learning Rate:0.100000	Time 00:54
Epoch [  7/ 20]  Train Loss:0.748482  Train Acc:74.32% Test Loss:0.839037  Test Acc:71.43%  Learning Rate:0.100000	Time 00:57
Epoch [  8/ 20]  Train Loss:0.727354  Train Acc:75.25% Test Loss:0.762057  Test Acc:74.48%  Learning Rate:0.100000	Time 01:00
Epoch [  9/ 20]  Train Loss:0.701578  Train Acc:76.08% Test Loss:0.817960  Test Acc:72.46%  Learning Rate:0.100000	Time 00:57
Epoch [ 10/ 20]  Train Loss:0.690966  Train Acc:76.36% Test Loss:0.908296  Test Acc:68.63%  Learning Rate:0.100000	Time 00:58
Epoch [ 11/ 20]  Train Loss:0.680218  Train Acc:76.78% Test Loss:0.778566  Test Acc:73.70%  Learning Rate:0.100000	Time 00:57
Epoch [ 12/ 20]  Train Loss:0.673814  Train Acc:77.26% Test Loss:0.890464  Test Acc:71.39%  Learning Rate:0.100000	Time 00:58
Epoch [ 13/ 20]  Train Loss:0.673683  Train Acc:77.30% Test Loss:0.792788  Test Acc:72.98%  Learning Rate:0.100000	Time 00:58
Epoch [ 14/ 20]  Train Loss:0.663810  Train Acc:77.39% Test Loss:0.884162  Test Acc:70.07%  Learning Rate:0.100000	Time 00:59
Epoch [ 15/ 20]  Train Loss:0.656750  Train Acc:77.84% Test Loss:0.757378  Test Acc:74.75%  Learning Rate:0.100000	Time 01:00
Epoch [ 16/ 20]  Train Loss:0.648556  Train Acc:78.11% Test Loss:0.954492  Test Acc:67.81%  Learning Rate:0.100000	Time 01:00
Epoch [ 17/ 20]  Train Loss:0.646635  Train Acc:78.01% Test Loss:0.826150  Test Acc:71.81%  Learning Rate:0.100000	Time 01:03
Epoch [ 18/ 20]  Train Loss:0.638446  Train Acc:78.42% Test Loss:0.741989  Test Acc:74.34%  Learning Rate:0.100000	Time 01:01
Epoch [ 19/ 20]  Train Loss:0.634496  Train Acc:78.42% Test Loss:0.817648  Test Acc:73.02%  Learning Rate:0.100000	Time 00:59
Epoch [ 20/ 20]  Train Loss:0.631237  Train Acc:78.66% Test Loss:0.885431  Test Acc:69.61%  Learning Rate:0.100000	Time 01:01

We can then plot the loss, accuracy, and learning-rate curves:

plot_history(epoch, Acc, Loss, Lr)
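plot_history also comes from utils; a rough matplotlib equivalent, assuming Acc, Loss, and Lr are per-epoch lists (the helper name and signature here are my own sketch):

import matplotlib.pyplot as plt

def plot_history_sketch(epochs, Acc, Loss, Lr):
    # one figure per metric: loss, accuracy, learning rate over epochs
    for name, series in (('loss', Loss), ('accuracy', Acc), ('learning rate', Lr)):
        plt.figure()
        plt.plot(range(1, epochs + 1), series)
        plt.xlabel('epoch')
        plt.ylabel(name)
        plt.show()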

Loss curve

Accuracy curve

Learning-rate curve

7. Testing

Check the overall accuracy:

correct = 0   # number of correctly classified test images
total = 0     # total number of test images
# testloader = torch.utils.data.DataLoader(testset, batch_size=32,shuffle=True, num_workers=2)
net.eval()    # switch to evaluation mode (BatchNorm uses running statistics)
with torch.no_grad():
    for data in testloader:  # iterate batch by batch
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)  # forward pass

        # outputs is a (batch_size, 10) tensor; torch.max over dim 1 returns, for each
        # row, the maximum value and its column index (the predicted class)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)                        # count the images in this batch
        correct += (predicted == labels).sum().item()  # count the correct predictions

print('Accuracy of the network on the 10000 test images: %.2f %%' % (100 * correct / total))
 
Accuracy of the network on the 10000 test images: 69.23 %

The MobileNet model reaches roughly 70% accuracy on the test set.

torch.max(outputs, 1) in the code returns a tuple: the first element holds the maximum value in each row (the largest logit), and the second holds that value's index, which is exactly the predicted label. Since we only need the label, the assignment `_, predicted = ...` discards the values by binding them to `_`, the conventional name for a throwaway variable.
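A tiny standalone example makes the tuple structure clear:

import torch

logits = torch.tensor([[0.1, 2.5, -1.0],
                       [1.2, 0.3,  4.0]])
values, indices = torch.max(logits, 1)
print(values)   # tensor([2.5000, 4.0000])  the max logit in each row
print(indices)  # tensor([1, 2])            the predicted class indices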

Check the per-class accuracy:

# two lists tracking, for each of the 10 classes, the correct and total counts
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
# testloader = torch.utils.data.DataLoader(testset, batch_size=64,shuffle=True, num_workers=2)
net.eval()
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)

        _, predicted = torch.max(outputs, 1)
        # mark each sample in the batch as 1 if predicted correctly, else 0
        c = (predicted == labels).squeeze()
        for i in range(len(images)):  # accumulate the counts sample by sample
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %.2f %%' % (classes[i], 100 * class_correct[i] / class_total[i]))
Accuracy of airplane : 63.50 %
Accuracy of automobile : 79.20 %
Accuracy of  bird : 50.30 %
Accuracy of   cat : 56.30 %
Accuracy of  deer : 49.50 %
Accuracy of   dog : 76.20 %
Accuracy of  frog : 91.20 %
Accuracy of horse : 71.60 %
Accuracy of  ship : 90.50 %
Accuracy of truck : 69.40 %

Sample a batch and visualize some of the predictions:

import numpy as np
import matplotlib.pyplot as plt

dataiter = iter(testloader)
images, labels = next(dataiter)  # next(iterator) replaces the deprecated .next()
images_ = images.to(device)
labels = labels.to(device)

net.eval()
with torch.no_grad():
    val_output = net(images_)
_, val_preds = torch.max(val_output, 1)

correct = torch.sum(val_preds == labels.data).item()

val_preds = val_preds.cpu()
labels = labels.cpu()

print("Accuracy Rate = {}%".format(correct / len(images) * 100))

fig = plt.figure(figsize=(25, 25))
for idx in np.arange(64):  # assumes the test batch holds at least 64 images
    ax = fig.add_subplot(8, 8, idx + 1, xticks=[], yticks=[])
    imshow(images[idx])    # imshow: helper (e.g. from utils) that displays a tensor image
    # title shows "prediction, (ground truth)": green if correct, red if wrong
    ax.set_title("{}, ({})".format(classes[val_preds[idx].item()], classes[labels[idx].item()]),
                 color=("green" if val_preds[idx].item() == labels[idx].item() else "red"))
Accuracy Rate = 71.875%

8. Saving the Model

torch.save(net, save_path[:-4] + '_' + str(epoch) + '.pth')  # e.g. ./model/MobileNetV1_20.pth
# torch.save(net, './model/MobileNetV1.pth')
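Saving the whole pickled module, as above, ties the checkpoint to this exact class definition. A state_dict checkpoint is the more portable alternative; a minimal sketch (filenames are illustrative):

# save only the weights
# (if net is wrapped in nn.DataParallel, save net.module.state_dict() instead)
torch.save(net.state_dict(), save_path[:-4] + '_state.pth')

# ...and to restore them later:
model = MobileNetV1(num_classes=10).to(device)
model.load_state_dict(torch.load(save_path[:-4] + '_state.pth', map_location=device))
model.eval()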

9. Prediction

Load a local image and classify it:

import torch
from PIL import Image
import torch.nn.functional as F
from torchvision import datasets, transforms
import numpy as np

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = torch.load(save_path)  # load the full pickled model saved during training
# model = model.to('cuda')
model.eval()  # switch to evaluation mode

# read the image to classify
img = Image.open("./airplane.jpg").convert('RGB')
img

Now we classify the image. One important detail: we must apply the same transforms to this image that were used during training, so the inputs stay consistent.

trans = transforms.Compose([transforms.Resize((32, 32)),  # Scale is deprecated in favor of Resize
                            transforms.ToTensor(),
                            transforms.Normalize(mean=(0.5, 0.5, 0.5),
                                                 std=(0.5, 0.5, 0.5)),
                           ])
 
img = trans(img)
img = img.to(device)
# add a batch dimension: the saved model expects 4-D input [batch_size, C, H, W],
# while a single image is only 3-D [C, H, W]
img = img.unsqueeze(0)  # now [1, 3, 32, 32]
output = model(img)
prob = F.softmax(output, dim=1)  # probabilities over the 10 classes
print("Probabilities:", prob)
value, predicted = torch.max(output.data, 1)
print("Class index:", predicted.item())
print(value)
pred_class = classes[predicted.item()]
print("Predicted class:", pred_class)
Probabilities: tensor([[9.9107e-01, 2.8922e-05, 1.3522e-03, 2.8523e-04, 3.5258e-04, 2.8689e-05,
         2.4901e-05, 9.8299e-05, 6.4787e-03, 2.8210e-04]], device='cuda:0',
       grad_fn=<SoftmaxBackward>)
Class index: 0
tensor([7.6616], device='cuda:0')
Predicted class: plane

Here we can see the final result: the prediction is plane, with a confidence of about 99.11%, so the model classifies this airplane image correctly.
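For convenience, the preprocessing and prediction steps above can be bundled into one helper; predict_image below is my own illustrative addition, not part of the original code:

def predict_image(img, model, device):
    # same preprocessing as training: resize to 32x32, to tensor, normalize
    trans = transforms.Compose([transforms.Resize((32, 32)),
                                transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5),
                                                     (0.5, 0.5, 0.5))])
    x = trans(img).unsqueeze(0).to(device)   # [1, 3, 32, 32]
    with torch.no_grad():
        prob = F.softmax(model(x), dim=1)
    conf, idx = torch.max(prob, 1)
    return classes[idx.item()], conf.item()

print(predict_image(Image.open("./airplane.jpg").convert('RGB'), model, device))
# ('plane', 0.99...)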

Predicting from an image URL

We can also fetch an image by URL and classify it; I tried several different images here.

import requests
from PIL import Image
# several candidate image URLs; only the last uncommented assignment takes effect
url = 'https://dss2.bdstatic.com/70cFvnSh_Q1YnxGkpoWK1HF6hhy/it/u=947072664,3925280208&fm=26&gp=0.jpg'
url = 'https://ss0.bdstatic.com/70cFuHSh_Q1YnxGkpoWK1HF6hhy/it/u=2952045457,215279295&fm=26&gp=0.jpg'
url = 'https://ss0.bdstatic.com/70cFvHSh_Q1YnxGkpoWK1HF6hhy/it/u=2838383012,1815030248&fm=26&gp=0.jpg'
url = 'https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fwww.goupuzi.com%2Fnewatt%2FMon_1809%2F1_179223_7463b117c8a2c76.jpg&refer=http%3A%2F%2Fwww.goupuzi.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1624346733&t=36ba18326a1e010737f530976201326d'
url = 'https://ss3.bdstatic.com/70cFv8Sh_Q1YnxGkpoWK1HF6hhy/it/u=2799543344,3604342295&fm=224&gp=0.jpg'
# url = 'https://ss1.bdstatic.com/70cFuXSh_Q1YnxGkpoWK1HF6hhy/it/u=2032505694,2851387785&fm=26&gp=0.jpg'
response = requests.get(url, stream=True)
print(response)
img = Image.open(response.raw)
img

The preprocessing is the same as before:

trans = transforms.Compose([transforms.Resize((32, 32)),
                            transforms.ToTensor(),
                            transforms.Normalize(mean=(0.5, 0.5, 0.5),
                                                 std=(0.5, 0.5, 0.5)),
                           ])

img = trans(img)
img = img.to(device)
# add a batch dimension: [C, H, W] -> [1, 3, 32, 32]
img = img.unsqueeze(0)
output = model(img)
prob = F.softmax(output, dim=1)  # probabilities over the 10 classes
print("Probabilities:", prob)
value, predicted = torch.max(output.data, 1)
print("Class index:", predicted.item())
print(value)
pred_class = classes[predicted.item()]
print("Predicted class:", pred_class)
Probabilities: tensor([[0.0743, 0.0026, 0.0117, 0.6632, 0.0266, 0.1934, 0.0139, 0.0053, 0.0061,
         0.0030]], device='cuda:0', grad_fn=<SoftmaxBackward>)
Class index: 3
tensor([3.5014], device='cuda:0')
Predicted class: cat

This classification is also correct, with 66.32% confidence, and inference is very fast, which is another of MobileNet's strengths.

10. Summary

Even though we trained for only 20 epochs, MobileNet is far lighter than bulky models such as VGG or ResNet: inference takes only milliseconds. Google designed it for mobile and embedded devices (hence the "Mobile" in the name), and it remains a very lightweight model at inference time.

Google later released MobileNet v2 and v3, which are both faster and more accurate than v1; I will cover them in later posts.

By the way, the dataset and code are described in my series summary; you can get them there if needed.

