[ Attention Mechanisms ] Classic Network Models 3: ECANet Explained and Reproduced


🤵 Author: Horizon Max



🚀 Efficient Channel Attention Module

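The Efficient Channel Attention module (ECA for short) is a channel attention module proposed by Qilong Wang et al. in 2020.

It introduces a local cross-channel interaction strategy without dimensionality reduction, which avoids the harm that dimensionality reduction does to learning channel attention;

The module involves only a handful of parameters, yet brings a clear performance gain;

Appropriate cross-channel interaction preserves performance while significantly reducing model complexity.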


🔗 Paper: ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks


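🚀 ECA-Net Explained

🎨 Background

Deep convolutional neural networks (CNNs) are widely used in computer vision and have driven major progress in image classification, object detection, and semantic segmentation;

Ever since the groundbreaking AlexNet, researchers have kept looking for ways to improve CNN performance;
More recently, SENet attracted great interest by introducing channel attention into convolutional blocks, showing large potential for performance improvement;
Later work improved the SE block by capturing more sophisticated channel dependencies or by combining it with additional spatial attention;
However, the higher accuracy of these models came with higher complexity and a heavier computational cost;

The paper's empirical analysis shows that the dimensionality reduction used in SENet has a negative effect on channel attention prediction, and that capturing dependencies across all channels is inefficient and unnecessary;
Based on this, the authors propose an Efficient Channel Attention (ECA) module for CNNs that avoids dimensionality reduction and captures cross-channel interaction efficiently.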


[Figure: Efficient Channel Attention module]


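Key features:
(1) It is implemented with a fast 1D convolution of size k, where the kernel size k is the coverage of local cross-channel interaction, i.e. how many neighboring channels take part in predicting the attention of one channel;
(2) To avoid tuning k by hand through cross-validation, an adaptive method determines k, so that the coverage of cross-channel interaction (the kernel size k) grows with the channel dimension.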
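The adaptive rule in (2) can be sketched in a few lines. This is a minimal illustration that assumes the mapping k = |log2(C)/γ + b/γ| rounded to an odd number, with the defaults γ = 2 and b = 1 that the reproduction code later in this post also uses; the helper name `eca_kernel_size` is hypothetical.

```python
import math


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Map a channel count to an odd 1D-convolution kernel size."""
    # Truncate (log2(C) + b) / gamma to an integer ...
    t = int(abs((math.log2(channels) + b) / gamma))
    # ... and bump even values to the next odd number so the kernel has a centre.
    return t if t % 2 else t + 1


for c in (64, 128, 256, 512, 1024, 2048):
    print(c, eca_kernel_size(c))
# 64 -> 3, 128/256/512/1024 -> 5, 2048 -> 7
```

Because k depends on the logarithm of C, it changes very slowly: a 32-fold increase in channels only moves the kernel from 3 to 7.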


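🎨 Contributions

(1) Dissects SENet and shows empirically that avoiding dimensionality reduction and using appropriate cross-channel interaction are both important for learning efficient channel attention;
(2) Develops an extremely lightweight channel attention module for CNNs that adds very little model complexity while bringing clear improvements;
(3) Experiments on ImageNet-1K and MS COCO show that ECANet achieves highly competitive performance at lower model complexity.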


🎨 ECA Module

Work on attention modules falls roughly into two directions:

(1) enhancing feature aggregation;
(2) combining channel and spatial attention;

[Figure] Left: Residual Module   Right: ECA-Residual Module

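🚩 How ECA-Net Works

Given the aggregated feature $y \in \mathbb{R}^{C}$ without dimensionality reduction, channel attention can be learned by:

$$\omega = \sigma(W y)$$

where $W$ is a $C \times C$ parameter matrix. The variants SE-Var2 and SE-Var3 correspond to two choices of $W$:

$$W_{var2} = \begin{bmatrix} w^{1,1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & w^{C,C} \end{bmatrix}, \qquad W_{var3} = \begin{bmatrix} w^{1,1} & \cdots & w^{1,C} \\ \vdots & \ddots & \vdots \\ w^{C,1} & \cdots & w^{C,C} \end{bmatrix}$$

$W_{var2}$ is a diagonal matrix with C parameters;
$W_{var3}$ is a full matrix with C×C parameters;
The key difference is that SE-Var3 considers cross-channel interaction while SE-Var2 does not, which is why SE-Var3 performs better;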
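To make the contrast concrete, the sketch below builds both variants for a small, arbitrary channel count. It is an illustration only: `C = 8` and the variable names are assumptions, and the diagonal matrix is stored as a plain weight vector.

```python
import torch
import torch.nn as nn

C = 8                      # hypothetical channel count, chosen only for illustration
y = torch.randn(1, C)      # aggregated feature after global average pooling

# SE-Var2: diagonal W, i.e. one independent weight per channel (C parameters).
w_var2 = nn.Parameter(torch.randn(C))
omega_var2 = torch.sigmoid(y * w_var2)      # no channel ever sees another channel

# SE-Var3: full W, a single bias-free linear layer (C * C parameters).
fc_var3 = nn.Linear(C, C, bias=False)
omega_var3 = torch.sigmoid(fc_var3(y))      # every channel interacts with every other

n_var2 = w_var2.numel()
n_var3 = sum(p.numel() for p in fc_var3.parameters())
print(n_var2, n_var3)  # 8 64
```

Neither variant reduces the channel dimension; they differ only in how much cross-channel interaction W allows and in how many parameters that costs.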

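ECA-Net explores another way to capture local cross-channel interaction that keeps both efficiency and effectiveness, using a band matrix $W_k$ to learn channel attention:

$$W_k = \begin{bmatrix} w^{1,1} & \cdots & w^{1,k} & 0 & 0 & \cdots & \cdots & 0 \\ 0 & w^{2,2} & \cdots & w^{2,k+1} & 0 & \cdots & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & w^{C,C-k+1} & \cdots & w^{C,C} \end{bmatrix}$$

When all channels share the same k weights, this reduces to:

$$\omega = \sigma(\mathrm{C1D}_k(y))$$

where C1D denotes a 1D convolution;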
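The link between the band matrix $W_k$ and the 1D convolution can be checked numerically. The sketch below assumes the shared-weight case (every row of $W_k$ reuses the same k weights) and zero padding at the channel boundaries, which is what the reproduction code later in this post does; `C = 8` and `k = 3` are arbitrary illustration values.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
C, k = 8, 3                # hypothetical sizes for illustration
y = torch.randn(C)         # aggregated channel descriptor
w = torch.randn(k)         # the k shared weights

# 1D convolution across the channel axis, zero-padded to keep length C.
conv_out = F.conv1d(y.view(1, 1, C), w.view(1, 1, k), padding=k // 2).view(C)

# Equivalent band matrix: row i holds the shared kernel centred on channel i.
W_k = torch.zeros(C, C)
for i in range(C):
    for j in range(k):
        col = i + j - k // 2
        if 0 <= col < C:   # entries falling outside the matrix act as zero padding
            W_k[i, col] = w[j]

band_out = W_k @ y
print(torch.allclose(conv_out, band_out, atol=1e-6))

omega = torch.sigmoid(conv_out)   # channel attention weights
```

The matrix form needs C×C storage although only k values are distinct, which is why the module is implemented as a convolution.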


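In summary:

After aggregating the convolutional features with global average pooling (GAP) without dimensionality reduction, the ECA module first determines the kernel size k adaptively, then applies a 1D convolution followed by a Sigmoid function to learn the channel attention.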



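🚩 ECA-Net Comparison

Finally, experiments with ResNet, ResNet+SENet, ResNet+CBAM, and ResNet+ECANet give the following parameter-count vs. accuracy results:

[Figure: model parameters vs. accuracy comparison]

The experiments show that ECANet outperforms both SENet and CBAM.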
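The gap in parameter count is easy to estimate per block. The sketch below is a rough back-of-the-envelope comparison, not a reproduction of the figure: it assumes an SE block made of two bias-free fully connected layers with reduction ratio r = 16, and an ECA block whose only parameters are the k weights of its 1D convolution (γ = 2, b = 1). Both helper names are hypothetical.

```python
import math


def se_block_params(channels: int, reduction: int = 16) -> int:
    # Assumed SE layout: C -> C/r -> C, two fully connected layers, no biases.
    return 2 * channels * (channels // reduction)


def eca_block_params(channels: int, gamma: int = 2, b: int = 1) -> int:
    # ECA has a single bias-free 1D convolution with k weights.
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


for c in (256, 512, 1024, 2048):
    print(f"C={c:5d}  SE={se_block_params(c):7d}  ECA={eca_block_params(c)}")
```

Under these assumptions an SE block at C = 2048 carries 524,288 weights, against 7 for ECA.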


🚀 ECA-Net Reproduction

The implementation here covers the ECA-ResNet family of networks:

# Here is the code:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchinfo import summary
import math


class EfficientChannelAttention(nn.Module):           # Efficient Channel Attention module
    def __init__(self, c, b=1, gamma=2):
        super(EfficientChannelAttention, self).__init__()
        # Adaptive kernel size: truncate (log2(c) + b) / gamma, then bump even values to the next odd number
        t = int(abs((math.log2(c) + b) / gamma))
        k = t if t % 2 else t + 1

        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv1 = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

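    def forward(self, x):
        x = self.avg_pool(x)          # (N, C, H, W) -> (N, C, 1, 1)
        # (N, C, 1, 1) -> (N, 1, C) so Conv1d slides over the channel axis, then back to (N, C, 1, 1)
        x = self.conv1(x.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        out = self.sigmoid(x)         # per-channel attention weights in (0, 1)
        return out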


class BasicBlock(nn.Module):      # left-hand residual block (18-layer, 34-layer)
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):      # two Conv2d layers + shortcut
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        self.channel = EfficientChannelAttention(planes)       # Efficient Channel Attention module

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:      # projection shortcut (Conv Block) when shapes differ, identity (Identity Block) otherwise
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        ECA_out = self.channel(out)
        out = out * ECA_out
        out += self.shortcut(x)
        out = F.relu(out)
        return out


class Bottleneck(nn.Module):      # right-hand residual block (50-layer, 101-layer, 152-layer)
    expansion = 4

    def __init__(self, in_planes, planes, stride=1):      # three Conv2d layers + shortcut
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, self.expansion*planes,
                               kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(self.expansion*planes)

        self.channel = EfficientChannelAttention(self.expansion*planes)       # Efficient Channel Attention module

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:      # projection shortcut (Conv Block) when shapes differ, identity (Identity Block) otherwise
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        ECA_out = self.channel(out)
        out = out * ECA_out
        out += self.shortcut(x)
        out = F.relu(out)
        return out


class ECA_ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=1000):
        super(ECA_ResNet, self).__init__()
        self.in_planes = 64

        # conv1: 3x3, stride 1, no max-pool, so layer1 runs at the full input resolution
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)       # conv2_x
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)      # conv3_x
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)      # conv4_x
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)      # conv5_x
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.linear = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        out = self.linear(x)
        return out


def ECA_ResNet18():
    return ECA_ResNet(BasicBlock, [2, 2, 2, 2])


def ECA_ResNet34():
    return ECA_ResNet(BasicBlock, [3, 4, 6, 3])


def ECA_ResNet50():
    return ECA_ResNet(Bottleneck, [3, 4, 6, 3])


def ECA_ResNet101():
    return ECA_ResNet(Bottleneck, [3, 4, 23, 3])


def ECA_ResNet152():
    return ECA_ResNet(Bottleneck, [3, 8, 36, 3])


def test():
    net = ECA_ResNet50()
    y = net(torch.randn(1, 3, 224, 224))
    print(y.size())
    summary(net, (1, 3, 224, 224))


if __name__ == '__main__':
    test()

Output:

torch.Size([1, 1000])
====================================================================================================
Layer (type:depth-idx)                             Output Shape              Param #
====================================================================================================
ECA_ResNet                                         --                        --
├─Conv2d: 1-1                                      [1, 64, 224, 224]         1,728
├─BatchNorm2d: 1-2                                 [1, 64, 224, 224]         128
├─Sequential: 1-3                                  [1, 256, 224, 224]        --
│    └─Bottleneck: 2-1                             [1, 256, 224, 224]        --
│    │    └─Conv2d: 3-1                            [1, 64, 224, 224]         4,096
│    │    └─BatchNorm2d: 3-2                       [1, 64, 224, 224]         128
│    │    └─Conv2d: 3-3                            [1, 64, 224, 224]         36,864
│    │    └─BatchNorm2d: 3-4                       [1, 64, 224, 224]         128
│    │    └─Conv2d: 3-5                            [1, 256, 224, 224]        16,384
│    │    └─BatchNorm2d: 3-6                       [1, 256, 224, 224]        512
│    │    └─EfficientChannelAttention: 3-7         [1, 256, 1, 1]            5
│    │    └─Sequential: 3-8                        [1, 256, 224, 224]        16,896
│    └─Bottleneck: 2-2                             [1, 256, 224, 224]        --
│    │    └─Conv2d: 3-9                            [1, 64, 224, 224]         16,384
│    │    └─BatchNorm2d: 3-10                      [1, 64, 224, 224]         128
│    │    └─Conv2d: 3-11                           [1, 64, 224, 224]         36,864
│    │    └─BatchNorm2d: 3-12                      [1, 64, 224, 224]         128
│    │    └─Conv2d: 3-13                           [1, 256, 224, 224]        16,384
│    │    └─BatchNorm2d: 3-14                      [1, 256, 224, 224]        512
│    │    └─EfficientChannelAttention: 3-15        [1, 256, 1, 1]            5
│    │    └─Sequential: 3-16                       [1, 256, 224, 224]        --
│    └─Bottleneck: 2-3                             [1, 256, 224, 224]        --
│    │    └─Conv2d: 3-17                           [1, 64, 224, 224]         16,384
│    │    └─BatchNorm2d: 3-18                      [1, 64, 224, 224]         128
│    │    └─Conv2d: 3-19                           [1, 64, 224, 224]         36,864
│    │    └─BatchNorm2d: 3-20                      [1, 64, 224, 224]         128
│    │    └─Conv2d: 3-21                           [1, 256, 224, 224]        16,384
│    │    └─BatchNorm2d: 3-22                      [1, 256, 224, 224]        512
│    │    └─EfficientChannelAttention: 3-23        [1, 256, 1, 1]            5
│    │    └─Sequential: 3-24                       [1, 256, 224, 224]        --
├─Sequential: 1-4                                  [1, 512, 112, 112]        --
│    └─Bottleneck: 2-4                             [1, 512, 112, 112]        --
│    │    └─Conv2d: 3-25                           [1, 128, 224, 224]        32,768
│    │    └─BatchNorm2d: 3-26                      [1, 128, 224, 224]        256
│    │    └─Conv2d: 3-27                           [1, 128, 112, 112]        147,456
│    │    └─BatchNorm2d: 3-28                      [1, 128, 112, 112]        256
│    │    └─Conv2d: 3-29                           [1, 512, 112, 112]        65,536
│    │    └─BatchNorm2d: 3-30                      [1, 512, 112, 112]        1,024
│    │    └─EfficientChannelAttention: 3-31        [1, 512, 1, 1]            5
│    │    └─Sequential: 3-32                       [1, 512, 112, 112]        132,096
│    └─Bottleneck: 2-5                             [1, 512, 112, 112]        --
│    │    └─Conv2d: 3-33                           [1, 128, 112, 112]        65,536
│    │    └─BatchNorm2d: 3-34                      [1, 128, 112, 112]        256
│    │    └─Conv2d: 3-35                           [1, 128, 112, 112]        147,456
│    │    └─BatchNorm2d: 3-36                      [1, 128, 112, 112]        256
│    │    └─Conv2d: 3-37                           [1, 512, 112, 112]        65,536
│    │    └─BatchNorm2d: 3-38                      [1, 512, 112, 112]        1,024
│    │    └─EfficientChannelAttention: 3-39        [1, 512, 1, 1]            5
│    │    └─Sequential: 3-40                       [1, 512, 112, 112]        --
│    └─Bottleneck: 2-6                             [1, 512, 112, 112]        --
│    │    └─Conv2d: 3-41                           [1, 128, 112, 112]        65,536
│    │    └─BatchNorm2d: 3-42                      [1, 128, 112, 112]        256
│    │    └─Conv2d: 3-43                           [1, 128, 112, 112]        147,456
│    │    └─BatchNorm2d: 3-44                      [1, 128, 112, 112]        256
│    │    └─Conv2d: 3-45                           [1, 512, 112, 112]        65,536
│    │    └─BatchNorm2d: 3-46                      [1, 512, 112, 112]        1,024
│    │    └─EfficientChannelAttention: 3-47        [1, 512, 1, 1]            5
│    │    └─Sequential: 3-48                       [1, 512, 112, 112]        --
│    └─Bottleneck: 2-7                             [1, 512, 112, 112]        --
│    │    └─Conv2d: 3-49                           [1, 128, 112, 112]        65,536
│    │    └─BatchNorm2d: 3-50                      [1, 128, 112, 112]        256
│    │    └─Conv2d: 3-51                           [1, 128, 112, 112]        147,456
│    │    └─BatchNorm2d: 3-52                      [1, 128, 112, 112]        256
│    │    └─Conv2d: 3-53                           [1, 512, 112, 112]        65,536
│    │    └─BatchNorm2d: 3-54                      [1, 512, 112, 112]        1,024
│    │    └─EfficientChannelAttention: 3-55        [1, 512, 1, 1]            5
│    │    └─Sequential: 3-56                       [1, 512, 112, 112]        --
├─Sequential: 1-5                                  [1, 1024, 56, 56]         --
│    └─Bottleneck: 2-8                             [1, 1024, 56, 56]         --
│    │    └─Conv2d: 3-57                           [1, 256, 112, 112]        131,072
│    │    └─BatchNorm2d: 3-58                      [1, 256, 112, 112]        512
│    │    └─Conv2d: 3-59                           [1, 256, 56, 56]          589,824
│    │    └─BatchNorm2d: 3-60                      [1, 256, 56, 56]          512
│    │    └─Conv2d: 3-61                           [1, 1024, 56, 56]         262,144
│    │    └─BatchNorm2d: 3-62                      [1, 1024, 56, 56]         2,048
│    │    └─EfficientChannelAttention: 3-63        [1, 1024, 1, 1]           5
│    │    └─Sequential: 3-64                       [1, 1024, 56, 56]         526,336
│    └─Bottleneck: 2-9                             [1, 1024, 56, 56]         --
│    │    └─Conv2d: 3-65                           [1, 256, 56, 56]          262,144
│    │    └─BatchNorm2d: 3-66                      [1, 256, 56, 56]          512
│    │    └─Conv2d: 3-67                           [1, 256, 56, 56]          589,824
│    │    └─BatchNorm2d: 3-68                      [1, 256, 56, 56]          512
│    │    └─Conv2d: 3-69                           [1, 1024, 56, 56]         262,144
│    │    └─BatchNorm2d: 3-70                      [1, 1024, 56, 56]         2,048
│    │    └─EfficientChannelAttention: 3-71        [1, 1024, 1, 1]           5
│    │    └─Sequential: 3-72                       [1, 1024, 56, 56]         --
│    └─Bottleneck: 2-10                            [1, 1024, 56, 56]         --
│    │    └─Conv2d: 3-73                           [1, 256, 56, 56]          262,144
│    │    └─BatchNorm2d: 3-74                      [1, 256, 56, 56]          512
│    │    └─Conv2d: 3-75                           [1, 256, 56, 56]          589,824
│    │    └─BatchNorm2d: 3-76                      [1, 256, 56, 56]          512
│    │    └─Conv2d: 3-77                           [1, 1024, 56, 56]         262,144
│    │    └─BatchNorm2d: 3-78                      [1, 1024, 56, 56]         2,048
│    │    └─EfficientChannelAttention: 3-79        [1, 1024, 1, 1]           5
│    │    └─Sequential: 3-80                       [1, 1024, 56, 56]         --
│    └─Bottleneck: 2-11                            [1, 1024, 56, 56]         --
│    │    └─Conv2d: 3-81                           [1, 256, 56, 56]          262,144
│    │    └─BatchNorm2d: 3-82                      [1, 256, 56, 56]          512
│    │    └─Conv2d: 3-83                           [1, 256, 56, 56]          589,824
│    │    └─BatchNorm2d: 3-84                      [1, 256, 56, 56]          512
│    │    └─Conv2d: 3-85                           [1, 1024, 56, 56]         262,144
│    │    └─BatchNorm2d: 3-86                      [1, 1024, 56, 56]         2,048
│    │    └─EfficientChannelAttention: 3-87        [1, 1024, 1, 1]           5
│    │    └─Sequential: 3-88                       [1, 1024, 56, 56]         --
│    └─Bottleneck: 2-12                            [1, 1024, 56, 56]         --
│    │    └─Conv2d: 3-89                           [1, 256, 56, 56]          262,144
│    │    └─BatchNorm2d: 3-90                      [1, 256, 56, 56]          512
│    │    └─Conv2d: 3-91                           [1, 256, 56, 56]          589,824
│    │    └─BatchNorm2d: 3-92                      [1, 256, 56, 56]          512
│    │    └─Conv2d: 3-93                           [1, 1024, 56, 56]         262,144
│    │    └─BatchNorm2d: 3-94                      [1, 1024, 56, 56]         2,048
│    │    └─EfficientChannelAttention: 3-95        [1, 1024, 1, 1]           5
│    │    └─Sequential: 3-96                       [1, 1024, 56, 56]         --
│    └─Bottleneck: 2-13                            [1, 1024, 56, 56]         --
│    │    └─Conv2d: 3-97                           [1, 256, 56, 56]          262,144
│    │    └─BatchNorm2d: 3-98                      [1, 256, 56, 56]          512
│    │    └─Conv2d: 3-99                           [1, 256, 56, 56]          589,824
│    │    └─BatchNorm2d: 3-100                     [1, 256, 56, 56]          512
│    │    └─Conv2d: 3-101                          [1, 1024, 56, 56]         262,144
│    │    └─BatchNorm2d: 3-102                     [1, 1024, 56, 56]         2,048
│    │    └─EfficientChannelAttention: 3-103       [1, 1024, 1, 1]           5
│    │    └─Sequential: 3-104                      [1, 1024, 56, 56]         --
├─Sequential: 1-6                                  [1, 2048, 28, 28]         --
│    └─Bottleneck: 2-14                            [1, 2048, 28, 28]         --
│    │    └─Conv2d: 3-105                          [1, 512, 56, 56]          524,288
│    │    └─BatchNorm2d: 3-106                     [1, 512, 56, 56]          1,024
│    │    └─Conv2d: 3-107                          [1, 512, 28, 28]          2,359,296
│    │    └─BatchNorm2d: 3-108                     [1, 512, 28, 28]          1,024
│    │    └─Conv2d: 3-109                          [1, 2048, 28, 28]         1,048,576
│    │    └─BatchNorm2d: 3-110                     [1, 2048, 28, 28]         4,096
│    │    └─EfficientChannelAttention: 3-111       [1, 2048, 1, 1]           7
│    │    └─Sequential: 3-112                      [1, 2048, 28, 28]         2,101,248
│    └─Bottleneck: 2-15                            [1, 2048, 28, 28]         --
│    │    └─Conv2d: 3-113                          [1, 512, 28, 28]          1,048,576
│    │    └─BatchNorm2d: 3-114                     [1, 512, 28, 28]          1,024
│    │    └─Conv2d: 3-115                          [1, 512, 28, 28]          2,359,296
│    │    └─BatchNorm2d: 3-116                     [1, 512, 28, 28]          1,024
│    │    └─Conv2d: 3-117                          [1, 2048, 28, 28]         1,048,576
│    │    └─BatchNorm2d: 3-118                     [1, 2048, 28, 28]         4,096
│    │    └─EfficientChannelAttention: 3-119       [1, 2048, 1, 1]           7
│    │    └─Sequential: 3-120                      [1, 2048, 28, 28]         --
│    └─Bottleneck: 2-16                            [1, 2048, 28, 28]         --
│    │    └─Conv2d: 3-121                          [1, 512, 28, 28]          1,048,576
│    │    └─BatchNorm2d: 3-122                     [1, 512, 28, 28]          1,024
│    │    └─Conv2d: 3-123                          [1, 512, 28, 28]          2,359,296
│    │    └─BatchNorm2d: 3-124                     [1, 512, 28, 28]          1,024
│    │    └─Conv2d: 3-125                          [1, 2048, 28, 28]         1,048,576
│    │    └─BatchNorm2d: 3-126                     [1, 2048, 28, 28]         4,096
│    │    └─EfficientChannelAttention: 3-127       [1, 2048, 1, 1]           7
│    │    └─Sequential: 3-128                      [1, 2048, 28, 28]         --
├─AdaptiveAvgPool2d: 1-7                           [1, 2048, 1, 1]           --
├─Linear: 1-8                                      [1, 1000]                 2,049,000
====================================================================================================
Total params: 25,549,438
Trainable params: 25,549,438
Non-trainable params: 0
Total mult-adds (G): 63.59
====================================================================================================
Input size (MB): 0.60
Forward/backward pass size (MB): 2691.17
Params size (MB): 102.20
Estimated Total Size (MB): 2793.97
====================================================================================================

