Hi everyone. In this post I'll show how to use TensorFlow to implement the bounding-box localization losses used in object detection: IoU, GIoU, DIoU, and CIoU.

1. IoU loss

1.1 Method overview

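IoU (Intersection over Union) is the ratio of the intersection to the union of the predicted box and the ground-truth box. In object detection, the IoU distance 1 - IoU satisfies symmetry, identity, non-negativity, and the triangle inequality, and unlike the L1 and L2 losses it is scale invariant. Whatever the size of the boxes, the IoU loss 1 - IoU always lies between 0 and 1, so it reflects fairly well how closely the predicted box matches the ground-truth box.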

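The larger the IoU, the more the two boxes overlap. An IoU of 0 means the boxes do not overlap at all, and an IoU of 1 means they coincide exactly. In practice we usually pick a threshold to decide whether a predicted box is correct. For example, a box whose IoU with the ground truth is greater than 0.5 is treated as a correct box that contains an object; otherwise it is treated as a wrong box that does not contain one.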
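As a small illustration of the thresholding step, the sketch below labels a few predictions as correct or wrong from their IoU values. The names `iou_values` and `IOU_THRESHOLD` and the sample numbers are made up for this example; 0.5 is the threshold mentioned above.

```python
# Hypothetical IoU values for four predicted boxes (illustrative numbers only).
iou_values = [0.82, 0.50, 0.31, 0.0]
IOU_THRESHOLD = 0.5  # threshold from the text; other values can be chosen

# A prediction counts as correct only when its IoU is strictly greater than the threshold.
is_correct = [iou > IOU_THRESHOLD for iou in iou_values]
print(is_correct)
```

Note that a box with an IoU of exactly 0.5 is rejected here, because the rule in the text is "greater than 0.5".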

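The IoU loss is defined by first computing the ratio of intersection to union between the predicted box and the ground-truth box and then taking the negative logarithm, -ln(IoU). In practice it is often written as 1 - IoU instead. The formulas are given below, where A is the predicted box region and B is the ground-truth box region.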

$\large IOU = \frac{|A\cap B|}{|A\cup B|}\;\;\;\;\;\;\;L_{IOU}=1-IOU$
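To get a feel for how the two forms of the loss differ, here is a minimal plain-Python sketch that evaluates both for a few IoU values. The function name `iou_losses` and the `eps` guard are assumptions made for this example, not part of the TensorFlow code in this post.

```python
import math

def iou_losses(iou, eps=1e-7):
    """Return (-ln(IoU), 1 - IoU) for a single IoU value in [0, 1]."""
    log_loss = -math.log(max(iou, eps))  # eps avoids log(0) when the boxes do not overlap
    linear_loss = 1.0 - iou
    return log_loss, linear_loss

for iou in (0.9, 0.5, 0.1):
    log_loss, linear_loss = iou_losses(iou)
    print(f"IoU={iou:.1f}  -ln(IoU)={log_loss:.4f}  1-IoU={linear_loss:.4f}")
```

Both losses shrink as the IoU approaches 1, but the logarithmic form grows without bound as the IoU approaches 0, while 1 - IoU stays within [0, 1].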

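1.2 Code

box1 holds the predicted boxes, with shape=[b, w, h, num_anchor, 4], where the last dimension of size 4 stores each box's center coordinates (x, y) and its width and height (w, h). box2 holds the ground-truth boxes in the same layout.

The output iou has shape [b, w, h, num_anchor] and gives the IoU of every box in every image.

The IoU code is as follows: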

import tensorflow as tf

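# (1) Define the IoU computation (the loss is then 1 - iou)
def IOU(box1, box2):

    # Predicted boxes: split into center and size
    box1_xy = box1[..., :2]   # center (x, y) of every box in the batch
    box1_wh = box1[..., 2:4]  # width and height of every box
    box1_wh_half = box1_wh / 2.0  # half width and height (true division, so fractional sizes also work)
    box1_min = box1_xy - box1_wh_half  # top-left corner
    box1_max = box1_xy + box1_wh_half  # bottom-right corner

    # Ground-truth boxes: top-left and bottom-right corners, computed the same way
    box2_xy = box2[..., :2]
    box2_wh = box2[..., 2:4]
    box2_wh_half = box2_wh / 2.0
    box2_min = box2_xy - box2_wh_half
    box2_max = box2_xy + box2_wh_half

    # Area of the predicted boxes
    box1_area = box1_wh[..., 0] * box1_wh[..., 1]
    # Area of the ground-truth boxes
    box2_area = box2_wh[..., 0] * box2_wh[..., 1]

    # Corner coordinates of the intersection
    intersect_min = tf.maximum(box1_min, box2_min)  # top-left corner of the intersection
    intersect_max = tf.minimum(box1_max, box2_max)  # bottom-right corner of the intersection
    # Width and height of the intersection; clamped to 0 when the boxes are disjoint
    intersect_wh = tf.maximum(intersect_max - intersect_min, 0.0)

    # Intersection area
    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
    # Union area
    union_area = box1_area + box2_area - intersect_area

    # IoU; a small constant in the denominator prevents division by zero
    iou = intersect_area / (union_area + tf.keras.backend.epsilon())

    return iou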

# (2) Quick check
if __name__ == '__main__':
    # Constant test inputs: every entry of box1 is 50, every entry of box2 is 40
    box1 = tf.fill([32, 16, 16, 3, 4], 50.0)
    box2 = tf.fill([32, 16, 16, 3, 4], 40.0)
    # Compute the IoU
    iou = IOU(box1, box2)
    print(iou.shape)  # [32,16,16,3]
    print(iou[0, 0, 0])  # IoU of the three anchor boxes at one grid cell of the first image

    '''
    tf.Tensor([0.42608696 0.42608696 0.42608696], shape=(3,), dtype=float32)
    '''
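The printed value can be reproduced by hand. In the test every entry is filled with the same number, so each box1 is (cx, cy, w, h) = (50, 50, 50, 50) and each box2 is (40, 40, 40, 40). Below is a plain-Python sketch for a single pair of boxes; the helpers `center_to_corners` and `iou_single` are hypothetical names used only here, not part of the TensorFlow code.

```python
def center_to_corners(box):
    """Convert (cx, cy, w, h) to (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

def iou_single(box_a, box_b):
    """IoU of two boxes given in (cx, cy, w, h) format."""
    ax1, ay1, ax2, ay2 = center_to_corners(box_a)
    bx1, by1, bx2, by2 = center_to_corners(box_b)
    inter_w = max(min(ax2, bx2) - max(ax1, bx1), 0.0)
    inter_h = max(min(ay2, by2) - max(ay1, by1), 0.0)
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union

# Same values as the tf.fill test: every entry is 50 for box1 and 40 for box2.
print(iou_single((50, 50, 50, 50), (40, 40, 40, 40)))  # 1225 / 2875
```

The boxes span [25, 75] and [20, 60] on both axes, so the intersection is 35 * 35 = 1225 and the union is 2500 + 1600 - 1225 = 2875, which matches the tensor output above up to float32 precision.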

2. GIoU loss

2.1 Method overview

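When IoU is used to measure the overlap of two boxes and the boxes do not intersect, the IoU is 0 whatever the distance between them. It therefore says nothing about how far apart the boxes are and provides no gradient to backpropagate, so the network cannot learn from such pairs.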
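The following plain-Python sketch makes this concrete for boxes in corner format (x_min, y_min, x_max, y_max). The helper `iou_corners` and the sample boxes are assumptions chosen for illustration.

```python
def iou_corners(a, b):
    """IoU of two boxes in (x_min, y_min, x_max, y_max) format."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

target = (0.0, 0.0, 10.0, 10.0)
near = (12.0, 0.0, 22.0, 10.0)     # 2 units to the right of the target
far = (200.0, 0.0, 210.0, 10.0)    # 190 units to the right of the target

# Both predictions get the same score although one is much closer.
print(iou_corners(target, near), iou_corners(target, far))
```

Because the score is flat over all non-overlapping positions, moving the predicted box slightly in any direction does not change the loss, which is why there is nothing to learn from.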

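GIoU can be understood as follows: for the predicted box A and the ground-truth box B, find the smallest enclosing rectangle C that contains both A and B. Subtract the area of the union of A and B from the area of C and divide by the area of C to get a ratio. GIoU is IoU minus this ratio: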

$\large GIOU = IOU - \frac{C-(A\bigcup B)}{C}\;\;\;\;\;\;\;L_{GIOU}=1-GIOU$

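When the two boxes coincide exactly, IOU=1 and C=A∪B, so GIOU=1. When the two boxes are completely separated, IOU=0 because the intersection is empty, and as they move farther apart the ratio (C-(A∪B))/C tends to 1, so GIOU tends to -1. GIOU therefore lies in (-1, 1], and Loss_GIOU = 1 - GIOU lies in [0, 2).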
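These two limits can be checked numerically. The sketch below is plain Python for corner-format boxes (x_min, y_min, x_max, y_max); the helper `giou_corners` and the chosen gaps are assumptions made for this illustration.

```python
def giou_corners(a, b):
    """GIoU of two boxes in (x_min, y_min, x_max, y_max) format."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Smallest axis-aligned rectangle C that contains both boxes.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (enclose - union) / enclose

box = (0.0, 0.0, 1.0, 1.0)
print(giou_corners(box, box))                      # identical boxes
for gap in (1.0, 10.0, 1000.0):
    shifted = (1.0 + gap, 0.0, 2.0 + gap, 1.0)     # same size, moved right by 1 + gap
    print(gap, giou_corners(box, shifted))
```

For identical boxes the result is 1. For the shifted boxes the value is negative and keeps decreasing as the gap grows, getting close to -1 without reaching it.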

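Properties of the GIoU loss:

(1) The GIoU loss can measure the distance between two bounding boxes.

(2) GIoU is not limited by the size of the object and generalizes well.

(3) GIoU introduces the smallest enclosing region C containing predicted box A and ground-truth box B, so the predicted box can still be optimized even when A and B do not intersect.

(4) Compared with IoU, GIoU does not only look at the overlapping region. The gap between the smallest enclosing rectangle C and the two boxes A and B (the white area in the figure above) grows when A and B are poorly aligned. GIoU therefore reflects not only whether A and B overlap but also how they overlap.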
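Point (4) can be illustrated with two predictions that have the same IoU with the target but are aligned differently. This is a plain-Python sketch; the helper `iou_and_giou` and the boxes are assumptions constructed for the example.

```python
import math

def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two boxes in (x_min, y_min, x_max, y_max) format."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    iou = inter / union
    return iou, iou - (enclose - union) / enclose

target = (0.0, 0.0, 2.0, 2.0)
side_shift = (1.0, 0.0, 3.0, 2.0)            # shifted along x only
s = 2.0 - math.sqrt(2.0)                     # diagonal shift that keeps the overlap area at 2
diag_shift = (s, s, 2.0 + s, 2.0 + s)

print(iou_and_giou(target, side_shift))      # no empty space inside C
print(iou_and_giou(target, diag_shift))      # same IoU, but C contains empty corners
```

Both predictions overlap the target by the same area, so IoU cannot tell them apart. The diagonally shifted box leaves empty corners inside C and therefore gets a lower GIoU.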

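2.2 Code

'''
Arguments
box1: predicted boxes, [b, w, h, num_anchor, 4], where 4 holds the center xy and the width/height wh of each box
box2: ground-truth boxes, [b, w, h, num_anchor, 4], where 4 holds the center xy and the width/height wh of each box
Returns
iou: IoU values, [b, w, h, num_anchor]
giou: GIoU values, [b, w, h, num_anchor]
'''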
import tensorflow as tf

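# (1) Define the GIoU computation (the loss is then 1 - giou)
def GIOU(box1, box2):

    # Predicted boxes
    box1_xy = box1[..., 0:2]  # centers (x, y) of all predicted boxes
    box1_wh = box1[..., 2:4]  # widths and heights of all predicted boxes
    box1_wh_half = box1_wh / 2.0  # half width and height (true division, not floor division)
    box1_min = box1_xy - box1_wh_half  # top-left corner of the predicted box
    box1_max = box1_xy + box1_wh_half  # bottom-right corner of the predicted box
    # Area of the predicted box, w * h
    box1_area = box1_wh[..., 0] * box1_wh[..., 1]

    # Ground-truth boxes
    box2_xy = box2[..., 0:2]  # centers of all ground-truth boxes
    box2_wh = box2[..., 2:4]  # widths and heights of all ground-truth boxes
    box2_wh_half = box2_wh / 2.0  # half width and height
    box2_min = box2_xy - box2_wh_half  # top-left corner of the ground-truth box
    box2_max = box2_xy + box2_wh_half  # bottom-right corner of the ground-truth box
    # Area of the ground-truth box, w * h
    box2_area = box2_wh[..., 0] * box2_wh[..., 1]

    # Intersection of the two boxes
    intersect_min = tf.maximum(box1_min, box2_min)  # top-left corner of the intersection
    intersect_max = tf.minimum(box1_max, box2_max)  # bottom-right corner of the intersection
    # Width and height of the intersection, clamped at 0 so that disjoint boxes get zero area
    intersect_wh = tf.maximum(intersect_max - intersect_min, 0.0)

    # Intersection area, iw * ih
    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
    # Union area
    union_area = box1_area + box2_area - intersect_area
    # IoU
    iou = intersect_area / (union_area + tf.keras.backend.epsilon())

    # Smallest enclosing rectangle that contains both the predicted and the ground-truth box
    enclose_min = tf.minimum(box1_min, box2_min)
    enclose_max = tf.maximum(box1_max, box2_max)
    # Width and height of the enclosing rectangle
    enclose_wh = enclose_max - enclose_min
    # Area of the enclosing rectangle, ew * eh
    enclose_area = enclose_wh[..., 0] * enclose_wh[..., 1]

    # GIoU
    giou = iou - (enclose_area - union_area) / (enclose_area + tf.keras.backend.epsilon())

    return iou, giou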

# (2) Quick check
if __name__ == '__main__':
    # Constant test inputs: every entry of box1 is 50, every entry of box2 is 40
    box1 = tf.fill([32, 16, 16, 3, 4], 50.0)
    box2 = tf.fill([32, 16, 16, 3, 4], 40.0)
    # Compute IoU and GIoU
    iou, giou = GIOU(box1, box2)

    print('iou_shape:', iou.shape)  # [32,16,16,3]
    print(iou[0, 0, 0])  # IoU of the three anchor boxes at one grid cell of the first image
    '''
    tf.Tensor([0.42608696 0.42608696 0.42608696], shape=(3,), dtype=float32)
    '''

    print('giou_shape:', giou.shape)  # [32,16,16,3]
    print(giou[0,0,0])

    '''
    tf.Tensor([0.3765002 0.3765002 0.3765002], shape=(3,), dtype=float32)
    '''
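One detail is worth checking in any version of this computation: the intersection width and height have to be clamped at zero, as the IOU function in section 1.2 does with tf.maximum(..., 0). Otherwise, for two boxes separated along both axes, both differences are negative and their product is a positive "area". The plain-Python sketch below shows the effect; the helper `intersection_area` and its `clamp` flag are hypothetical, introduced only for this illustration.

```python
def intersection_area(a, b, clamp=True):
    """Intersection area of two (x_min, y_min, x_max, y_max) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    if clamp:
        w, h = max(w, 0.0), max(h, 0.0)
    return w * h

a = (0.0, 0.0, 10.0, 10.0)
b = (20.0, 20.0, 30.0, 30.0)   # separated from a along both x and y

print(intersection_area(a, b, clamp=False))  # (-10) * (-10): a spurious positive area
print(intersection_area(a, b, clamp=True))   # clamped width and height give zero
```

The test values used in this post (50 and 40) describe overlapping boxes, so they do not exercise this case; a pair of disjoint boxes is needed to see the difference.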

3. DIoU loss

3.1 Method overview

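The GIoU loss tends to first enlarge the predicted box until it overlaps the target box, and only then does the IoU term in the equation take over to maximize the overlap. In addition, when the two boxes are aligned horizontally or vertically, GIoU updates and optimizes the parameters very slowly.

DIoU adds a normalized center-point distance to IoU. It takes the distance, overlap, and scale of the predicted and ground-truth boxes into account and directly minimizes the distance between the two boxes, which makes bounding-box regression more stable and convergence faster.

b is the center of the predicted box, b_gt is the center of the ground-truth box, $\large \rho$ is the Euclidean distance between the two centers (written d below), and c is the diagonal length of the smallest rectangle enclosing both boxes. The formula is: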

$\large DIOU=IOU-\frac{\rho ^{2}(b, b^{gt})}{c^{2}} = IOU-\frac{d^{2}}{c^{2}} \;\;\;\;\;L_{DIOU}=1-DIOU$

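When the two boxes coincide exactly, the distance d=0 and IOU=1, so DIOU=1. When the two boxes are completely separated, IOU=0, and as they move farther apart the ratio of d^2 to c^2 tends to 1, so DIOU tends to -1. The range of DIOU is therefore (-1, 1], and the range of Loss_DIOU is [0, 2).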
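A quick numerical check of these limits, written as a plain-Python sketch for corner-format boxes (x_min, y_min, x_max, y_max). The helper `diou_corners` and the sample gaps are assumptions made for this illustration.

```python
def diou_corners(a, b):
    """DIoU of two boxes in (x_min, y_min, x_max, y_max) format."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Squared distance d^2 between the two box centers.
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # Squared diagonal c^2 of the smallest rectangle enclosing both boxes.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return inter / union - d2 / c2

box = (0.0, 0.0, 1.0, 1.0)
print(diou_corners(box, box))                      # identical boxes
for gap in (1.0, 10.0, 1000.0):
    shifted = (1.0 + gap, 0.0, 2.0 + gap, 1.0)     # same size, moved right by 1 + gap
    print(gap, diou_corners(box, shifted))
```

As with GIoU, the value for separated boxes is negative and approaches -1 as the gap grows, but here the penalty comes directly from the center distance rather than from the empty area inside the enclosing rectangle.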

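3.2 Code

'''
Arguments
box1: predicted boxes, [b, w, h, num_anchor, 4], where 4 holds the center xy and the width/height wh of each box
box2: ground-truth boxes, [b, w, h, num_anchor, 4], where 4 holds the center xy and the width/height wh of each box
Returns
iou: IoU values, [b, w, h, num_anchor]
diou: DIoU values, [b, w, h, num_anchor]
'''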
import tensorflow as tf

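# (1) Define the DIoU computation (the loss is then 1 - diou)
def DIOU(box1, box2):

    # Step 1: compute the IoU
    # Predicted boxes
    box1_xy = box1[..., 0:2]  # centers of the predicted boxes
    box1_wh = box1[..., 2:4]  # widths and heights of the predicted boxes
    box1_wh_half = box1_wh / 2.0  # half width and height (true division, not floor division)
    box1_min = box1_xy - box1_wh_half  # top-left corner of the predicted box
    box1_max = box1_xy + box1_wh_half  # bottom-right corner of the predicted box
    # Area of the predicted box
    box1_area = box1_wh[..., 0] * box1_wh[..., 1]

    # Ground-truth boxes
    box2_xy = box2[..., 0:2]  # centers of the ground-truth boxes
    box2_wh = box2[..., 2:4]  # widths and heights of the ground-truth boxes
    box2_wh_half = box2_wh / 2.0  # half width and height
    box2_min = box2_xy - box2_wh_half  # top-left corner of the ground-truth box
    box2_max = box2_xy + box2_wh_half  # bottom-right corner of the ground-truth box
    # Area of the ground-truth box
    box2_area = box2_wh[..., 0] * box2_wh[..., 1]

    # Top-left and bottom-right corners of the intersection
    intersect_min = tf.maximum(box1_min, box2_min)
    intersect_max = tf.minimum(box1_max, box2_max)
    # Width and height of the intersection, clamped at 0 so that disjoint boxes get zero area
    intersect_wh = tf.maximum(intersect_max - intersect_min, 0.0)
    # Intersection area
    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]

    # Union area
    union_area = box1_area + box2_area - intersect_area
    # IoU; a small constant in the denominator prevents division by zero
    iou = intersect_area / (union_area + tf.keras.backend.epsilon())

    # Step 2: smallest enclosing rectangle that contains both boxes
    enclose_min = tf.minimum(box1_min, box2_min)  # top-left corner
    enclose_max = tf.maximum(box1_max, box2_max)  # bottom-right corner
    enclose_wh = enclose_max - enclose_min  # width and height of the enclosing rectangle

    # Squared diagonal length c^2 = w**2 + h**2
    enclose_distance = tf.square(enclose_wh[..., 0]) + tf.square(enclose_wh[..., 1])

    # Step 3: squared distance d^2 between the two box centers
    center_distance = tf.reduce_sum(tf.square(box1_xy - box2_xy), axis=-1)

    # Step 4: DIoU; the small constant guards against zero-sized boxes
    diou = iou - center_distance / (enclose_distance + tf.keras.backend.epsilon())

    # Return the IoU and DIoU of every box
    return iou, diou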

# (2) Quick check
if __name__ == '__main__':
    # Constant test inputs: every entry of box1 is 50, every entry of box2 is 40
    box1 = tf.fill([32, 16, 16, 3, 4], 50.0)
    box2 = tf.fill([32, 16, 16, 3, 4], 40.0)
    # Compute IoU and DIoU
    iou, diou = DIOU(box1, box2)

    print('iou_shape:', iou.shape)  # [32,16,16,3]
    print(iou[0, 0, 0])  # IoU of the three anchor boxes at one grid cell of the first image
    '''
    tf.Tensor([0.42608696 0.42608696 0.42608696], shape=(3,), dtype=float32)
    '''

    print('diou_shape:', diou.shape)  # [32,16,16,3]
    print(diou[0,0,0])
    '''
    tf.Tensor([0.39302912 0.39302912 0.39302912], shape=(3,), dtype=float32)
    '''

4. CIoU loss

4.1 Method overview

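CIoU builds on DIoU. The authors point out that a good regression loss for localization should consider three geometric factors: overlap area, center-point distance, and aspect ratio. The DIoU loss covers the overlap area and the center distance but ignores the consistency of the aspect ratio, which is also an important criterion, so CIoU adds an aspect-ratio term.

When the two boxes do not intersect, the DIoU loss still gives a useful gradient for the parameter update, which further reduces the difference between the two boxes. It also shrinks the distance between the box centers as much as possible, improving accuracy at little extra computational cost. The CIoU loss additionally enforces consistency of the aspect ratio, and so converges faster and performs better.

The aspect-ratio term is defined below, where $\large \alpha$ is a trade-off weight and $\large v$ measures the consistency of the aspect ratios.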

$\large \alpha = \frac{v}{(1-IOU)+v} \;\;\;\;\;\;\;\;\;\; v = \frac{4}{\pi ^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}$

The CIoU loss is computed as follows:

$\large CIOU=IOU-(\frac{\rho ^{2}(b, b^{gt})}{c^{2}} + \alpha v) \;\;\;\;\;\;\;L_{CIOU}=1-CIOU$
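To see what size the aspect-ratio penalty typically has, the plain-Python sketch below evaluates v and alpha for one made-up case: a square ground-truth box and a 2:1 predicted box with an IoU of 0.5 (for example, a 1 x 1 box centered inside a 2 x 1 box). The function name `aspect_ratio_term` and the numbers are assumptions for this illustration.

```python
import math

def aspect_ratio_term(w, h, w_gt, h_gt, iou):
    """Return (v, alpha) from the two formulas above for one box pair."""
    v = 4.0 / math.pi ** 2 * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    alpha = v / (1.0 - iou + v)
    return v, alpha

# Hypothetical example: a square ground-truth box and a 2:1 prediction with IoU 0.5.
v, alpha = aspect_ratio_term(w=2.0, h=1.0, w_gt=1.0, h_gt=1.0, iou=0.5)
print(v, alpha, alpha * v)
```

When the two boxes have the same aspect ratio, v is 0 and the CIoU value reduces to the DIoU value. This is the case for the all-50 and all-40 test boxes used in the code sections of this post, since both are squares.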

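4.2 Code

'''
Arguments
box1: predicted boxes, [b, w, h, num_anchor, 4], where 4 holds the center xy and the width/height wh of each box
box2: ground-truth boxes, [b, w, h, num_anchor, 4], where 4 holds the center xy and the width/height wh of each box
Returns
iou: IoU values, [b, w, h, num_anchor]
ciou: CIoU values, [b, w, h, num_anchor]
'''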

import tensorflow as tf
import math

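# (1) Define the CIoU computation (the loss is then 1 - ciou)
def CIOU(box1, box2):

    # Step 1: compute the IoU
    # Predicted boxes
    box1_xy = box1[..., 0:2]  # centers of the predicted boxes
    box1_wh = box1[..., 2:4]  # widths and heights of the predicted boxes
    box1_wh_half = box1_wh / 2.0  # half width and height (true division, not floor division)
    box1_min = box1_xy - box1_wh_half  # top-left corner of the predicted box
    box1_max = box1_xy + box1_wh_half  # bottom-right corner of the predicted box
    # Area of the predicted box
    box1_area = box1_wh[..., 0] * box1_wh[..., 1]

    # Ground-truth boxes
    box2_xy = box2[..., 0:2]  # centers of the ground-truth boxes
    box2_wh = box2[..., 2:4]  # widths and heights of the ground-truth boxes
    box2_wh_half = box2_wh / 2.0  # half width and height
    box2_min = box2_xy - box2_wh_half  # top-left corner of the ground-truth box
    box2_max = box2_xy + box2_wh_half  # bottom-right corner of the ground-truth box
    # Area of the ground-truth box
    box2_area = box2_wh[..., 0] * box2_wh[..., 1]

    # Top-left and bottom-right corners of the intersection
    intersect_min = tf.maximum(box1_min, box2_min)
    intersect_max = tf.minimum(box1_max, box2_max)
    # Width and height of the intersection, clamped at 0 so that disjoint boxes get zero area
    intersect_wh = tf.maximum(intersect_max - intersect_min, 0.0)
    # Intersection area
    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]

    # Union area
    union_area = box1_area + box2_area - intersect_area
    # IoU; a small constant in the denominator prevents division by zero
    iou = intersect_area / (union_area + tf.keras.backend.epsilon())

    # Step 2: smallest enclosing rectangle that contains both boxes
    enclose_min = tf.minimum(box1_min, box2_min)  # top-left corner
    enclose_max = tf.maximum(box1_max, box2_max)  # bottom-right corner

    # Squared diagonal length c^2 of the enclosing rectangle
    enclose_distance = tf.reduce_sum(tf.square(enclose_max - enclose_min), axis=-1)

    # Squared distance d^2 between the two box centers
    center_distance = tf.reduce_sum(tf.square(box1_xy - box2_xy), axis=-1)

    # Step 3: aspect-ratio term
    # tf.math.atan2(w, h) returns an angle in [-pi, pi]; for positive w and h it equals arctan(w / h)
    atan_pred = tf.math.atan2(box1_wh[..., 0], box1_wh[..., 1])
    atan_true = tf.math.atan2(box2_wh[..., 0], box2_wh[..., 1])
    v = 4 * tf.square(atan_true - atan_pred) / (math.pi * math.pi)
    # The small constant avoids 0 / 0 when iou is 1 and v is 0
    alpha = v / (1.0 - iou + v + tf.keras.backend.epsilon())

    # CIoU
    ciou = iou - center_distance / (enclose_distance + tf.keras.backend.epsilon()) - alpha * v

    return iou, ciou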

# (2) Quick check
if __name__ == '__main__':
    # Constant test inputs: every entry of box1 is 50, every entry of box2 is 40
    box1 = tf.fill([32, 16, 16, 3, 4], 50.0)
    box2 = tf.fill([32, 16, 16, 3, 4], 40.0)
    # Compute IoU and CIoU
    iou, ciou = CIOU(box1, box2)

    print('iou_shape:', iou.shape)  # [32,16,16,3]
    print(iou[0, 0, 0])  # IoU of the three anchor boxes at one grid cell of the first image
    '''
    tf.Tensor([0.42608696 0.42608696 0.42608696], shape=(3,), dtype=float32)
    '''

    print('ciou_shape:', ciou.shape)  # [32,16,16,3]
    print(ciou[0,0,0])
    '''
    tf.Tensor([0.39302912 0.39302912 0.39302912], shape=(3,), dtype=float32)
    '''

[Object Detection] (11) Bounding-box localization losses IoU, GIoU, DIoU, CIoU, with complete TensorFlow code