Training DiffusionDet on your own dataset (Pascal VOC format)

Contents

1. Data format
- ① Directory layout
- ② Annotations: XML files
- ③ ImageSets/Main/: train.txt and val.txt
- ④ JPEGImages: images
2. Download the pretrained model
3. Modify the code
- ① Edit the config file diffdet.coco.res50.yaml
- ② Edit the config file Base-DiffusionDet.yaml
- ③ Edit pascal_voc.py in detectron2
4. Training succeeds
5. Other problems you may hit
- Problem ①: CUDA out of memory
- Problem ②: tuple.index(x): x not in tuple
6. Evaluation
- ① Change DiffusionDet-main/train_net.py
- ② If you have only one class, be sure to change detectron2/data/datasets/pascal_voc.py
- Problem ①: 'NoneType' object has no attribute 'text'
- Problem ②: ValueError: invalid literal for int() with base 10: '1260.29'
- Problem ③: module 'numpy' has no attribute 'bool'.

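This post assumes the DiffusionDet and detectron2 environment is already set up (being able to run DiffusionDet's demo.py is enough).
I did not follow the official instructions and create symlinks, which I find cumbersome; I laid out the directories the way I usually do.

DiffusionDet code: https://github.com/ShoufaChen/DiffusionDet
detectron2 code: https://github.com/facebookresearch/detectron2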

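1. Data format

① Directory layout

The paths in the training log in section 4 correspond to this layout under DiffusionDet-main/:

datasets/
    VOC2007/
        Annotations/
        ImageSets/
            Main/
        JPEGImages/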

② Annotations: XML files
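As a rough illustration of what one annotation file contains, here is a minimal sketch that builds a Pascal VOC style XML tree with the standard library. The helper name `build_voc_annotation` and the example values are made up for this post. It writes `pose`, `truncated` and `difficult` for every object and rounds the box to integer pixels, because those are the fields that cause trouble in the evaluation problems described in section 6.

```python
import xml.etree.ElementTree as ET


def build_voc_annotation(filename, width, height, objects):
    """Build a minimal Pascal VOC annotation tree (hypothetical helper).

    `objects` is a list of (class_name, xmin, ymin, xmax, ymax) tuples.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename

    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"

    for name, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        # Optional-looking tags that the VOC evaluation code still tries to read.
        ET.SubElement(obj, "pose").text = "Unspecified"
        ET.SubElement(obj, "truncated").text = "0"
        ET.SubElement(obj, "difficult").text = "0"

        box = ET.SubElement(obj, "bndbox")
        coords = zip(("xmin", "ymin", "xmax", "ymax"), (xmin, ymin, xmax, ymax))
        for tag, value in coords:
            # Store integer pixel coordinates so int() parsing cannot fail.
            ET.SubElement(box, tag).text = str(int(round(value)))

    return ET.ElementTree(root)
```

Calling `build_voc_annotation("img.jpg", 640, 480, [("bad", 10.4, 20, 100, 200.6)]).write("img.xml")` would write one such file.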

③ ImageSets/Main/: train.txt and val.txt

train.txt and val.txt each list one image ID per line, that is, the file name without its extension.

④ JPEGImages: images
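If the two split files do not exist yet, they can be generated from the annotation names. The following is only a sketch under my own assumptions: the function name `write_split_files`, the 80/20 split and the fixed random seed are not from the original setup.

```python
import random
from pathlib import Path


def write_split_files(voc_root, val_fraction=0.2, seed=0):
    """Write ImageSets/Main/train.txt and val.txt from the annotation names."""
    voc_root = Path(voc_root)
    ids = sorted(p.stem for p in (voc_root / "Annotations").glob("*.xml"))
    random.Random(seed).shuffle(ids)  # reproducible shuffle

    n_val = int(len(ids) * val_fraction)
    splits = {"val": sorted(ids[:n_val]), "train": sorted(ids[n_val:])}

    out_dir = voc_root / "ImageSets" / "Main"
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, split_ids in splits.items():
        # One image ID per line, without the file extension.
        (out_dir / f"{name}.txt").write_text("\n".join(split_ids) + "\n")
    return splits
```

For the layout used here the call would be `write_split_files("datasets/VOC2007")`. File names that contain spaces are written as they are, which matters for the loading problem described in section 3.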

2. Download the pretrained model

https://github.com/ShoufaChen/DiffusionDet

I downloaded the COCO-Res50 checkpoint and put it under DiffusionDet-main/model/ (a directory I created myself).

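3. Modify the code

This part is a real pain, and I have to hand it to detectron2: most of the work goes into getting around the problem addressed by "Fix for numpy deprecation of np.str" (https://github.com/facebookresearch/detectron2/pull/4806).
If I find a better approach, I will update this post later.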

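① Edit the config file diffdet.coco.res50.yaml

The file is DiffusionDet-main/configs/diffdet.coco.res50.yaml.

The fields marked with a red box must be changed.
For WEIGHTS I wrote the full path, to be safe.
Set NUM_CLASSES to the number of classes in your own dataset.
Leave voc_2007_train and voc_2007_val on lines 11 and 12 as they are: they name the dataset format, not your files.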

② Edit the config file Base-DiffusionDet.yaml

If you run out of GPU memory later, lower the batch size (IMS_PER_BATCH under SOLVER) in this file.
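As an alternative to editing the YAML files, detectron2-style training scripts accept KEY VALUE overrides at the end of the command line (they show up as `opts` in the logged arguments). The sketch below only assembles such a command. It assumes that train_net.py merges `opts` into the config and that the key names are MODEL.WEIGHTS, MODEL.DiffusionDet.NUM_CLASSES and SOLVER.IMS_PER_BATCH; the helper name `build_train_command` is made up. The dataset names still come from the YAML file.

```python
def build_train_command(config_file, weights, num_classes, ims_per_batch=None):
    """Return the argv list for train_net.py with config overrides appended."""
    cmd = ["python", "train_net.py", "--config-file", config_file]
    # Trailing KEY VALUE pairs are collected by the script as `opts`.
    cmd += ["MODEL.WEIGHTS", weights]
    cmd += ["MODEL.DiffusionDet.NUM_CLASSES", str(num_classes)]
    if ims_per_batch is not None:
        # A smaller batch size is the knob to turn when GPU memory runs out.
        cmd += ["SOLVER.IMS_PER_BATCH", str(ims_per_batch)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the DiffusionDet-main directory.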

③ Edit pascal_voc.py in detectron2

With the config files edited as above, training should in principle run, but python train_net.py --config-file configs/diffdet.coco.res50.yaml failed with:

  File "/opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py", line 35, in load_voc_instances
    fileids = np.loadtxt(f, dtype=np.str)
  File "/opt/conda/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'str'.
`np.str` was a deprecated alias for the builtin `str`. To avoid this error in existing code, use `str` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.str_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

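Here are the overall solutions first.
Option 1: change the numpy version to 1.20.1 (recommended; np.str still exists in releases before 1.24).
Option 2: patch the code by brute force, which means editing /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py. The step-by-step details follow further down.
Open /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py: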

vim /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py

Four places need to be changed.

After the changes, python train_net.py --config-file configs/diffdet.coco.res50.yaml runs.

Detailed steps:
The same problem comes up on detectron2's GitHub, in this pull request:
https://github.com/facebookresearch/detectron2/pull/4806

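At first I thought I should have pulled fresh code and rebuilt detectron2 when I installed it, but that is not the cause. When I checked, the latest detectron2/detectron2/data/datasets/pascal_voc.py still had the same problem, and I did not find a concrete fix described there.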

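How I fixed it:
1. I tried changing the numpy version, then gave up on that and patched the detectron2 code directly.
2. Since the error is raised in /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py, that is the file to edit.
Run: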

vim /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py

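I will not paste the original file here; I only mark what I changed.
(My approach is blunt: when something breaks, I edit the source directly.)

Then a new error appeared: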

    DatasetCatalog.register(name, lambda: load_voc_instances(dirname, split, class_names))
  File "/opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py", line 41, in load_voc_instances
    anno_file = os.path.join(annotation_dirname, fileid + ".xml")
numpy.core._exceptions.UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U13'), dtype('<U4')) -> None

Fix:
Again edit /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py.
Run:

vim /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py

Code change:

Then another error appeared:

File "/opt/conda/lib/python3.8/site-packages/iopath/common/file_io.py", line 604, in _open
    return open(  # type: ignore
FileNotFoundError: [Errno 2] No such file or directory: "datasets/VOC2007/Annotations/['7_hunse_left' '(28)'].xml"

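By this point every file path has been assembled, but because of the version or the code I had just changed, the assembled path contains the characters [ ] and '. My file is named 7_hunse_left (28).xml, yet the path ends in ['7_hunse_left' '(28)'].xml. The likely reason is that np.loadtxt splits each line on whitespace, so a name containing a space is read as two fields.
So the next step is to strip these stray characters from the path.
Once more edit /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py.
Run: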

vim /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py

Code change:

# Remove the stray [, ] and ' characters from both paths.
rep = [('[', ''), (']', ''), ('\'', '')]
for c, r in rep:
    anno_file = anno_file.replace(c, r)
    jpeg_file = jpeg_file.replace(c, r)
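Stripping characters afterwards treats the symptom. A different approach, which is not what I did above and is only a sketch, is to read the split file line by line instead of with np.loadtxt, so that spaces inside a name never split it into fields. The helper names `read_file_ids` and `annotation_and_image_paths` are made up; the directory names follow the layout from section 1.

```python
import os


def read_file_ids(split_file):
    """Return one image ID per non-empty line, keeping spaces inside names."""
    with open(split_file, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def annotation_and_image_paths(dirname, fileid):
    """Build the XML and JPEG paths for one image ID (a plain Python str)."""
    anno_file = os.path.join(dirname, "Annotations", fileid + ".xml")
    jpeg_file = os.path.join(dirname, "JPEGImages", fileid + ".jpg")
    return anno_file, jpeg_file
```

Because every ID is an ordinary `str`, both the `np.str` error and the `ufunc 'add'` error are avoided, and a name such as `7_hunse_left (28)` stays in one piece.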

Now python train_net.py --config-file configs/diffdet.coco.res50.yaml runs successfully
and prints the following:

4. Training succeeds

Command Line Args: Namespace(config_file='configs/diffdet.coco.res50.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[04/07 03:14:31 detectron2]: Rank of current process: 0. World size: 1
[04/07 03:14:33 detectron2]: Environment info:
----------------------  ---------------------------------------------------------
sys.platform            linux
Python                  3.8.8 (default, Feb 24 2021, 21:46:12) [GCC 7.3.0]
numpy                   1.24.2
detectron2              0.6 @/opt/conda/lib/python3.8/site-packages/detectron2
Compiler                GCC 7.3
CUDA compiler           CUDA 11.1
detectron2 arch flags   3.7, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.8.1 @/opt/conda/lib/python3.8/site-packages/torch
PyTorch debug build     False
GPU available           Yes
GPU 0                   Tesla T4 (arch=7.5)
Driver version          470.161.03
CUDA_HOME               /usr/local/cuda
Pillow                  8.1.2
torchvision             0.9.1 @/opt/conda/lib/python3.8/site-packages/torchvision
torchvision arch flags  3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                  0.1.5.post20221221
iopath                  0.1.9
cv2                     4.7.0
----------------------  ---------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

[04/07 03:14:33 detectron2]: Command line arguments: Namespace(config_file='configs/diffdet.coco.res50.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[04/07 03:14:33 detectron2]: Contents of args.config_file=configs/diffdet.coco.res50.yaml:
_BASE_: "Base-DiffusionDet.yaml"
...(omitted)
    NUM_CLASSES: 1
    NUM_CLS: 1
    NUM_DYNAMIC: 2
VERSION: 2
VIS_PERIOD: 0

[04/07 03:14:33 detectron2]: Full config saved to ./output/config.yaml
anno_file : datasets/VOC2007/Annotations/7_hunse_left (28).xml
jpeg_file : datasets/VOC2007/JPEGImages/7_hunse_left (28).jpg
...(omitted)
anno_file : datasets/VOC2007/Annotations/9_yiwu_right (9).xml
jpeg_file : datasets/VOC2007/JPEGImages/9_yiwu_right (9).jpg
[04/07 03:14:37 d2.data.build]: Using training sampler TrainingSampler
[04/07 03:14:37 d2.data.common]: Serializing 255 elements to byte tensors and concatenating them all ...
[04/07 03:14:37 d2.data.common]: Serialized dataset takes 0.12 MiB
WARNING [04/07 03:14:37 d2.solver.build]: SOLVER.STEPS contains values larger than SOLVER.MAX_ITER. These values will be ignored.
[04/07 03:14:37 fvcore.common.checkpoint]: [Checkpointer] Loading from /home/workspace_disk/DiffusionDet-main/model/diffdet_coco_res50.pth ...
WARNING [04/07 03:14:37 fvcore.common.checkpoint]: Skip loading parameter 'head.head_series.1.class_logits.bias' to the model due to incompatible shapes: (80,) in the checkpoint but (1,) in the model! You might want to double check if this is expected.
WARNING [04/07 03:14:37 fvcore.common.checkpoint]: Skip loading parameter 'head.head_series.5.class_logits.bias' to the model due to incompatible shapes: (80,) in the checkpoint but (1,) in the model! You might want to double check if this is expected.
WARNING [04/07 03:14:37 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
head.head_series.0.class_logits.{bias, weight}
head.head_series.1.class_logits.{bias, weight}
head.head_series.2.class_logits.{bias, weight}
head.head_series.3.class_logits.{bias, weight}
head.head_series.4.class_logits.{bias, weight}
head.head_series.5.class_logits.{bias, weight}
[04/07 03:14:37 d2.engine.train_loop]: Starting training from iteration 0
[04/07 03:15:09 d2.utils.events]:  eta: 19:46:08  iter: 19  total_loss: 29.27  loss_ce: 2.284  loss_bbox: 0.9446  loss_giou: 1.874  loss_ce_0: 2.142  loss_bbox_0: 1  loss_giou_0: 1.886  loss_ce_1: 2.153  loss_bbox_1: 0.9317  loss_giou_1: 1.889  loss_ce_2: 1.589  loss_bbox_2: 0.8879  loss_giou_2: 1.887  loss_ce_3: 1.95  loss_bbox_3: 0.8332  loss_giou_3: 1.868  loss_ce_4: 2.502  loss_bbox_4: 0.8332  loss_giou_4: 1.867  time: 1.5567  data_time: 0.0228  lr: 7.2025e-07  max_mem: 9072M

5. Other problems you may hit

Problem ①: CUDA out of memory

File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2205, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 14.76 GiB total capacity; 13.67 GiB already allocated; 21.75 MiB free; 13.72 GiB reserved in total by PyTorch)

Lower the batch size in DiffusionDet-main/configs/Base-DiffusionDet.yaml.

Problem ②: tuple.index(x): x not in tuple

 File "/opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py", line 80, in load_voc_instances
    {"category_id": class_names.index(cls), "bbox": bbox, "bbox_mode": BoxMode.XYXY_ABS}
ValueError: tuple.index(x): x not in tuple

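The cause is that the class names in the data do not match the ones in the code. Edit detectron2/data/datasets/pascal_voc.py
and change CLASS_NAMES to the classes that actually appear in your annotations.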
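To see which class names the annotations really contain before editing CLASS_NAMES, something like the following could be run. It is a sketch with a made-up helper name, `count_class_names`, and it only assumes the usual VOC structure of `<object>` elements with a `<name>` child.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_class_names(annotation_dir):
    """Count how often each <object><name> value occurs in the XML files."""
    counts = Counter()
    for xml_path in sorted(Path(annotation_dir).glob("*.xml")):
        root = ET.parse(str(xml_path)).getroot()
        for obj in root.findall("object"):
            name = obj.findtext("name")
            if name is not None:
                # Keep the raw text: stray spaces or typos are exactly
                # what produces "x not in tuple".
                counts[name] += 1
    return counts
```

`print(count_class_names("datasets/VOC2007/Annotations"))` then shows every spelling that CLASS_NAMES has to cover.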

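6. Evaluation

① Change DiffusionDet-main/train_net.py

I looked at several posts, some of which rewrite the whole build_evaluator. Those changes seemed too large, so I only changed this one function call.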

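② If you have only one class, be sure to change detectron2/data/datasets/pascal_voc.py

Add a comma; otherwise my class bad gets split into three classes.

CLASS_NAMES is a tuple, and a tuple with a single element needs a trailing comma. I overlooked that.
In detectron2/data/datasets/pascal_voc.py:

Now it works.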
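The effect of the missing comma is easy to reproduce in plain Python. The variable names below are only for illustration:

```python
without_comma = ("bad")   # parentheses only: this is the string "bad"
with_comma = ("bad",)     # trailing comma: a tuple with one element

# Iterating over the string yields its characters, hence the three "classes".
print(type(without_comma).__name__, list(without_comma))
print(type(with_comma).__name__, list(with_comma))
```

The first line prints `str ['b', 'a', 'd']` and the second prints `tuple ['bad']`.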

Problem ①: 'NoneType' object has no attribute 'text'

The error is raised in /opt/conda/lib/python3.8/site-packages/detectron2/evaluation/pascal_voc_evaluation.py.
I looked at that code:

Then I looked at my XML:

My annotations have no pose element,
so I deleted the line obj_struct["pose"] = obj.find("pose").text from pascal_voc_evaluation.py.
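Deleting the line works. A gentler variant, shown here only as a sketch and not as detectron2's own code, reads optional tags with `findtext` and a default value, so a missing element no longer returns `None`. The function name `parse_object` and the default values are my assumptions.

```python
import xml.etree.ElementTree as ET


def parse_object(obj):
    """Read one <object> element, using defaults for tags that may be absent."""
    return {
        "name": obj.findtext("name"),
        # findtext returns the default instead of None when the tag is missing.
        "pose": obj.findtext("pose", default="Unspecified"),
        "truncated": int(obj.findtext("truncated", default="0")),
        "difficult": int(obj.findtext("difficult", default="0")),
    }


obj = ET.fromstring("<object><name>bad</name></object>")
print(parse_object(obj))
```

For the object above, which has only a `name`, the other three fields fall back to their defaults.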

Problem ②: ValueError: invalid literal for int() with base 10: '1260.29'

In pascal_voc_evaluation.py, change the four int(...) calls that parse the box coordinates to float(...).
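The reason is that `int()` refuses a string with a decimal point, while `float()` accepts both forms. A small sketch of parsing a box that way, with a made-up helper name `parse_box` and made-up coordinates apart from 1260.29:

```python
import xml.etree.ElementTree as ET


def parse_box(bndbox):
    """Parse xmin, ymin, xmax, ymax, accepting '1260' as well as '1260.29'."""
    return [float(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]


bndbox = ET.fromstring(
    "<bndbox><xmin>1260.29</xmin><ymin>10</ymin>"
    "<xmax>1300.5</xmax><ymax>42</ymax></bndbox>"
)
print(parse_box(bndbox))
```

Rounding the coordinates in the XML files themselves would be another way to avoid touching the library code.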

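Problem ③: module 'numpy' has no attribute 'bool'.

Install numpy 1.23.1; pip is enough.
np.bool was removed in NumPy 1.24 (the training log above shows numpy 1.24.2), so a release below 1.24 is needed.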
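Both the np.str error in section 3 and this np.bool error come from aliases that NumPy removed in release 1.24. A tiny check on the version string tells you whether an installation is affected. The function name `numpy_lacks_old_aliases` is made up for this sketch:

```python
def numpy_lacks_old_aliases(numpy_version):
    """Return True if this NumPy version no longer has np.str / np.bool."""
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    # The deprecated aliases were removed in NumPy 1.24.
    return (major, minor) >= (1, 24)
```

With NumPy installed, `numpy_lacks_old_aliases(numpy.__version__)` gives the answer for the current environment.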
