Contents
- 1. Data format
- ① Directory layout
- ② Annotations: XML files
- ③ ImageSets/Main/: train.txt and val.txt
- ④ JPEGImages: images
- 2. Download the pretrained model
- 3. Modify the code
- ① Edit the config file diffdet.coco.res50.yaml
- ② Edit the config file Base-DiffusionDet.yaml
- ③ Edit pascal_voc.py in detectron2
- 4. Training succeeds
- 5. Other problems you may hit
- Problem ①: CUDA out of memory
- Problem ②: tuple.index(x): x not in tuple
- 6. Evaluation
- ① Change DiffusionDet-main/train_net.py
- ② If you have only one class, be sure to change detectron2/data/datasets/pascal_voc.py
- Problem ①: 'NoneType' object has no attribute 'text'
- Problem ②: ValueError: invalid literal for int() with base 10: '1260.29'
- Problem ③: module 'numpy' has no attribute 'bool'.
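This post assumes the DiffusionDet and detectron2 environments are already set up (being able to run DiffusionDet's demo.py is enough).
I did not follow the official instructions and create symlinks, which I found cumbersome; I laid out the directories my own way.
DiffusionDet code: https://github.com/ShoufaChen/DiffusionDet
detectron2 code: https://github.com/facebookresearch/detectron2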
1. Data format
① Directory layout
② Annotations: XML files
③ ImageSets/Main/: train.txt and val.txt
train.txt:
val.txt:
④ JPEGImages: images
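If you do not have train.txt and val.txt yet, they can be generated from the annotation files. The following is only a minimal sketch: the function name `write_voc_splits`, the 80/20 split, and the fixed seed are my own assumptions, not something the original setup prescribes. It writes one id per line (the file name without its extension).

```python
import random
from pathlib import Path


def write_voc_splits(voc_root, val_ratio=0.2, seed=0):
    """Write ImageSets/Main/train.txt and val.txt from the XML files in Annotations/.

    val_ratio and seed are illustrative defaults, not values from the original post.
    """
    root = Path(voc_root)
    # One id per annotation file: the file name without the .xml extension.
    ids = sorted(p.stem for p in (root / "Annotations").glob("*.xml"))
    random.Random(seed).shuffle(ids)

    n_val = int(len(ids) * val_ratio)
    splits = {"val": sorted(ids[:n_val]), "train": sorted(ids[n_val:])}

    out_dir = root / "ImageSets" / "Main"
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, split_ids in splits.items():
        # One id per line, no extension.
        text = "\n".join(split_ids) + "\n"
        (out_dir / f"{name}.txt").write_text(text, encoding="utf-8")
    return splits
```

With the layout used in this post, the call would be `write_voc_splits("datasets/VOC2007")`.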
2. Download the pretrained model
https://github.com/ShoufaChen/DiffusionDet
I downloaded the COCO-Res50 model and put it under DiffusionDet-main/model/ (a directory I created myself).
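3. Modify the code
This part is a real hassle, and detectron2 wore me out. (It is about working around the problem described in "Fix for numpy deprecation of np.str", https://github.com/facebookresearch/detectron2/pull/4806.)
If I find a better approach, I will update this later.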
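① Edit the config file diffdet.coco.res50.yaml
This is diffdet.coco.res50.yaml under DiffusionDet-main/configs/.
- The items boxed in red in the screenshot must be changed.
- For WEIGHTS I wrote the full path (to avoid mistakes).
- Set NUM_CLASSES to the number of classes in your own dataset.
- Leave voc_2007_train and voc_2007_val on lines 11 and 12 alone; they are the dataset names detectron2 registers for the VOC format, not your own file names.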
② Edit the config file Base-DiffusionDet.yaml
If you run out of GPU memory later, reduce the batch size in this file.
③ Edit pascal_voc.py in detectron2
In principle, training should run once the config files above are edited, but running
python train_net.py --config-file configs/diffdet.coco.res50.yaml
failed with:
File "/opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py", line 35, in load_voc_instances
fileids = np.loadtxt(f, dtype=np.str)
File "/opt/conda/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'str'.
`np.str` was a deprecated alias for the builtin `str`. To avoid this error in existing code, use `str` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.str_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
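First, the solutions in short.
Option 1: change the numpy version to 1.20.1 (recommended). np.str still exists, as a deprecated alias, in every NumPy release before 1.24.
Option 2: patch the code by hand, which means editing /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py. In steps:
Open /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py:
vim /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py
Four places need to be changed.
After the changes, run
python train_net.py --config-file configs/diffdet.coco.res50.yaml
and it works.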
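For the np.str error itself, the change is the one the error message suggests: use the builtin str in place of the removed alias. A minimal sketch on an in-memory file (the ids below are made up for illustration, and this is not the full patch to pascal_voc.py):

```python
import io

import numpy as np

# Stand-in for ImageSets/Main/train.txt; these ids are made up for illustration.
split_file = io.StringIO("000001\n000002\n000003\n")

# np.str was removed in NumPy 1.24; the builtin str is the drop-in replacement.
fileids = np.loadtxt(split_file, dtype=str)

print(fileids.tolist())
```

This runs on both old and new NumPy releases, so the patched line does not tie you to a particular version.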
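Detailed steps:
The detectron2 GitHub repository also has this problem on record:
https://github.com/facebookresearch/detectron2/pull/4806
At first I thought I should have pulled the latest code and rebuilt when installing detectron2, but that is not it; the new code has the same problem, as shown here:
I did not find a concrete fix there, and detectron2/detectron2/data/datasets/pascal_voc.py in the latest code I looked at still had the issue.
How to fix it:
1. I tried changing the numpy version, then gave up on that and patched the detectron2 code directly.
2. Since the error is raised in /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py, that is the file to edit.
Run:
vim /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py
I will not copy the original file here; I only mark what I changed:
(My approach is blunt: when something breaks, I edit the source directly.)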
Then a new problem appeared:
DatasetCatalog.register(name, lambda: load_voc_instances(dirname, split, class_names))
File "/opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py", line 41, in load_voc_instances
anno_file = os.path.join(annotation_dirname, fileid + ".xml")
numpy.core._exceptions.UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U13'), dtype('<U4')) -> None
Fix:
Again, edit /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py.
Run:
vim /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py
Change the code:
Then another problem appeared:
File "/opt/conda/lib/python3.8/site-packages/iopath/common/file_io.py", line 604, in _open
return open( # type: ignore
FileNotFoundError: [Errno 2] No such file or directory: "datasets/VOC2007/Annotations/['7_hunse_left' '(28)'].xml"
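At this point the file paths are being assembled, but because of the version or the edits I had just made, the assembled path contains the characters [ ] ' '. My original file is named 7_hunse_left (28).xml, but the path came out as ['7_hunse_left' '(28)'].xml.
So the next step is to change the code to strip these stray characters from the path.
Once more, edit /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py.
Run:
vim /opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py
Change the code: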
# Strip the characters left over from the array's string form: [ ] and '
rep = [('[', ''), (']', ''), ('\'', '')]
for c, r in rep:
    if c in anno_file:
        anno_file = anno_file.replace(c, r)
for c, r in rep:
    if c in jpeg_file:
        jpeg_file = jpeg_file.replace(c, r)
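The stray brackets appear to come from the spaces in my file names: np.loadtxt splits each line on whitespace by default, so an id such as 7_hunse_left (28) is read as two columns, and the path is then built from an array instead of a string. A less fragile alternative to stripping characters afterwards is to read the ids line by line. This is only a sketch; `read_fileids` and `voc_paths` are hypothetical helper names, not part of detectron2.

```python
import os


def read_fileids(split_path):
    """Return one id per non-empty line, keeping any spaces inside the id."""
    with open(split_path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def voc_paths(dirname, fileid):
    """Build the annotation and image paths for a single id."""
    anno_file = os.path.join(dirname, "Annotations", fileid + ".xml")
    jpeg_file = os.path.join(dirname, "JPEGImages", fileid + ".jpg")
    return anno_file, jpeg_file
```

Because each id stays a plain Python string, `fileid + ".xml"` also avoids the earlier UFuncTypeError. Renaming the files so they contain no spaces would sidestep the problem as well.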
Now running
python train_net.py --config-file configs/diffdet.coco.res50.yaml
succeeds.
It prints the following:
4. Training succeeds
Command Line Args: Namespace(config_file='configs/diffdet.coco.res50.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[04/07 03:14:31 detectron2]: Rank of current process: 0. World size: 1
[04/07 03:14:33 detectron2]: Environment info:
---------------------- ---------------------------------------------------------
sys.platform linux
Python 3.8.8 (default, Feb 24 2021, 21:46:12) [GCC 7.3.0]
numpy 1.24.2
detectron2 0.6 @/opt/conda/lib/python3.8/site-packages/detectron2
Compiler GCC 7.3
CUDA compiler CUDA 11.1
detectron2 arch flags 3.7, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.8.1 @/opt/conda/lib/python3.8/site-packages/torch
PyTorch debug build False
GPU available Yes
GPU 0 Tesla T4 (arch=7.5)
Driver version 470.161.03
CUDA_HOME /usr/local/cuda
Pillow 8.1.2
torchvision 0.9.1 @/opt/conda/lib/python3.8/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.7.0
---------------------- ---------------------------------------------------------
PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.0.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
[04/07 03:14:33 detectron2]: Command line arguments: Namespace(config_file='configs/diffdet.coco.res50.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[04/07 03:14:33 detectron2]: Contents of args.config_file=configs/diffdet.coco.res50.yaml:
_BASE_: "Base-DiffusionDet.yaml"
... (omitted)
NUM_CLASSES: 1
NUM_CLS: 1
NUM_DYNAMIC: 2
VERSION: 2
VIS_PERIOD: 0
[04/07 03:14:33 detectron2]: Full config saved to ./output/config.yaml
anno_file : datasets/VOC2007/Annotations/7_hunse_left (28).xml
jpeg_file : datasets/VOC2007/JPEGImages/7_hunse_left (28).jpg
... (omitted)
anno_file : datasets/VOC2007/Annotations/9_yiwu_right (9).xml
jpeg_file : datasets/VOC2007/JPEGImages/9_yiwu_right (9).jpg
[04/07 03:14:37 d2.data.build]: Using training sampler TrainingSampler
[04/07 03:14:37 d2.data.common]: Serializing 255 elements to byte tensors and concatenating them all ...
[04/07 03:14:37 d2.data.common]: Serialized dataset takes 0.12 MiB
WARNING [04/07 03:14:37 d2.solver.build]: SOLVER.STEPS contains values larger than SOLVER.MAX_ITER. These values will be ignored.
[04/07 03:14:37 fvcore.common.checkpoint]: [Checkpointer] Loading from /home/workspace_disk/DiffusionDet-main/model/diffdet_coco_res50.pth ...
WARNING [04/07 03:14:37 fvcore.common.checkpoint]: Skip loading parameter 'head.head_series.1.class_logits.bias' to the model due to incompatible shapes: (80,) in the checkpoint but (1,) in the model! You might want to double check if this is expected.
WARNING [04/07 03:14:37 fvcore.common.checkpoint]: Skip loading parameter 'head.head_series.5.class_logits.bias' to the model due to incompatible shapes: (80,) in the checkpoint but (1,) in the model! You might want to double check if this is expected.
WARNING [04/07 03:14:37 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
head.head_series.0.class_logits.{bias, weight}
head.head_series.1.class_logits.{bias, weight}
head.head_series.2.class_logits.{bias, weight}
head.head_series.3.class_logits.{bias, weight}
head.head_series.4.class_logits.{bias, weight}
head.head_series.5.class_logits.{bias, weight}
[04/07 03:14:37 d2.engine.train_loop]: Starting training from iteration 0
[04/07 03:15:09 d2.utils.events]: eta: 19:46:08 iter: 19 total_loss: 29.27 loss_ce: 2.284 loss_bbox: 0.9446 loss_giou: 1.874 loss_ce_0: 2.142 loss_bbox_0: 1 loss_giou_0: 1.886 loss_ce_1: 2.153 loss_bbox_1: 0.9317 loss_giou_1: 1.889 loss_ce_2: 1.589 loss_bbox_2: 0.8879 loss_giou_2: 1.887 loss_ce_3: 1.95 loss_bbox_3: 0.8332 loss_giou_3: 1.868 loss_ce_4: 2.502 loss_bbox_4: 0.8332 loss_giou_4: 1.867 time: 1.5567 data_time: 0.0228 lr: 7.2025e-07 max_mem: 9072M
5. Other problems you may hit
Problem ①: CUDA out of memory
File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2205, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 14.76 GiB total capacity; 13.67 GiB already allocated; 21.75 MiB free; 13.72 GiB reserved in total by PyTorch)
Edit DiffusionDet-main/configs/Base-DiffusionDet.yaml and reduce the batch size.
Problem ②: tuple.index(x): x not in tuple
File "/opt/conda/lib/python3.8/site-packages/detectron2/data/datasets/pascal_voc.py", line 80, in load_voc_instances
{"category_id": class_names.index(cls), "bbox": bbox, "bbox_mode": BoxMode.XYXY_ABS}
ValueError: tuple.index(x): x not in tuple
The cause is that the class names in the data do not match those in the code. Edit detectron2/data/datasets/pascal_voc.py
and change CLASS_NAMES to the correct classes.
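To see which class names your annotations actually contain before editing CLASS_NAMES, something like the following can help. It is a sketch under the assumption that each XML file uses the usual VOC structure of object elements with a name child; `count_class_names` is a hypothetical helper of mine.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_class_names(annotation_dir):
    """Count how often each <object><name> value occurs across all XML files."""
    counts = Counter()
    for xml_path in sorted(Path(annotation_dir).glob("*.xml")):
        root = ET.parse(str(xml_path)).getroot()
        for obj in root.iter("object"):
            name = obj.findtext("name")
            if name is not None:
                counts[name.strip()] += 1
    return counts
```

For this post's layout, `tuple(sorted(count_class_names("datasets/VOC2007/Annotations")))` gives a tuple that can be compared with CLASS_NAMES.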
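6. Evaluation
① Change DiffusionDet-main/train_net.py
I looked at several posts; some rewrite the whole build_evaluator, which seemed like too big a change, so I only changed the function that gets called.
② If you have only one class, be sure to change detectron2/data/datasets/pascal_voc.py
Add a comma, otherwise my class bad gets split into three classes.
This is a tuple, and a tuple with a single element needs a trailing comma; I overlooked that.
In detectron2/data/datasets/pascal_voc.py:
That fixed it.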
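The comma matters because parentheses alone do not create a tuple. A short illustration in plain Python (the variable names are mine; "bad" is my single class name):

```python
# Parentheses alone do not make a tuple: this is just the string "bad".
wrong = ("bad")
# The trailing comma is what makes a one-element tuple.
right = ("bad",)

# Iterating over the string yields its characters, hence three "classes".
print(type(wrong).__name__, list(wrong))
print(type(right).__name__, list(right))
```

Any code that loops over the class names will therefore see b, a, and d as separate classes in the first case.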
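Problem ①: 'NoneType' object has no attribute 'text'
The error comes from "/opt/conda/lib/python3.8/site-packages/detectron2/evaluation/pascal_voc_evaluation.py".
I looked at that code:
Then I looked at my XML:
It turned out my annotations have no pose entry,
so I deleted the line obj_struct["pose"] = obj.find("pose").text from pascal_voc_evaluation.py.
Problem ②: ValueError: invalid literal for int() with base 10: '1260.29'
Change the four int calls in pascal_voc_evaluation.py to float (the ones applied to the bounding-box coordinates, which are decimals in my annotations).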
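Both problems come from the parser assuming every tag is present and every coordinate is an integer. Instead of deleting lines, the parsing can be made tolerant. The following is a standalone sketch, not detectron2's actual function: `parse_objects` is a hypothetical name, and the fallback values ("Unspecified", "0") are my own assumptions.

```python
import xml.etree.ElementTree as ET


def parse_objects(xml_text):
    """Parse VOC-style <object> entries, tolerating missing tags and float coordinates."""
    objects = []
    for obj in ET.fromstring(xml_text).findall("object"):
        bbox = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            # findtext returns the default instead of failing when the tag is absent.
            "pose": obj.findtext("pose", default="Unspecified"),
            "truncated": int(obj.findtext("truncated", default="0")),
            "difficult": int(obj.findtext("difficult", default="0")),
            # float() accepts both "1260" and "1260.29".
            "bbox": [float(bbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")],
        })
    return objects
```

The same two ideas (a default for missing tags, float for coordinates) can be applied by hand to the corresponding lines in pascal_voc_evaluation.py.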
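Problem ③: module 'numpy' has no attribute 'bool'.
I changed numpy from 1.22.x to 1.23.1. (Like np.str, the alias np.bool was removed in NumPy 1.24, so a release below 1.24 still provides it.)
Installing with pip is enough.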