How to fix training error launch:90 in YOLOX?
!python tools/train.py -f exps/example/yolox_voc/yolox_voc_s.py -d 1 -b 16 --fp16 -c /content/yolox_s.pth
2022-03-29 19:24:57 | INFO | apex.amp.frontend:356 - enabled : True
2022-03-29 19:24:57 | INFO | apex.amp.frontend:356 - opt_level : O1
2022-03-29 19:24:57 | INFO | apex.amp.frontend:356 - cast_model_type : None
2022-03-29 19:24:57 | INFO | apex.amp.frontend:356 - patch_torch_functions : True
2022-03-29 19:24:57 | INFO | apex.amp.frontend:356 - keep_batchnorm_fp32 : None
2022-03-29 19:24:57 | INFO | apex.amp.frontend:356 - master_weights : None
2022-03-29 19:24:57 | INFO | apex.amp.frontend:356 - loss_scale : dynamic
2022-03-29 19:24:57 | INFO | yolox.core.trainer:297 - loading checkpoint for fine tuning
2022-03-29 19:24:57 | WARNING | yolox.utils.checkpoint:27 - Shape of head.cls_preds.0.weight in checkpoint is torch.Size([80, 128, 1, 1]), while shape of head.cls_preds.0.weight in model is torch.Size([3, 128, 1, 1]).
2022-03-29 19:24:57 | WARNING | yolox.utils.checkpoint:27 - Shape of head.cls_preds.0.bias in checkpoint is torch.Size([80]), while shape of head.cls_preds.0.bias in model is torch.Size([3]).
2022-03-29 19:24:57 | WARNING | yolox.utils.checkpoint:27 - Shape of head.cls_preds.1.weight in checkpoint is torch.Size([80, 128, 1, 1]), while shape of head.cls_preds.1.weight in model is torch.Size([3, 128, 1, 1]).
2022-03-29 19:24:57 | WARNING | yolox.utils.checkpoint:27 - Shape of head.cls_preds.1.bias in checkpoint is torch.Size([80]), while shape of head.cls_preds.1.bias in model is torch.Size([3]).
2022-03-29 19:24:57 | WARNING | yolox.utils.checkpoint:27 - Shape of head.cls_preds.2.weight in checkpoint is torch.Size([80, 128, 1, 1]), while shape of head.cls_preds.2.weight in model is torch.Size([3, 128, 1, 1]).
2022-03-29 19:24:57 | WARNING | yolox.utils.checkpoint:27 - Shape of head.cls_preds.2.bias in checkpoint is torch.Size([80]), while shape of head.cls_preds.2.bias in model is torch.Size([3]).
2022-03-29 19:24:57 | ERROR | yolox.core.launch:90 - An error has been caught in function 'launch', process 'MainProcess' (1549), thread 'MainThread' (140243385931648):
Traceback (most recent call last):
File "tools/train.py", line 125, in <module>
args=(exp, args),
│ └ Namespace(batch_size=16, ckpt='/content/yolox_s.pth', devices=1, dist_backend='nccl', dist_url=None, exp_file='exps/example/y...
└ ╒══════════════════╤═════════════════════════════════════════════════════════════════════════════════════════════════════════...
> File "/content/apex/YOLOX/yolox/core/launch.py", line 90, in launch
main_func(*args)
│ └ (╒══════════════════╤════════════════════════════════════════════════════════════════════════════════════════════════════════...
└ <function main at 0x7f8cf10d3e60>
File "tools/train.py", line 104, in main
trainer.train()
│ └ <function Trainer.train at 0x7f8bf2234d40>
└ <yolox.core.trainer.Trainer object at 0x7f8bec4a7a90>
File "/content/apex/YOLOX/yolox/core/trainer.py", line 69, in train
self.before_train()
│ └ <function Trainer.before_train at 0x7f8bec969710>
└ <yolox.core.trainer.Trainer object at 0x7f8bec4a7a90>
File "/content/apex/YOLOX/yolox/core/trainer.py", line 150, in before_train
no_aug=self.no_aug,
│ └ False
└ <yolox.core.trainer.Trainer object at 0x7f8bec4a7a90>
File "exps/example/yolox_voc/yolox_voc_s.py", line 36, in get_data_loader
max_labels=50,
File "/content/apex/YOLOX/yolox/data/datasets/voc.py", line 115, in __init__
os.path.join(rootpath, "ImageSets", "Main", name + ".txt")
│ │ │ │ └ 'trainval'
│ │ │ └ '/content/apex/YOLOX/datasets/VOCdevkit/VOC2007'
│ │ └ <function join at 0x7f8cf31177a0>
│ └ <module 'posixpath' from '/usr/lib/python3.7/posixpath.py'>
└ <module 'os' from '/usr/lib/python3.7/os.py'>
FileNotFoundError: [Errno 2] No such file or directory: '/content/apex/YOLOX/datasets/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt'
The file exists at /content/YOLOX/datasets/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt
but not at /content/apex/YOLOX/datasets/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt
How can I solve this?
Sadra Naddaf commented:
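This is more of a troubleshooting question than a machine learning question. Anyway, if you are using this setup and have a content folder containing the .pth checkpoint inside your YOLOX folder, you should run the command as shown below, assuming your terminal's working directory is the YOLOX folder (check with the pwd command).
Assuming you want to train on a custom dataset, you should follow their guide. For instance, if your data is in COCO format, put it in the ./datasets folder.
Now, if the downloaded weights are in the ./content/ folder, the following command starts training from yolox_s.pth on the images inside ./datasets, assuming they are in COCO format:
python tools/train.py -f exps/example/yolox_voc/yolox_voc_s.py -d 1 -b 16 --fp16 -c content/yolox_s.pth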
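The command in the question passes the checkpoint as /content/yolox_s.pth, while the one in this answer passes content/yolox_s.pth. As a rough illustration of why those two spellings can point at different files, here is a minimal Python sketch. The helper names resolve_from and exists_from are hypothetical and not part of YOLOX, and /content/YOLOX is only assumed to be the folder you launch from.

```python
from pathlib import Path, PurePosixPath


def resolve_from(cwd, path):
    """Return the absolute POSIX path that `path` refers to when run from `cwd`."""
    # Joining an absolute path discards `cwd`; a leading "./" is collapsed.
    return str(PurePosixPath(cwd) / path)


def exists_from(cwd, path):
    """Report whether `path`, interpreted relative to `cwd`, exists on disk."""
    return Path(resolve_from(cwd, path)).exists()


if __name__ == "__main__":
    yolox_root = "/content/YOLOX"  # assumption: the directory printed by pwd
    for arg in ("/content/yolox_s.pth", "content/yolox_s.pth", "./datasets"):
        # Show where each command-line path would actually point.
        print(arg, "->", resolve_from(yolox_root, arg), "| exists:", exists_from(yolox_root, arg))
```

Run from a notebook cell or a terminal, this prints the resolved location of each argument and whether it exists, which makes it easier to spot a path that points outside the folder you expected.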
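Note: a path beginning with / refers to the root of the file system, while a path beginning with ./ (or with no prefix at all) refers to the current folder.
2 years ago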