Curious loss in a CNN


Original title: Curious loss in a CNN

In my CNN for image classification I am getting a curious loss and I don't know what is wrong. I would be grateful if you could help me find the mistake. Here is the output it prints, followed at the end by my code:

Train Epoch: 1 [0/2048 (0%)]    Loss: 0.654869
Train Epoch: 1 [64/2048 (3%)]   Loss: 0.271722
Train Epoch: 1 [128/2048 (6%)]  Loss: 0.001958
Train Epoch: 1 [192/2048 (9%)]  Loss: 0.003399
Train Epoch: 1 [256/2048 (12%)] Loss: 0.000000
Train Epoch: 1 [320/2048 (16%)] Loss: 0.006664
Train Epoch: 1 [384/2048 (19%)] Loss: 0.000000
Train Epoch: 1 [448/2048 (22%)] Loss: 0.000000
Train Epoch: 1 [512/2048 (25%)] Loss: 0.000000
Train Epoch: 1 [576/2048 (28%)] Loss: 0.000000
Train Epoch: 2 [0/2048 (0%)]    Loss: 173505.656250
Train Epoch: 2 [64/2048 (3%)]   Loss: 0.000000
Train Epoch: 2 [128/2048 (6%)]  Loss: 0.000000
Train Epoch: 2 [192/2048 (9%)]  Loss: 33394.285156
Train Epoch: 2 [256/2048 (12%)] Loss: 0.000000
Train Epoch: 2 [320/2048 (16%)] Loss: 0.000000
Train Epoch: 2 [960/2048 (47%)] Loss: 0.000000
Train Epoch: 2 [1024/2048 (50%)]        Loss: 636908.437500
Train Epoch: 2 [1088/2048 (53%)]        Loss: 32862667387437056.000000
Train Epoch: 2 [1152/2048 (56%)]        Loss: 15723443952412777718762887446528.000000
Train Epoch: 2 [1216/2048 (59%)]        Loss: nan
Train Epoch: 2 [1280/2048 (62%)]        Loss: nan
Train Epoch: 2 [1344/2048 (66%)]        Loss: nan
Train Epoch: 2 [1408/2048 (69%)]        Loss: nan

Here you can see the training code.

import torch
import torch.nn.functional as F
from torch.autograd import Variable

def trainM(epoch):
    model.train()
    for batch_id, (data, target) in enumerate(net.train_data):
        # pick the 64 targets that belong to this batch out of the full target list
        target = torch.LongTensor(target[64*batch_id:64*(batch_id+1)])
        data = Variable(data)
        target = Variable(target)
        optimizer.zero_grad()

        out = model(data)
        criterion = F.nll_loss

        loss = criterion(out,target)
        loss.backward()
        optimizer.step()
       
        print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
            epoch, batch_id * len(data), len(net.train_data) * 64,
            100 * batch_id / len(net.train_data), loss.item()))
        

for item in range(1,10):
    trainM(item)

Here is the code of the neural network, and finally the dataPrep method used for data preparation.

import random

import torch.nn as nn
from PIL import Image
from torchvision import transforms

train_data = []
target_list = []
class Netz(nn.Module):
    def __init__(self):
        super(Netz, self).__init__()
        self.conv1 = nn.Conv2d(1, 10,kernel_size=5)
        self.conv2 = nn.Conv2d(10,20, kernel_size = 5)
        self.conv_dropout = nn.Dropout2d()
        self.fc1 = nn.Linear(1050,60)
        self.fc2 = nn.Linear(60,2)
        self.fce = nn.Linear(20,1)
    
    def forward(self, x):
        # shapes below assume the 115x180 crop produced by dataPrep
        x = self.conv1(x)                           # [B, 1, 180, 115] -> [B, 10, 176, 111]
        x = F.max_pool2d(x, 2)                      # -> [B, 10, 88, 55]
        x = F.relu(x)
        x = self.conv2(x)                           # -> [B, 20, 84, 51]
        x = self.conv_dropout(x)
        x = F.max_pool2d(x, 2)                      # -> [B, 20, 42, 25]
        x = F.relu(x)
        x = x.reshape(x.shape[0], x.shape[1], -1)   # -> [B, 20, 1050]
        x = F.relu(self.fc1(x))                     # -> [B, 20, 60]
        x = self.fc2(x)                             # -> [B, 20, 2]
        x = self.fce(x.permute(0, 2, 1)).squeeze(-1)  # [B, 2, 20] -> [B, 2]
        return F.log_softmax(x, -1)


def dataPrep(list_of_data, data_path, category, quantity):
    global train_data
    global target_list
    train_data_list = []
    
    transform = transforms.Compose([
    transforms.ToTensor(),
        ])
    
    len_data = len(train_data)
    for item in list_of_data:
        f = random.choice(list_of_data)
        list_of_data.remove(f)
        try:
            img = Image.open(data_path +f)
        except:
            continue
        img_crop = img.crop((310,60,425,240))
        img_tensor = transform(img_crop)
        train_data_list.append(img_tensor)

        if category == True:
            target = 1
        else:
            target = 0
        target_list.append(target)
        
        if len(train_data_list) >=64:
            train_data.append((torch.stack(train_data_list), target_list))
            train_data_list = []
            
        if (len_data*64 + quantity) <= len(train_data)*64:
            break
    return list_of_data

Original link: https://stackoverflow.com/questions/71955542/curious-loss-in-a-cnn

Replies

  • DerekG commented:

    I might also suggest initializing the network with random parameters for the convolutional layer weights. By default these weights are 0, which likely means you end up always predicting a single class. That would explain the very low (0) or very high losses (depending on the composition of a particular batch).
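
    A minimal sketch of what such an explicit re-initialization could look like, applied to the Netz model from the question (the kaiming_normal_ scheme below is an assumption, not something the comment specifies):

    import torch.nn as nn

    def init_weights(m):
        # re-initialize conv and linear layer weights randomly
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
            if m.bias is not None:
                nn.init.zeros_(m.bias)

    model = Netz()
    model.apply(init_weights)  # applies init_weights recursively to every submodule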

    2 years ago
  • Deusy94 commented:

    There are several things you can try to address the problem:

    1. Try lowering the learning rate, somewhere between 1e-03 and 1e-04 (see the sketch after this list).
    2. Clip the gradients, modifying your code like this:
    def trainM(epoch):

        ...

        # Backward
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1)
        optimizer.step()

        ...
    
    
    3. Change the data normalization; try min-max and Z-score normalization (see the sketch after this list).
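
    A minimal sketch of points 1 and 3, assuming an Adam optimizer (the question does not show how optimizer is created) and placeholder normalization statistics:

    import torch.optim as optim
    from torchvision import transforms

    # point 1: a lower learning rate (1e-4 here; the original value is not shown)
    optimizer = optim.Adam(model.parameters(), lr=1e-4)

    # point 3: ToTensor already rescales 8-bit pixels to [0, 1] (a min-max
    # normalization); Normalize then applies a Z-score-style shift and scale.
    # The mean/std below are placeholders -- compute them from your own data.
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5], std=[0.5]),
    ])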

    Apart from this, I can see that your model reaches convergence very quickly (the loss is zero almost immediately), so your task may be too easy. In that case you can reduce the number of iterations.
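
    For instance, with the epoch loop from the question, that could simply mean fewer epochs (2 below is an arbitrary example):

    for item in range(1, 3):  # 2 epochs instead of 9
        trainM(item)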

    2 years ago