Can't backward pass two losses in Classification Transformer Model

xiaoxingxing 2 years ago pytorch 200

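For my model I'm using a RoBERTa transformer model and the Trainer from the Huggingface transformers library.

I calculate two losses: lloss is a cross-entropy loss, and dloss calculates the loss between hierarchy layers.

The total loss is the sum of lloss and dloss. (Based on this)

However, when calling total_loss.backward(), I get the error: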

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed

Any idea why that happens? Can I force it to only call backward once? Here is the loss calculation part:

dloss = calculate_dloss(prediction, labels, 3)
lloss = calculate_lloss(prediction, labels, 3)
total_loss = lloss + dloss
total_loss.backward()

def calculate_lloss(predictions, true_labels, total_level):
    '''Calculates the layer loss.
    '''

    loss_fct = nn.CrossEntropyLoss()

    lloss = 0
    for l in range(total_level):

        lloss += loss_fct(predictions[l], true_labels[l])

    return self.alpha * lloss

def calculate_dloss(predictions, true_labels, total_level):
    '''Calculate the dependence loss.
    '''

    dloss = 0
    for l in range(1, total_level):

        current_lvl_pred = torch.argmax(nn.Softmax(dim=1)(predictions[l]), dim=1)
        prev_lvl_pred = torch.argmax(nn.Softmax(dim=1)(predictions[l-1]), dim=1)

        D_l = self.check_hierarchy(current_lvl_pred, prev_lvl_pred, l)  #just a boolean tensor

        l_prev = torch.where(prev_lvl_pred == true_labels[l-1], torch.FloatTensor([0]).to(self.device), torch.FloatTensor([1]).to(self.device))
        l_curr = torch.where(current_lvl_pred == true_labels[l], torch.FloatTensor([0]).to(self.device), torch.FloatTensor([1]).to(self.device))

        dloss += torch.sum(torch.pow(self.p_loss, D_l*l_prev)*torch.pow(self.p_loss, D_l*l_curr) - 1)

    return self.beta * dloss

Original link: https://stackoverflow.com//questions/71465239/cant-backward-pass-two-losses-in-classification-transformer-model


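FlyingTeller commented

There is nothing wrong with a loss being the sum of two individual losses. Here is a small proof of principle adapted from the docs: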

import torch
import numpy
from sklearn.datasets import make_blobs

class Feedforward(torch.nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Feedforward, self).__init__()
        self.input_size = input_size
        self.hidden_size  = hidden_size
        self.fc1 = torch.nn.Linear(self.input_size, self.hidden_size)
        self.relu = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(self.hidden_size, 1)
        self.sigmoid = torch.nn.Sigmoid()
    def forward(self, x):
        hidden = self.fc1(x)
        relu = self.relu(hidden)
        output = self.fc2(relu)
        output = self.sigmoid(output)
        return output

def blob_label(y, label, loc): # assign labels
    target = numpy.copy(y)
    for l in loc:
        target[y == l] = label
    return target

x_train, y_train = make_blobs(n_samples=40, n_features=2, cluster_std=1.5, shuffle=True)
x_train = torch.FloatTensor(x_train)
y_train = torch.FloatTensor(blob_label(y_train, 0, [0]))
y_train = torch.FloatTensor(blob_label(y_train, 1, [1,2,3]))

x_test, y_test = make_blobs(n_samples=10, n_features=2, cluster_std=1.5, shuffle=True)
x_test = torch.FloatTensor(x_test)
y_test = torch.FloatTensor(blob_label(y_test, 0, [0]))
y_test = torch.FloatTensor(blob_label(y_test, 1, [1,2,3]))


model = Feedforward(2, 10)
criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)


model.eval()
y_pred = model(x_test)
before_train = criterion(y_pred.squeeze(), y_test)
print('Test loss before training' , before_train.item())

model.train()
n_epochs = 20
for epoch in range(n_epochs):
    optimizer.zero_grad()
    # Forward pass
    y_pred = model(x_train)
    # Compute the two losses and sum them
    lossCE = criterion(y_pred.squeeze(), y_train)
    lossSQD = (y_pred.squeeze() - y_train).pow(2).mean()
    loss = lossCE + lossSQD
    print('Epoch {}: train loss: {}'.format(epoch, loss.item()))
    # Backward pass
    loss.backward()
    optimizer.step()

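There must be a second place where you call backward, directly or indirectly, on some variable that then traverses your graph. It is a bit much to ask for the complete code here; only you can check this, or at least reduce it to a minimal example (while doing that, you may already find the problem). Apart from that, I would start by checking:

Does it already happen in the first training iteration? If not: are you reusing any calculation results in the second iteration without detaching them?
When you call backward on the losses individually, lloss.backward() followed by dloss.backward() (since gradients are accumulated, this has the same effect as adding them together first): what happens? This will let you track down which of the two losses triggers the error.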
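
The two checks above can be sketched roughly as follows. This is only an illustration, not the asker's code: the tiny Linear model, the `carried` value and the helper names `failing_steps` and `failing_losses` are all made up for the example. Note that when both losses share one forward pass, the first individual backward call needs retain_graph=True, otherwise the second call fails simply because the shared graph was already freed.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 3)   # hypothetical stand-in for the real model
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))
ce = torch.nn.CrossEntropyLoss()


def failing_steps(detach):
    """Run two iterations that reuse the previous loss; return the steps that raise."""
    carried = torch.zeros(())   # hypothetical value reused across iterations
    failed = []
    for step in range(2):
        model.zero_grad()
        loss = ce(model(x), y) + carried
        try:
            loss.backward()
        except RuntimeError:
            failed.append(step)
        # Without detach(), `carried` drags the already-freed graph into the next step.
        carried = loss.detach() if detach else loss
    return failed


def failing_losses(named_losses):
    """Backpropagate each loss separately and return the names that raise."""
    failed = []
    for name, loss in named_losses.items():
        try:
            # retain_graph=True keeps the shared part of the graph for the next loss.
            loss.backward(retain_graph=True)
        except RuntimeError:
            failed.append(name)
    return failed


print(failing_steps(detach=False))  # the second iteration is the one that fails
print(failing_steps(detach=True))   # no failing step once the value is detached

logits = model(x)
losses = {"lloss": ce(logits, y), "dloss": logits.pow(2).mean()}
print(failing_losses(losses))       # names of the losses whose backward raises
```

If `failing_losses` reports a name even though retain_graph=True was used, that loss is probably connected to a graph that was freed earlier, or it has no gradient path at all.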

2 years ago 0 comments

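D. ACAR commented

After backward() your computational graph is freed, so for a second backward you need to build a new graph by feeding the inputs again. If you want to reuse the same graph after backward (for whatever reason), you need to set the retain_graph flag of backward to True. See the retain_graph argument of backward() in the PyTorch documentation.

P.S. As the summation of tensors is automatically differentiable, summing the losses would not cause any issue in the backward pass.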
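
A minimal sketch of that behaviour, using a toy tensor `w` rather than anything from the question: the second backward() on the same graph raises the RuntimeError, while passing retain_graph=True to the first call keeps the graph so a second call works (and accumulates into .grad).

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# Case 1: the first backward() frees the graph, so the second call raises.
loss = (w ** 2).sum()
loss.backward()
try:
    loss.backward()
    second_call_failed = False
except RuntimeError:
    second_call_failed = True

# Case 2: retain_graph=True on the first call keeps the graph alive.
w.grad = None                      # reset the gradient left over from case 1
loss = (w ** 2).sum()
loss.backward(retain_graph=True)
loss.backward()                    # works; gradients from both calls accumulate
grad_after_two_calls = w.grad.clone()

print(second_call_failed)
print(grad_after_two_calls)        # twice the single-pass gradient 2 * w
```

In a normal training loop retain_graph is usually not needed: a fresh forward pass per iteration builds a fresh graph.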
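
To illustrate the P.S., here is a small check, under the assumption of a toy Linear model and two made-up losses, that backpropagating the sum gives the same parameter gradients as backpropagating the two losses one after the other and letting the gradients accumulate.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 3)      # hypothetical stand-in for the real model
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))


def grads(backward_fn):
    """Do one forward pass, backpropagate via backward_fn, return parameter grads."""
    model.zero_grad()
    logits = model(x)
    loss_a = torch.nn.functional.cross_entropy(logits, y)
    loss_b = logits.pow(2).mean()  # arbitrary second loss, for illustration only
    backward_fn(loss_a, loss_b)
    return [p.grad.clone() for p in model.parameters()]


def summed(loss_a, loss_b):
    (loss_a + loss_b).backward()   # one backward pass over the summed loss


def separate(loss_a, loss_b):
    loss_a.backward(retain_graph=True)  # keep the shared graph for loss_b
    loss_b.backward()                   # gradients accumulate in .grad


g_sum = grads(summed)
g_sep = grads(separate)
same = all(torch.allclose(s, t, atol=1e-6) for s, t in zip(g_sum, g_sep))
print(same)
```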

2 years ago 0 comments
