Can't backward pass two losses in Classification Transformer Model
Tags: pytorch
For my model I am using a RoBERTa transformer model and the Trainer from the Hugging Face Transformers library.
I compute two losses: lloss is a cross-entropy loss, and dloss computes the loss between the hierarchy levels.
The total loss is the sum of lloss and dloss. (Based on this.)
However, when calling total_loss.backward() I get the error:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed
Any idea why this happens? Can I force it to call backward only once? Here is the loss-calculation part:
dloss = calculate_dloss(prediction, labels, 3)
lloss = calculate_lloss(prediction, labels, 3)
total_loss = lloss + dloss
total_loss.backward()
def calculate_lloss(self, predictions, true_labels, total_level):
    '''Calculates the layer loss.'''
    loss_fct = nn.CrossEntropyLoss()
    lloss = 0
    for l in range(total_level):
        lloss += loss_fct(predictions[l], true_labels[l])
    return self.alpha * lloss
def calculate_dloss(self, predictions, true_labels, total_level):
    '''Calculates the dependence loss.'''
    dloss = 0
    for l in range(1, total_level):
        current_lvl_pred = torch.argmax(nn.Softmax(dim=1)(predictions[l]), dim=1)
        prev_lvl_pred = torch.argmax(nn.Softmax(dim=1)(predictions[l-1]), dim=1)
        D_l = self.check_hierarchy(current_lvl_pred, prev_lvl_pred, l)  # just a boolean tensor
        l_prev = torch.where(prev_lvl_pred == true_labels[l-1], torch.FloatTensor([0]).to(self.device), torch.FloatTensor([1]).to(self.device))
        l_curr = torch.where(current_lvl_pred == true_labels[l], torch.FloatTensor([0]).to(self.device), torch.FloatTensor([1]).to(self.device))
        dloss += torch.sum(torch.pow(self.p_loss, D_l * l_prev) * torch.pow(self.p_loss, D_l * l_curr) - 1)
    return self.beta * dloss
Replies
FlyingTeller replied:
There is nothing wrong with the total loss being the sum of two individual losses. Here is a small proof of principle adapted from the documentation:
import torch
import numpy
from sklearn.datasets import make_blobs

class Feedforward(torch.nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Feedforward, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.fc1 = torch.nn.Linear(self.input_size, self.hidden_size)
        self.relu = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(self.hidden_size, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        hidden = self.fc1(x)
        relu = self.relu(hidden)
        output = self.fc2(relu)
        output = self.sigmoid(output)
        return output

def blob_label(y, label, loc):  # assign labels
    target = numpy.copy(y)
    for l in loc:
        target[y == l] = label
    return target

x_train, y_train = make_blobs(n_samples=40, n_features=2, cluster_std=1.5, shuffle=True)
x_train = torch.FloatTensor(x_train)
y_train = torch.FloatTensor(blob_label(y_train, 0, [0]))
y_train = torch.FloatTensor(blob_label(y_train, 1, [1, 2, 3]))

x_test, y_test = make_blobs(n_samples=10, n_features=2, cluster_std=1.5, shuffle=True)
x_test = torch.FloatTensor(x_test)
y_test = torch.FloatTensor(blob_label(y_test, 0, [0]))
y_test = torch.FloatTensor(blob_label(y_test, 1, [1, 2, 3]))

model = Feedforward(2, 10)
criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

model.eval()
y_pred = model(x_test)
before_train = criterion(y_pred.squeeze(), y_test)
print('Test loss before training', before_train.item())

model.train()
epoch = 20
for epoch in range(epoch):
    optimizer.zero_grad()
    # Forward pass
    y_pred = model(x_train)
    # Compute Loss
    lossCE = criterion(y_pred.squeeze(), y_train)
    lossSQD = (y_pred.squeeze() - y_train).pow(2).mean()
    loss = lossCE + lossSQD
    print('Epoch {}: train loss: {}'.format(epoch, loss.item()))
    # Backward pass
    loss.backward()
    optimizer.step()
So there must be a second place where you, directly or indirectly, call backward on some variable and thereby traverse your graph again. Asking for the full code here would be too much; only you can check this, or at least reduce it to a minimal example (and while doing so, you may well find the problem yourself). Apart from that, I would start checking:
- Does it already happen in the first training iteration? If not: are you reusing any computation results in the second iteration without detaching them?
- What happens when you backward the losses individually, i.e. lloss.backward() followed by dloss.backward() (which has the same effect as summing them first, since gradients accumulate)? That lets you track down which of the two losses raises the error; see the small sketch below.
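A small sketch of that second check (a toy linear layer, not the asker's model), showing that two separate backward calls accumulate the same gradients as summing the losses first. Note that when both losses come from the same forward pass, the first backward needs retain_graph=True so the graph survives for the second call:

import torch

# Toy setup (not the asker's model): one linear layer, two losses on the same forward pass.
torch.manual_seed(0)
layer = torch.nn.Linear(4, 2)
x = torch.randn(8, 4)
target = torch.randn(8, 2)

# Variant 1: sum the losses and call backward once.
out = layer(x)
loss_a = torch.nn.functional.mse_loss(out, target)
loss_b = out.pow(2).mean()
(loss_a + loss_b).backward()
grad_summed = layer.weight.grad.clone()

# Variant 2: backward each loss separately; gradients accumulate in .grad.
# Both losses share one forward graph, so the first backward must retain it.
layer.zero_grad()
out = layer(x)
loss_a = torch.nn.functional.mse_loss(out, target)
loss_b = out.pow(2).mean()
loss_a.backward(retain_graph=True)
loss_b.backward()
grad_separate = layer.weight.grad.clone()

print(torch.allclose(grad_summed, grad_separate))  # True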
2 years ago
D. ACAR replied:
After backward() your computation graph is freed, so for a second backward pass you would need to build a new graph by feeding the inputs through again. If you want to backward through the same graph a second time (for whatever reason), you need to set the retain_graph flag of backward to True. See retain_graph here.
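A minimal sketch of what that means in isolation (hypothetical tensors, not the asker's code): a second backward through an already-freed graph raises exactly the error above, while retain_graph=True keeps the graph alive for further backward calls:

import torch

x = torch.randn(3, requires_grad=True)

# First graph: backward once, after which its saved buffers are freed.
y = (x * 2).sum()
y.backward()
try:
    y.backward()  # second backward through the same, already freed graph
except RuntimeError as e:
    print(e)      # "Trying to backward through the graph a second time ..."

# Fresh graph: retain_graph=True keeps it alive, so a second backward works
# and the gradients simply accumulate in x.grad.
z = (x * 2).sum()
z.backward(retain_graph=True)
z.backward()
print(x.grad)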
P.S. As the summation of tensors is automatically differentiable, summing the losses does not cause any issue in the backward pass.
2 years ago