Training on a saved model after learning has stagnated?



I'm using PyTorch to train a single-layer neural network, and I save the model whenever the validation loss decreases. Once the network has finished training, I load the saved model and pass my test-set features through it (rather than through the model from the last epoch) to see how well it does. More often than not, though, the validation loss stops decreasing after about 150 epochs, and I'm worried the network is overfitting the data. Would it be better to load the saved model during training whenever the validation loss has not decreased for some number of epochs (say 5), and then continue training on that saved model?
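For illustration only, here is a minimal sketch of the rollback idea being asked about, assuming the checkpoint format and the model/optimizer/new_model_path names used in the code further down; the helper itself is hypothetical:

import torch

def maybe_rollback(model, optimizer, checkpoint_path, epochs_since_improvement, patience=5):
    # Hypothetical helper: once the validation loss has not improved for `patience`
    # epochs, reload the best saved checkpoint and continue training from it.
    if epochs_since_improvement >= patience:
        checkpoint = torch.load(checkpoint_path)
        model.load_state_dict(checkpoint['model_state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        return 0  # reset the stagnation counter after rolling back
    return epochs_since_improvement

The caller would increment epochs_since_improvement at the end of every epoch that does not produce a new minimum validation loss, and pass the counter through this helper once per epoch.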

Also, is there any advice on how to avoid the situation where the validation loss stops decreasing? I have some models where the validation loss keeps decreasing even after 500 epochs, and others where it stops decreasing after 100 epochs. Here is my code so far:

import numpy as np
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import ReduceLROnPlateau

class NeuralNetwork(nn.Module):
    def __init__(self, input_dim, output_dim, nodes):
        super(NeuralNetwork, self).__init__()
        self.linear1 = nn.Linear(input_dim, nodes)
        self.tanh = nn.Tanh()
        self.linear2 = nn.Linear(nodes, output_dim)

    def forward(self, x):
        output = self.linear1(x)
        output = self.tanh(output)
        output = self.linear2(output)
        return output

epochs = 500 # (start small for now)
learning_rate = 0.01
w_decay = 0.1
momentum = 0.9
input_dim = 4
output_dim = 1
nodes = 8
model = NeuralNetwork(input_dim, output_dim, nodes)

criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum, weight_decay=w_decay) 
scheduler = ReduceLROnPlateau(optimizer, 'min', patience=5)

losses = []
val_losses = []
min_validation_loss = np.inf
means = [] # we want to store the mean and standard deviation for the test set later
stdevs = []
torch.save({
    'epoch': 0,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'training_loss': 0.0,
    'validation_loss': 0.0,
    'means': [],
    'stdevs': [],
    }, new_model_path)
new_model_saved = True

for epoch in range(epochs):
    curr_loss = 0.0
    validation_loss = 0.0

    if new_model_saved:
        checkpoint = torch.load(new_model_path)
        model.load_state_dict(checkpoint['model_state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        means = checkpoint['means']
        stdevs = checkpoint['stdevs']
        new_model_saved = False

    model.train()
    for i, batch in enumerate(train_dataloader):
        x, y = batch
        x, new_mean, new_std = normalize_data(x, means, stdevs)
        means = new_mean
        stdevs = new_std
        optimizer.zero_grad()
        predicted_outputs = model(x)
        loss = criterion(torch.squeeze(predicted_outputs), y)
        loss.backward()
        optimizer.step()
        curr_loss += loss.item()

    model.eval()
    for x_val, y_val in val_dataloader:
        x_val, val_means, val_std = normalize_data(x_val, means, stdevs)
        predicted_y = model(x_val)
        loss = criterion(torch.squeeze(predicted_y), y_val)
        validation_loss += loss.item()

    curr_lr = optimizer.param_groups[0]['lr']
    if epoch % 10 == 0:
        print(f'Epoch {epoch} \t\t Training Loss: {curr_loss/len(train_dataloader)} \t\t Validation Loss: {validation_loss/len(val_dataloader)} \t\t Learning rate: {curr_lr}')
    if min_validation_loss > validation_loss:
        print(f'     For epoch {epoch}, validation loss decreased ({min_validation_loss:.6f}--->{validation_loss:.6f}) \t learning rate: {curr_lr} \t saving the model')
        min_validation_loss = validation_loss
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'training_loss': curr_loss/len(train_dataloader),
            'validation_loss': validation_loss/len(val_dataloader),
            'means': means,
            'stdevs': stdevs
            }, new_model_path)
        new_model_saved = True

    losses.append(curr_loss/len(train_dataloader))
    val_losses.append(validation_loss/len(val_dataloader))
    scheduler.step(curr_loss/len(train_dataloader))
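
After training, the question describes loading the saved model and passing the test-set features through it. A minimal sketch of that evaluation step, assuming a test_dataloader exists and reusing the normalize_data helper together with the statistics stored in the checkpoint, might look like this:

# Evaluate the best saved checkpoint (not the last-epoch weights) on the test set.
checkpoint = torch.load(new_model_path)
best_model = NeuralNetwork(input_dim, output_dim, nodes)
best_model.load_state_dict(checkpoint['model_state_dict'])
best_model.eval()

test_loss = 0.0
with torch.no_grad():
    for x_test, y_test in test_dataloader:  # test_dataloader is assumed, not shown above
        # reuse the normalization statistics that were saved alongside the weights
        x_test, _, _ = normalize_data(x_test, checkpoint['means'], checkpoint['stdevs'])
        predicted = best_model(x_test)
        test_loss += criterion(torch.squeeze(predicted), y_test).item()
print(f'Test loss: {test_loss / len(test_dataloader)}')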

Original link: https://stackoverflow.com//questions/71433545/training-on-a-saved-model-after-learning-has-stagnated

Replies

  • Tomer Geva replied:

    The phenomenon where the validation loss increases while the training loss keeps decreasing is called overfitting. Overfitting is a problem when training a model and should be avoided; please read more about the topic here. Overfitting can occur after any number of epochs and depends on many variables (learning rate, dataset size, dataset diversity, etc.). As a rule of thumb, test your model at the "pivot point", i.e. exactly where the validation loss starts to increase (while the training loss keeps decreasing). So my recommendation is to save the model after every iteration in which the validation loss decreases. If it then keeps increasing for some number X of epochs, that probably means you have reached a "deep" minimum of the loss and further training will not help (again, there are exceptions, but for this level of discussion that is enough). I encourage you to read and learn more about this topic; it is very interesting and has significant implications.

    2 years ago · 0 comments
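
As a rough illustration of the advice in the reply above (save a checkpoint on every improvement, and stop once the validation loss has not improved for some number X of epochs), a minimal early-stopping sketch reusing the epochs, model, and new_model_path names from the question might look like this; the patience value is an assumption:

best_val_loss = float('inf')
epochs_without_improvement = 0
patience = 20  # the "X" from the reply; chosen arbitrarily here

for epoch in range(epochs):
    # ... run the training and validation passes from the question,
    #     producing `validation_loss` for this epoch ...

    if validation_loss < best_val_loss:
        best_val_loss = validation_loss
        epochs_without_improvement = 0
        torch.save({'model_state_dict': model.state_dict()}, new_model_path)  # checkpoint at the "pivot point"
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break  # learning has stagnated; evaluate the saved checkpoint instead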