Training on a saved model after learning has stagnated?

pytorch
I am training a single-layer neural network with PyTorch and saving the model whenever the validation loss decreases. Once the network has finished training, I load the saved model and pass my test-set features through it (rather than through the model from the last epoch) to see how well it performs. However, the validation loss usually stops decreasing after about 150 epochs, and I worry the network is overfitting the data. Would it be better to load the saved model during training whenever the validation loss has not decreased for some number of iterations (say, 5 epochs), and then continue training from that saved model?

Also, are there any suggestions for avoiding the situation where the validation loss stops decreasing? I have some models whose validation loss keeps decreasing even after 500 epochs, while for others it stops decreasing after 100 epochs. Here is my code so far:
import numpy as np
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import ReduceLROnPlateau

class NeuralNetwork(nn.Module):
    def __init__(self, input_dim, output_dim, nodes):
        super(NeuralNetwork, self).__init__()
        self.linear1 = nn.Linear(input_dim, nodes)
        self.tanh = nn.Tanh()
        self.linear2 = nn.Linear(nodes, output_dim)

    def forward(self, x):
        output = self.linear1(x)
        output = self.tanh(output)
        output = self.linear2(output)
        return output

epochs = 500  # (start small for now)
learning_rate = 0.01
w_decay = 0.1
momentum = 0.9
input_dim = 4
output_dim = 1
nodes = 8

model = NeuralNetwork(input_dim, output_dim, nodes)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum, weight_decay=w_decay)
scheduler = ReduceLROnPlateau(optimizer, 'min', patience=5)

losses = []
val_losses = []
min_validation_loss = np.inf
means = []  # we want to store the mean and standard deviation for the test set later
stdevs = []

torch.save({
    'epoch': 0,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'training_loss': 0.0,
    'validation_loss': 0.0,
    'means': [],
    'stdevs': [],
}, new_model_path)
new_model_saved = True

for epoch in range(epochs):
    curr_loss = 0.0
    validation_loss = 0.0

    if new_model_saved:
        checkpoint = torch.load(new_model_path)
        model.load_state_dict(checkpoint['model_state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        means = checkpoint['means']
        stdevs = checkpoint['stdevs']
        new_model_saved = False

    model.train()
    for i, batch in enumerate(train_dataloader):
        x, y = batch
        x, new_mean, new_std = normalize_data(x, means, stdevs)
        means = new_mean
        stdevs = new_std

        optimizer.zero_grad()
        predicted_outputs = model(x)
        loss = criterion(torch.squeeze(predicted_outputs), y)
        loss.backward()
        optimizer.step()
        curr_loss += loss.item()

    model.eval()
    for x_val, y_val in val_dataloader:
        x_val, val_means, val_std = normalize_data(x_val, means, stdevs)
        predicted_y = model(x_val)
        loss = criterion(torch.squeeze(predicted_y), y_val)
        validation_loss += loss.item()

    curr_lr = optimizer.param_groups[0]['lr']
    if epoch % 10 == 0:
        print(f'Epoch {epoch} \t\t Training Loss: {curr_loss/len(train_dataloader)} \t\t Validation Loss: {validation_loss/len(val_dataloader)} \t\t Learning rate: {curr_lr}')

    if min_validation_loss > validation_loss:
        print(f' For epoch {epoch}, validation loss decreased ({min_validation_loss:.6f}--->{validation_loss:.6f}) \t learning rate: {curr_lr} \t saving the model')
        min_validation_loss = validation_loss
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'training_loss': curr_loss/len(train_dataloader),
            'validation_loss': validation_loss/len(val_dataloader),
            'means': means,
            'stdevs': stdevs
        }, new_model_path)
        new_model_saved = True

    losses.append(curr_loss/len(train_dataloader))
    val_losses.append(validation_loss/len(val_dataloader))
    scheduler.step(curr_loss/len(train_dataloader))
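For reference, the test-time side of the question (evaluating the best saved checkpoint rather than the last-epoch model) might look roughly like the sketch below. The toy two-layer model, the temporary file standing in for `new_model_path`, and the stored normalization statistics are all illustrative assumptions, not the asker's actual setup:

```python
import os
import tempfile
import torch
import torch.nn as nn

# Hypothetical stand-ins for the trained model and checkpoint path.
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
path = os.path.join(tempfile.mkdtemp(), 'best_model.pt')
torch.save({
    'epoch': 3,
    'model_state_dict': model.state_dict(),
    'means': [0.0, 0.0, 0.0, 0.0],   # normalization stats saved during training
    'stdevs': [1.0, 1.0, 1.0, 1.0],
}, path)

# At test time: rebuild the architecture, load the best checkpoint,
# reuse the *training* normalization statistics, and disable gradients.
test_model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
checkpoint = torch.load(path)
test_model.load_state_dict(checkpoint['model_state_dict'])
test_model.eval()

with torch.no_grad():
    x_test = torch.randn(5, 4)
    x_test = (x_test - torch.tensor(checkpoint['means'])) / torch.tensor(checkpoint['stdevs'])
    preds = test_model(x_test).squeeze()
print(preds.shape)  # torch.Size([5])
```

Note that the test features are normalized with the statistics stored in the checkpoint, mirroring how the question's code carries `means` and `stdevs` alongside the weights.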
Replies

Tomer Geva commented:
The phenomenon where the validation loss increases while the training loss decreases is called overfitting. Overfitting is a problem when training a model and should be avoided; please read more on the topic here. Overfitting can occur after any number of epochs and depends on many variables (learning rate, dataset size, dataset diversity, and so on).

As a rule of thumb, test your model at the "pivot point", i.e. exactly where the validation loss starts to increase (while the training loss continues to decrease). This means my recommendation is to save the model after every iteration in which the validation loss decreases. If the validation loss keeps increasing for some number X of epochs after that, it likely means you have reached a "deep" minimum of the loss and continuing to train will not help (again, there are exceptions to this, but it is enough for this level of discussion).

I encourage you to read and learn more about this topic; it is very interesting and of real significance.
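The "save at the pivot point" advice above can be sketched as a simple early-stopping loop. This is a minimal illustration on hypothetical toy data (the linear target, the patience value, and keeping the best weights in memory rather than on disk are all assumptions, not the asker's configuration): record the best weights whenever the validation loss improves, stop after `patience` epochs without improvement, and restore the best weights before testing.

```python
import copy
import torch
import torch.nn as nn

# Hypothetical toy regression data standing in for the real train/val sets.
torch.manual_seed(0)
w_true = torch.tensor([1.0, -2.0, 0.5, 3.0])
x_train = torch.randn(128, 4)
y_train = x_train @ w_true
x_val = torch.randn(32, 4)
y_val = x_val @ w_true

model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

patience = 5                  # epochs to wait for an improvement (assumed value)
best_val = float('inf')
best_state = None             # best weights, kept in memory instead of a file
epochs_without_improvement = 0
max_epochs = 500

for epoch in range(max_epochs):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(x_train).squeeze(), y_train)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x_val).squeeze(), y_val).item()

    if val_loss < best_val:
        best_val = val_loss
        best_state = copy.deepcopy(model.state_dict())
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break             # validation loss has plateaued: stop training

# Restore the best ("pivot point") weights before evaluating on the test set.
model.load_state_dict(best_state)
print(f"stopped at epoch {epoch}, best validation loss {best_val:.4f}")
```

Saving the checkpoint to disk, as in the question's code, is equivalent; keeping `best_state` in memory just avoids the repeated `torch.load` inside the loop.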
2 years ago