PyTorch autoencoder produces only one image

I'm new to PyTorch and I'm running into a problem with a basic autoencoder on the MNIST dataset. An autoencoder is a neural network trained to reproduce its input at the output by passing it through a narrower intermediate layer, thereby learning a low-dimensional representation of the high-dimensional input space.
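For reference, the objective being trained here is just the mean-squared reconstruction error through the bottleneck; a minimal sketch of the shapes involved, matching the layer widths in the code below:

# Shape flow through the autoencoder used below:
#   x      (batch, 784)  flattened 28x28 MNIST image
#   code   (batch, 9)    bottleneck: code = encoder(x)
#   x_hat  (batch, 784)  reconstruction: x_hat = decoder(code)
# Training minimizes MSE(x_hat, x) = mean((x_hat - x) ** 2)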

The problem is that all of my trained examples produce the same output. I'm not sure where the error is; I adapted an online tutorial for this, and I believe all the different input images should not map to a single output. Can anyone help me find any simple mistake or wrong setting that I'm missing?

Here is a code snippet that reproduces my problem.

# AE one-block
import time
import numpy as np
import matplotlib.pyplot as plt
import torch
from torch import nn
from torchvision import datasets, transforms


class AE(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        
        self.encoder = torch.nn.Sequential(
            torch.nn.Linear(28 * 28, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, 36),
            torch.nn.ReLU(),
            torch.nn.Linear(36, 18),
            torch.nn.ReLU(),
            torch.nn.Linear(18, 9)
        )
        
        self.decoder = torch.nn.Sequential(
            torch.nn.Linear(9, 18),
            torch.nn.ReLU(),
            torch.nn.Linear(18, 36),
            torch.nn.ReLU(),
            torch.nn.Linear(36, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 28 * 28),
            torch.nn.Sigmoid()
        )
    
    def forward(self, x):
        x = self.flatten(x)
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded
    
    
# Model initialization
model = AE()
loss_function = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr = 1e-1, weight_decay = 1e-8)
tensor_transform = transforms.ToTensor()
dataset = datasets.MNIST(root = "./data",
                         train = True,
                         download = True,
                         transform = tensor_transform)
loader = torch.utils.data.DataLoader(dataset = dataset,
                                     batch_size = 32,
                                     shuffle = True)


# Train
epochs = 5
outputs = []
losses = []
for epoch in range(epochs):
    tic = time.monotonic()
    print(f'Epoch = {epoch}')
    for (image, _) in loader:
        image = image.reshape(-1, 28*28)
        reconstructed = model(image)
        loss = loss_function(reconstructed, image)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss)
    outputs.append((epochs, image, reconstructed))
    toc = time.monotonic()
    print(f'Time taken = {round(toc - tic, 2)}')


# Calculate difference between image outputs
im_batches = [image_batch for (image_batch, _) in loader]
only_image = model(im_batches[0]).detach().numpy()[0]
diff_total = 0
for i in range(len(im_batches)):
    im_out = model(im_batches[i]).detach().numpy()
    diff = np.linalg.norm(im_out - only_image)
print(f'Difference between outputs = {diff_total}')


# Show image outputs
im_out1 = model(im_batches[0]).detach().numpy()
im_out2 = model(im_batches[1]).detach().numpy()
for i in range(3):
    plt.imshow(im_out1[i].reshape(28, 28))
    plt.show()
for i in range(3):
    plt.imshow(im_out2[i].reshape(28, 28))
    plt.show()

My Python computation prints the "difference between outputs" as zero, suggesting that the output images for the first 10 batches are all identical. Visual inspection of the first few images also shows that the output looks like a strange blend of all the MNIST digit images.

Original link: https://stackoverflow.com//questions/71905964/pytorch-autoencoder-produces-only-one-image

Replies

  • draw replied:

    Lower the optimizer's learning rate from 1e-1 (really large) to 1e-4 (fairly standard) and increase the number of epochs from 5 to 10, and the outputs will no longer be identical. As for the printed "difference between outputs": diff_total is never updated; the loop computes diff on each iteration but never adds it into diff_total. So even with epochs == 0 and random model outputs, the "difference between outputs" would still equal 0. Also, for memory consumption it is better to do losses.append(loss.item()), so each stored loss does not keep its computation graph alive.
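
    A minimal sketch of those fixes applied to the question's training and comparison code (same model, loss_function, and loader as above; the lr=1e-4 and 10 epochs follow the suggestion here):

    # Smaller learning rate; 1e-1 tends to collapse all outputs to one blend.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-8)

    epochs = 10
    losses = []
    for epoch in range(epochs):
        for (image, _) in loader:
            image = image.reshape(-1, 28 * 28)
            reconstructed = model(image)
            loss = loss_function(reconstructed, image)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # .item() stores a plain float, so the autograd graph can be freed
            losses.append(loss.item())

    # Actually accumulate the per-batch differences; the original loop
    # computed diff but never added it to diff_total.
    im_batches = [image_batch for (image_batch, _) in loader]
    only_image = model(im_batches[0]).detach().numpy()[0]
    diff_total = 0.0
    for i in range(len(im_batches)):
        im_out = model(im_batches[i]).detach().numpy()
        diff_total += np.linalg.norm(im_out - only_image)
    print(f'Difference between outputs = {diff_total}')

    With diff_total actually accumulated and a sane learning rate, the printed difference should be nonzero and the reconstructions should vary per input rather than converging to one blended digit.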

    2 years ago · 0 comments