PyTorch DataLoader is not dividing the dataset into batches

青葱年少 · 2 years ago · pytorch · 461

Original title: Pytorch DataLoader is not dividing the dataset into batches

I am trying to load my training data into a DataLoader using the following code:

```
import numpy as np
import torch
from torch.utils.data import Dataset

class Dataset(Dataset):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getitem__(self, index):
        # Convert one indexed item of each array to a tensor
        x = torch.Tensor(self.x[index])
        y = torch.Tensor(self.y[index])
        return (x, y)

    def __len__(self):
        # Dataset length is the size of the first axis
        count = self.x.shape[0]
        return count

# X_train and y_train are 2-D NumPy arrays loaded earlier (not shown)
X_train = np.reshape(X_train,(-1,1,X_train.shape[0],X_train.shape[1]))
y_train = np.reshape(y_train,(-1,1,y_train.shape[0],y_train.shape[1]))
train_dataset = Dataset(X_train, y_train)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,batch_size=128,shuffle=True)
```

Now, when I check the length of the DataLoader, I get 1 every time. The loader is not splitting the dataset into batches. What am I doing wrong here?

Original link: https://stackoverflow.com//questions/71661473/pytorch-dataloader-is-not-dividing-the-dataset-into-batches


aaossa answered
This answer has been accepted.

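After testing your code, it seems to work perfectly if you remove the reshape step. The reshape introduces new leading dimensions: with the target shape (-1, 1, rows, cols), the -1 is inferred as 1, so the new shape of X_train is (1, 1, rows, cols). You then index your items with self.x[index], which indexes that new first axis of size 1 instead of the sample axis, so index 0 returns the whole dataset as a single item. You make the same mistake when computing the length of the dataset: self.x.shape[0] is always 1, so the DataLoader sees a dataset with one item and produces exactly one batch.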
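The shape problem described above can be checked with NumPy alone. This is a minimal sketch that uses a small random array as a stand-in for the real X_train (the 1,000 x 16 size is an assumption chosen only to keep the example light):

```python
import math
import numpy as np

# Hypothetical stand-in data: 1,000 samples with 16 features each.
X_train = np.random.rand(1_000, 16)

# The reshape from the question: -1 is inferred as 1,
# so the sample axis is no longer the first axis.
reshaped = np.reshape(X_train, (-1, 1, X_train.shape[0], X_train.shape[1]))

print(reshaped.shape)     # (1, 1, 1000, 16)
print(reshaped[0].shape)  # (1, 1000, 16): index 0 is the entire dataset

batch_size = 128
# With drop_last=False, a DataLoader yields ceil(len(dataset) / batch_size) batches.
n_batches_reshaped = math.ceil(reshaped.shape[0] / batch_size)
n_batches_original = math.ceil(X_train.shape[0] / batch_size)
print(n_batches_reshaped, n_batches_original)  # 1 8
```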

Solution: don't reshape.
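```
import numpy as np
import torch

# Uses the Dataset class from the question, unchanged.
# Random stand-in data: 12,000 samples, 1,280 features, 1 target each.
X_train = np.random.rand(12_000, 1280)
y_train = np.random.rand(12_000, 1)
train_dataset = Dataset(X_train, y_train)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,batch_size=128,shuffle=True)

# Inspect the first batch only
for x, y in train_loader:
    print(x.shape)  # torch.Size([128, 1280])
    print(y.shape)  # torch.Size([128, 1])
    break
```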
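If the reshape was meant to add a channel dimension (an assumption; the question does not say why it reshapes), one option is to insert the new axis after the sample axis so that the first axis still counts samples. The sketch below uses small random arrays and a hypothetical class name, `ArrayDataset`, to avoid shadowing torch's `Dataset`:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class ArrayDataset(Dataset):  # hypothetical name, not from the original post
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getitem__(self, index):
        # Index the sample axis (axis 0) and convert to float32 tensors.
        x = torch.as_tensor(self.x[index], dtype=torch.float32)
        y = torch.as_tensor(self.y[index], dtype=torch.float32)
        return x, y

    def __len__(self):
        # Number of samples, not number of features or channels.
        return self.x.shape[0]


# Hypothetical stand-in data: 1,000 samples with 16 features each.
X_train = np.random.rand(1_000, 16)
y_train = np.random.rand(1_000, 1)

# Add a channel axis AFTER the sample axis: (1000, 16) -> (1000, 1, 16).
X_train = X_train[:, np.newaxis, :]

train_loader = DataLoader(ArrayDataset(X_train, y_train), batch_size=128, shuffle=True)

num_batches = len(train_loader)  # ceil(1000 / 128) = 8
x, y = next(iter(train_loader))
print(num_batches, x.shape, y.shape)
```

With this layout, `len(train_loader)` reflects the number of samples divided by the batch size, and each batch of `x` has shape (batch, channel, features).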
2 years ago · 0 comments
