RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

心中带点小风骚 • July 11, 2023, 9:13 AM • Python • 602 views

Background

I ran into this bug today while training BERT:

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

I searched online and found that nearly every answer blames a batch size that is too large, but reducing the batch size did not fix the problem for me.

Debugging process

Since this is a fairly obscure CUDA error, why not try running on the CPU first?

So I changed device = 'cuda' if torch.cuda.is_available() else 'cpu' to simply device = 'cpu'. Running the code again produced the following error (only the last few lines are shown):

  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2199, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
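The last frame of this traceback can be reproduced in isolation. The following is a minimal sketch, not the author's training code: the 10-row table, the sample ids, and the helper check_token_ids are all hypothetical and exist only for illustration. It shows that an nn.Embedding raises IndexError on the CPU as soon as an id is not smaller than the number of rows, and how a small pre-flight check can report the offending range up front.

```python
import torch
import torch.nn as nn


def check_token_ids(input_ids: torch.Tensor, embedding: nn.Embedding) -> None:
    """Hypothetical helper: fail early, with a readable message,
    if any id falls outside the embedding table."""
    lo, hi = int(input_ids.min()), int(input_ids.max())
    if lo < 0 or hi >= embedding.num_embeddings:
        raise ValueError(
            f"token ids span [{lo}, {hi}] but the embedding table "
            f"only has rows 0..{embedding.num_embeddings - 1}"
        )


# Toy table with 10 rows, so the valid ids are 0..9 (sizes are assumptions).
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)
good_ids = torch.tensor([[1, 2, 9]])
bad_ids = torch.tensor([[1, 2, 10]])  # id 10 has no row in the table

check_token_ids(good_ids, embedding)  # passes silently
print(embedding(good_ids).shape)      # torch.Size([1, 3, 4])

try:
    embedding(bad_ids)                # same failure mode as in the traceback
except IndexError as err:
    print("IndexError:", err)
```

Calling check_token_ids on a batch before the forward pass turns the problem into an explicit message about ids versus table size, whichever device the model runs on.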

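The traceback makes it clear that the embedding layer could not look up some of the input indices: at least one token id is not smaller than the number of rows in the embedding table, which means the vocabulary size is set incorrectly. So I went back to the BERT code I had written: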

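import torch.nn as nn
from transformers import BertConfig, BertModel

class BERT(nn.Module):
    def __init__(self, vocab):
        super().__init__()
        self.vocab = vocab
        self.config = BertConfig()
        self.model = BertModel(config=self.config)  # embedding table is sized here, from the default vocab_size
        self.config.vocab_size = len(vocab)  # bug: set too late, the model is already built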

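Clearly, the vocabulary size is modified only after the BERT model has been instantiated, which has no effect: the embedding table has already been allocated from the config as it was at construction time. Swapping the last two lines solved the problem!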
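For reference, here is a sketch of the class with the two lines swapped. It assumes the Hugging Face transformers package is where BertConfig and BertModel come from (the post does not show its imports), and the sanity check at the end of the constructor as well as the toy vocabulary are additions for illustration, not part of the original code.

```python
import torch.nn as nn
from transformers import BertConfig, BertModel  # assumed source of these classes


class BERT(nn.Module):
    def __init__(self, vocab):
        super().__init__()
        self.vocab = vocab
        self.config = BertConfig()
        # Set the vocabulary size BEFORE building the model, so the
        # embedding table is allocated with len(vocab) rows.
        self.config.vocab_size = len(vocab)
        self.model = BertModel(config=self.config)
        # Added sanity check (not in the original post): the table size
        # must now match the vocabulary.
        assert self.model.get_input_embeddings().num_embeddings == len(vocab)


if __name__ == "__main__":
    # Hypothetical toy vocabulary, for illustration only.
    toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "hello": 4}
    bert = BERT(toy_vocab)
    print(bert.model.get_input_embeddings().num_embeddings)
```

Passing the size directly, as in BertConfig(vocab_size=len(vocab)), should have the same effect and removes the possibility of getting the order wrong.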
