Hugging Face sequence classification: unfreezing layers


Original title: huggingface sequence classification unfreezing layers

I am using Longformer for sequence classification (a binary problem).

I have downloaded the required files:

from transformers import LongformerForSequenceClassification, LongformerTokenizerFast

# load model and tokenizer and define length of the text sequence
model = LongformerForSequenceClassification.from_pretrained('allenai/longformer-base-4096',
                                                           gradient_checkpointing=False,
                                                           attention_window=512)
tokenizer = LongformerTokenizerFast.from_pretrained('allenai/longformer-base-4096', max_length=1024)

Then, as shown here, I ran the code below:

for name, param in model.named_parameters():
    print(name, param.requires_grad)


longformer.embeddings.word_embeddings.weight True
longformer.embeddings.position_embeddings.weight True
longformer.embeddings.token_type_embeddings.weight True
longformer.embeddings.LayerNorm.weight True
longformer.embeddings.LayerNorm.bias True
longformer.encoder.layer.0.attention.self.query.weight True
longformer.encoder.layer.0.attention.self.query.bias True
longformer.encoder.layer.0.attention.self.key.weight True
longformer.encoder.layer.0.attention.self.key.bias True
longformer.encoder.layer.0.attention.self.value.weight True
longformer.encoder.layer.0.attention.self.value.bias True
longformer.encoder.layer.0.attention.self.query_global.weight True
longformer.encoder.layer.0.attention.self.query_global.bias True
longformer.encoder.layer.0.attention.self.key_global.weight True
longformer.encoder.layer.0.attention.self.key_global.bias True
longformer.encoder.layer.0.attention.self.value_global.weight True
longformer.encoder.layer.0.attention.self.value_global.bias True
longformer.encoder.layer.0.attention.output.dense.weight True
longformer.encoder.layer.0.attention.output.dense.bias True
longformer.encoder.layer.0.attention.output.LayerNorm.weight True
longformer.encoder.layer.0.attention.output.LayerNorm.bias True
longformer.encoder.layer.0.intermediate.dense.weight True
longformer.encoder.layer.0.intermediate.dense.bias True
longformer.encoder.layer.0.output.dense.weight True
longformer.encoder.layer.0.output.dense.bias True
longformer.encoder.layer.0.output.LayerNorm.weight True
longformer.encoder.layer.0.output.LayerNorm.bias True
longformer.encoder.layer.1.attention.self.query.weight True
longformer.encoder.layer.1.attention.self.query.bias True
longformer.encoder.layer.1.attention.self.key.weight True
longformer.encoder.layer.1.attention.self.key.bias True
longformer.encoder.layer.1.attention.self.value.weight True
longformer.encoder.layer.1.attention.self.value.bias True
longformer.encoder.layer.1.attention.self.query_global.weight True
longformer.encoder.layer.1.attention.self.query_global.bias True
longformer.encoder.layer.1.attention.self.key_global.weight True
longformer.encoder.layer.1.attention.self.key_global.bias True
longformer.encoder.layer.1.attention.self.value_global.weight True
longformer.encoder.layer.1.attention.self.value_global.bias True
longformer.encoder.layer.1.attention.output.dense.weight True
longformer.encoder.layer.1.attention.output.dense.bias True
longformer.encoder.layer.1.attention.output.LayerNorm.weight True
longformer.encoder.layer.1.attention.output.LayerNorm.bias True
longformer.encoder.layer.1.intermediate.dense.weight True
longformer.encoder.layer.1.intermediate.dense.bias True
longformer.encoder.layer.1.output.dense.weight True
longformer.encoder.layer.1.output.dense.bias True
longformer.encoder.layer.1.output.LayerNorm.weight True
longformer.encoder.layer.1.output.LayerNorm.bias True
longformer.encoder.layer.2.attention.self.query.weight True
longformer.encoder.layer.2.attention.self.query.bias True
longformer.encoder.layer.2.attention.self.key.weight True
longformer.encoder.layer.2.attention.self.key.bias True
longformer.encoder.layer.2.attention.self.value.weight True
longformer.encoder.layer.2.attention.self.value.bias True
longformer.encoder.layer.2.attention.self.query_global.weight True
longformer.encoder.layer.2.attention.self.query_global.bias True
longformer.encoder.layer.2.attention.self.key_global.weight True
longformer.encoder.layer.2.attention.self.key_global.bias True
longformer.encoder.layer.2.attention.self.value_global.weight True
longformer.encoder.layer.2.attention.self.value_global.bias True
longformer.encoder.layer.2.attention.output.dense.weight True
longformer.encoder.layer.2.attention.output.dense.bias True
longformer.encoder.layer.2.attention.output.LayerNorm.weight True
longformer.encoder.layer.2.attention.output.LayerNorm.bias True
longformer.encoder.layer.2.intermediate.dense.weight True
longformer.encoder.layer.2.intermediate.dense.bias True
longformer.encoder.layer.2.output.dense.weight True
longformer.encoder.layer.2.output.dense.bias True
longformer.encoder.layer.2.output.LayerNorm.weight True
longformer.encoder.layer.2.output.LayerNorm.bias True
longformer.encoder.layer.3.attention.self.query.weight True
longformer.encoder.layer.3.attention.self.query.bias True
longformer.encoder.layer.3.attention.self.key.weight True
longformer.encoder.layer.3.attention.self.key.bias True
longformer.encoder.layer.3.attention.self.value.weight True
longformer.encoder.layer.3.attention.self.value.bias True
longformer.encoder.layer.3.attention.self.query_global.weight True
longformer.encoder.layer.3.attention.self.query_global.bias True
longformer.encoder.layer.3.attention.self.key_global.weight True
longformer.encoder.layer.3.attention.self.key_global.bias True
longformer.encoder.layer.3.attention.self.value_global.weight True
longformer.encoder.layer.3.attention.self.value_global.bias True
longformer.encoder.layer.3.attention.output.dense.weight True
longformer.encoder.layer.3.attention.output.dense.bias True
longformer.encoder.layer.3.attention.output.LayerNorm.weight True
longformer.encoder.layer.3.attention.output.LayerNorm.bias True
longformer.encoder.layer.3.intermediate.dense.weight True
longformer.encoder.layer.3.intermediate.dense.bias True
longformer.encoder.layer.3.output.dense.weight True
longformer.encoder.layer.3.output.dense.bias True
longformer.encoder.layer.3.output.LayerNorm.weight True
longformer.encoder.layer.3.output.LayerNorm.bias True
longformer.encoder.layer.4.attention.self.query.weight True
longformer.encoder.layer.4.attention.self.query.bias True
longformer.encoder.layer.4.attention.self.key.weight True
longformer.encoder.layer.4.attention.self.key.bias True
longformer.encoder.layer.4.attention.self.value.weight True
longformer.encoder.layer.4.attention.self.value.bias True
longformer.encoder.layer.4.attention.self.query_global.weight True
longformer.encoder.layer.4.attention.self.query_global.bias True
longformer.encoder.layer.4.attention.self.key_global.weight True
longformer.encoder.layer.4.attention.self.key_global.bias True
longformer.encoder.layer.4.attention.self.value_global.weight True
longformer.encoder.layer.4.attention.self.value_global.bias True
longformer.encoder.layer.4.attention.output.dense.weight True
longformer.encoder.layer.4.attention.output.dense.bias True
longformer.encoder.layer.4.attention.output.LayerNorm.weight True
longformer.encoder.layer.4.attention.output.LayerNorm.bias True
longformer.encoder.layer.4.intermediate.dense.weight True
longformer.encoder.layer.4.intermediate.dense.bias True
longformer.encoder.layer.4.output.dense.weight True
longformer.encoder.layer.4.output.dense.bias True
longformer.encoder.layer.4.output.LayerNorm.weight True
longformer.encoder.layer.4.output.LayerNorm.bias True
longformer.encoder.layer.5.attention.self.query.weight True
longformer.encoder.layer.5.attention.self.query.bias True
longformer.encoder.layer.5.attention.self.key.weight True
longformer.encoder.layer.5.attention.self.key.bias True
longformer.encoder.layer.5.attention.self.value.weight True
longformer.encoder.layer.5.attention.self.value.bias True
longformer.encoder.layer.5.attention.self.query_global.weight True
longformer.encoder.layer.5.attention.self.query_global.bias True
longformer.encoder.layer.5.attention.self.key_global.weight True
longformer.encoder.layer.5.attention.self.key_global.bias True
longformer.encoder.layer.5.attention.self.value_global.weight True
longformer.encoder.layer.5.attention.self.value_global.bias True
longformer.encoder.layer.5.attention.output.dense.weight True
longformer.encoder.layer.5.attention.output.dense.bias True
longformer.encoder.layer.5.attention.output.LayerNorm.weight True
longformer.encoder.layer.5.attention.output.LayerNorm.bias True
longformer.encoder.layer.5.intermediate.dense.weight True
longformer.encoder.layer.5.intermediate.dense.bias True
longformer.encoder.layer.5.output.dense.weight True
longformer.encoder.layer.5.output.dense.bias True
longformer.encoder.layer.5.output.LayerNorm.weight True
longformer.encoder.layer.5.output.LayerNorm.bias True
longformer.encoder.layer.6.attention.self.query.weight True
longformer.encoder.layer.6.attention.self.query.bias True
longformer.encoder.layer.6.attention.self.key.weight True
longformer.encoder.layer.6.attention.self.key.bias True
longformer.encoder.layer.6.attention.self.value.weight True
longformer.encoder.layer.6.attention.self.value.bias True
longformer.encoder.layer.6.attention.self.query_global.weight True
longformer.encoder.layer.6.attention.self.query_global.bias True
longformer.encoder.layer.6.attention.self.key_global.weight True
longformer.encoder.layer.6.attention.self.key_global.bias True
longformer.encoder.layer.6.attention.self.value_global.weight True
longformer.encoder.layer.6.attention.self.value_global.bias True
longformer.encoder.layer.6.attention.output.dense.weight True
longformer.encoder.layer.6.attention.output.dense.bias True
longformer.encoder.layer.6.attention.output.LayerNorm.weight True
longformer.encoder.layer.6.attention.output.LayerNorm.bias True
longformer.encoder.layer.6.intermediate.dense.weight True
longformer.encoder.layer.6.intermediate.dense.bias True
longformer.encoder.layer.6.output.dense.weight True
longformer.encoder.layer.6.output.dense.bias True
longformer.encoder.layer.6.output.LayerNorm.weight True
longformer.encoder.layer.6.output.LayerNorm.bias True
longformer.encoder.layer.7.attention.self.query.weight True
longformer.encoder.layer.7.attention.self.query.bias True
longformer.encoder.layer.7.attention.self.key.weight True
longformer.encoder.layer.7.attention.self.key.bias True
longformer.encoder.layer.7.attention.self.value.weight True
longformer.encoder.layer.7.attention.self.value.bias True
longformer.encoder.layer.7.attention.self.query_global.weight True
longformer.encoder.layer.7.attention.self.query_global.bias True
longformer.encoder.layer.7.attention.self.key_global.weight True
longformer.encoder.layer.7.attention.self.key_global.bias True
longformer.encoder.layer.7.attention.self.value_global.weight True
longformer.encoder.layer.7.attention.self.value_global.bias True
longformer.encoder.layer.7.attention.output.dense.weight True
longformer.encoder.layer.7.attention.output.dense.bias True
longformer.encoder.layer.7.attention.output.LayerNorm.weight True
longformer.encoder.layer.7.attention.output.LayerNorm.bias True
longformer.encoder.layer.7.intermediate.dense.weight True
longformer.encoder.layer.7.intermediate.dense.bias True
longformer.encoder.layer.7.output.dense.weight True
longformer.encoder.layer.7.output.dense.bias True
longformer.encoder.layer.7.output.LayerNorm.weight True
longformer.encoder.layer.7.output.LayerNorm.bias True
longformer.encoder.layer.8.attention.self.query.weight True
longformer.encoder.layer.8.attention.self.query.bias True
longformer.encoder.layer.8.attention.self.key.weight True
longformer.encoder.layer.8.attention.self.key.bias True
longformer.encoder.layer.8.attention.self.value.weight True
longformer.encoder.layer.8.attention.self.value.bias True
longformer.encoder.layer.8.attention.self.query_global.weight True
longformer.encoder.layer.8.attention.self.query_global.bias True
longformer.encoder.layer.8.attention.self.key_global.weight True
longformer.encoder.layer.8.attention.self.key_global.bias True
longformer.encoder.layer.8.attention.self.value_global.weight True
longformer.encoder.layer.8.attention.self.value_global.bias True
longformer.encoder.layer.8.attention.output.dense.weight True
longformer.encoder.layer.8.attention.output.dense.bias True
longformer.encoder.layer.8.attention.output.LayerNorm.weight True
longformer.encoder.layer.8.attention.output.LayerNorm.bias True
longformer.encoder.layer.8.intermediate.dense.weight True
longformer.encoder.layer.8.intermediate.dense.bias True
longformer.encoder.layer.8.output.dense.weight True
longformer.encoder.layer.8.output.dense.bias True
longformer.encoder.layer.8.output.LayerNorm.weight True
longformer.encoder.layer.8.output.LayerNorm.bias True
longformer.encoder.layer.9.attention.self.query.weight True
longformer.encoder.layer.9.attention.self.query.bias True
longformer.encoder.layer.9.attention.self.key.weight True
longformer.encoder.layer.9.attention.self.key.bias True
longformer.encoder.layer.9.attention.self.value.weight True
longformer.encoder.layer.9.attention.self.value.bias True
longformer.encoder.layer.9.attention.self.query_global.weight True
longformer.encoder.layer.9.attention.self.query_global.bias True
longformer.encoder.layer.9.attention.self.key_global.weight True
longformer.encoder.layer.9.attention.self.key_global.bias True
longformer.encoder.layer.9.attention.self.value_global.weight True
longformer.encoder.layer.9.attention.self.value_global.bias True
longformer.encoder.layer.9.attention.output.dense.weight True
longformer.encoder.layer.9.attention.output.dense.bias True
longformer.encoder.layer.9.attention.output.LayerNorm.weight True
longformer.encoder.layer.9.attention.output.LayerNorm.bias True
longformer.encoder.layer.9.intermediate.dense.weight True
longformer.encoder.layer.9.intermediate.dense.bias True
longformer.encoder.layer.9.output.dense.weight True
longformer.encoder.layer.9.output.dense.bias True
longformer.encoder.layer.9.output.LayerNorm.weight True
longformer.encoder.layer.9.output.LayerNorm.bias True
longformer.encoder.layer.10.attention.self.query.weight True
longformer.encoder.layer.10.attention.self.query.bias True
longformer.encoder.layer.10.attention.self.key.weight True
longformer.encoder.layer.10.attention.self.key.bias True
longformer.encoder.layer.10.attention.self.value.weight True
longformer.encoder.layer.10.attention.self.value.bias True
longformer.encoder.layer.10.attention.self.query_global.weight True
longformer.encoder.layer.10.attention.self.query_global.bias True
longformer.encoder.layer.10.attention.self.key_global.weight True
longformer.encoder.layer.10.attention.self.key_global.bias True
longformer.encoder.layer.10.attention.self.value_global.weight True
longformer.encoder.layer.10.attention.self.value_global.bias True
longformer.encoder.layer.10.attention.output.dense.weight True
longformer.encoder.layer.10.attention.output.dense.bias True
longformer.encoder.layer.10.attention.output.LayerNorm.weight True
longformer.encoder.layer.10.attention.output.LayerNorm.bias True
longformer.encoder.layer.10.intermediate.dense.weight True
longformer.encoder.layer.10.intermediate.dense.bias True
longformer.encoder.layer.10.output.dense.weight True
longformer.encoder.layer.10.output.dense.bias True
longformer.encoder.layer.10.output.LayerNorm.weight True
longformer.encoder.layer.10.output.LayerNorm.bias True
longformer.encoder.layer.11.attention.self.query.weight True
longformer.encoder.layer.11.attention.self.query.bias True
longformer.encoder.layer.11.attention.self.key.weight True
longformer.encoder.layer.11.attention.self.key.bias True
longformer.encoder.layer.11.attention.self.value.weight True
longformer.encoder.layer.11.attention.self.value.bias True
longformer.encoder.layer.11.attention.self.query_global.weight True
longformer.encoder.layer.11.attention.self.query_global.bias True
longformer.encoder.layer.11.attention.self.key_global.weight True
longformer.encoder.layer.11.attention.self.key_global.bias True
longformer.encoder.layer.11.attention.self.value_global.weight True
longformer.encoder.layer.11.attention.self.value_global.bias True
longformer.encoder.layer.11.attention.output.dense.weight True
longformer.encoder.layer.11.attention.output.dense.bias True
longformer.encoder.layer.11.attention.output.LayerNorm.weight True
longformer.encoder.layer.11.attention.output.LayerNorm.bias True
longformer.encoder.layer.11.intermediate.dense.weight True
longformer.encoder.layer.11.intermediate.dense.bias True
longformer.encoder.layer.11.output.dense.weight True
longformer.encoder.layer.11.output.dense.bias True
longformer.encoder.layer.11.output.LayerNorm.weight True
longformer.encoder.layer.11.output.LayerNorm.bias True
classifier.dense.weight True
classifier.dense.bias True
classifier.out_proj.weight True
classifier.out_proj.bias True

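My questions

  1. Why is param.requires_grad True for all layers? Shouldn't it be False, at least for the classifier. layers? Aren't we training them?
  2. Does param.requires_grad == True mean that a particular layer is frozen? I am confused by the term requires_grad. Does it mean frozen?
  3. If I want to train some of the previous layers, as shown here, should I use the code below?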

for name, param in model.named_parameters():
    if name.startswith("..."):  # choose whatever you like here
        param.requires_grad = False

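  4. Since training takes a lot of time, is there a specific recommendation for which layers I should train? To begin with, I plan to train:

all layers whose names start with longformer.encoder.layer.11. and

`classifier.dense.weight` 
`classifier.dense.bias` 
`classifier.out_proj.weight` 
`classifier.out_proj.bias`
  5. Do I need to add any extra layers such as dropout, or does LongformerForSequenceClassification.from_pretrained already take care of that? I do not see any dropout layers in the output above, which is why I am asking.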

Update 1

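Using the following code from the answer given by @joe32140, how can I tell which layers are frozen? My guess is that everything except the last 4 layers in the output shown in the original question is frozen. Is there an easier way to check?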

for param in model.base_model.parameters():
    param.requires_grad = False
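
One way to check without reading the whole listing is to split the parameter names by their requires_grad flag. The sketch below is not from the original post: split_by_requires_grad is a hypothetical helper, and ToyModel is a stand-in that only mimics named_parameters() so the example runs without downloading the checkpoint. The helper relies only on named_parameters() and requires_grad, so it should also accept the real model.

```python
from types import SimpleNamespace


def split_by_requires_grad(model):
    """Return (trainable, frozen) lists of parameter names."""
    trainable, frozen = [], []
    for name, param in model.named_parameters():
        (trainable if param.requires_grad else frozen).append(name)
    return trainable, frozen


class ToyModel:
    """Hypothetical stand-in that mimics only named_parameters()."""

    def __init__(self, names):
        self._params = {n: SimpleNamespace(requires_grad=True) for n in names}

    def named_parameters(self):
        return iter(self._params.items())


toy = ToyModel([
    "longformer.encoder.layer.11.output.dense.weight",
    "classifier.dense.weight",
    "classifier.out_proj.weight",
])

# Mimic freezing the encoder: every name under "longformer." is frozen.
for name, param in toy.named_parameters():
    if name.startswith("longformer."):
        param.requires_grad = False

trainable, frozen = split_by_requires_grad(toy)
print("trainable:", trainable)
print("frozen:", frozen)
```

On the real model, calling split_by_requires_grad(model) after the freezing loop and printing only the trainable list is usually enough, since that list is short.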

Original link: https://stackoverflow.com//questions/71577525/huggingface-sequence-classification-unfreezing-layers

Answers

  • joe32140 commented
    1. requires_grad == True means the gradient of this tensor will be computed, so the parameter is trainable, not frozen. By default, all layers are trained/fine-tuned.
    2. You can train only the output layer by freezing the encoder:
    for param in model.base_model.parameters():
        param.requires_grad = False
    
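    3. Yes, dropout is used in the Hugging Face implementation of the output layer. See here: https://github.com/huggingface/transformers/blob/198c335d219a5eb4d3f124fdd1ce1a9cd9f78a9b/src/transformers/models/longformer/modeling_longformer.py#L1938
    4. As for Update 1, yes, base_model refers to the layers excluding the output classification head. However, it is actually two layers rather than four, where each layer has one weight tensor and one bias tensor.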
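
To train the last encoder layer together with the classification head, as the question proposes, one option is to set requires_grad from a list of name prefixes. This is a minimal sketch under assumptions, not part of the original answer: train_only is a hypothetical helper, and ToyModel is a stand-in that only mimics named_parameters() so the example runs without the real checkpoint.

```python
from types import SimpleNamespace


def train_only(model, prefixes):
    """Freeze every parameter except those whose name starts with a prefix.

    Returns the names that remain trainable.
    """
    prefixes = tuple(prefixes)
    kept = []
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(prefixes)
        if param.requires_grad:
            kept.append(name)
    return kept


class ToyModel:
    """Hypothetical stand-in that mimics only named_parameters()."""

    def __init__(self, names):
        self._params = {n: SimpleNamespace(requires_grad=True) for n in names}

    def named_parameters(self):
        return iter(self._params.items())


toy = ToyModel([
    "longformer.encoder.layer.1.output.dense.weight",
    "longformer.encoder.layer.10.output.dense.weight",
    "longformer.encoder.layer.11.output.dense.weight",
    "classifier.dense.weight",
    "classifier.out_proj.bias",
])

# The trailing dot matters: "longformer.encoder.layer.1" without it
# would also match layers 10 and 11.
kept = train_only(toy, ["longformer.encoder.layer.11.", "classifier."])
print(kept)
```

With the real model, the same call would be train_only(model, ["longformer.encoder.layer.11.", "classifier."]). Whether unfreezing only layer 11 is the right trade-off between training time and accuracy depends on the data, so it is worth comparing against the head-only setup.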