Hugging Face sequence classification: unfreezing layers
I am using Longformer for sequence classification – a binary problem.
I have downloaded the required files:
# load model and tokenizer and define length of the text sequence
model = LongformerForSequenceClassification.from_pretrained('allenai/longformer-base-4096',
                                                            gradient_checkpointing=False,
                                                            attention_window=512)
tokenizer = LongformerTokenizerFast.from_pretrained('allenai/longformer-base-4096', max_length=1024)
Then, as shown here, I ran the code below:
for name, param in model.named_parameters():
    print(name, param.requires_grad)
longformer.embeddings.word_embeddings.weight True
longformer.embeddings.position_embeddings.weight True
longformer.embeddings.token_type_embeddings.weight True
longformer.embeddings.LayerNorm.weight True
longformer.embeddings.LayerNorm.bias True
longformer.encoder.layer.0.attention.self.query.weight True
longformer.encoder.layer.0.attention.self.query.bias True
longformer.encoder.layer.0.attention.self.key.weight True
longformer.encoder.layer.0.attention.self.key.bias True
longformer.encoder.layer.0.attention.self.value.weight True
longformer.encoder.layer.0.attention.self.value.bias True
longformer.encoder.layer.0.attention.self.query_global.weight True
longformer.encoder.layer.0.attention.self.query_global.bias True
longformer.encoder.layer.0.attention.self.key_global.weight True
longformer.encoder.layer.0.attention.self.key_global.bias True
longformer.encoder.layer.0.attention.self.value_global.weight True
longformer.encoder.layer.0.attention.self.value_global.bias True
longformer.encoder.layer.0.attention.output.dense.weight True
longformer.encoder.layer.0.attention.output.dense.bias True
longformer.encoder.layer.0.attention.output.LayerNorm.weight True
longformer.encoder.layer.0.attention.output.LayerNorm.bias True
longformer.encoder.layer.0.intermediate.dense.weight True
longformer.encoder.layer.0.intermediate.dense.bias True
longformer.encoder.layer.0.output.dense.weight True
longformer.encoder.layer.0.output.dense.bias True
longformer.encoder.layer.0.output.LayerNorm.weight True
longformer.encoder.layer.0.output.LayerNorm.bias True
longformer.encoder.layer.1.attention.self.query.weight True
longformer.encoder.layer.1.attention.self.query.bias True
longformer.encoder.layer.1.attention.self.key.weight True
longformer.encoder.layer.1.attention.self.key.bias True
longformer.encoder.layer.1.attention.self.value.weight True
longformer.encoder.layer.1.attention.self.value.bias True
longformer.encoder.layer.1.attention.self.query_global.weight True
longformer.encoder.layer.1.attention.self.query_global.bias True
longformer.encoder.layer.1.attention.self.key_global.weight True
longformer.encoder.layer.1.attention.self.key_global.bias True
longformer.encoder.layer.1.attention.self.value_global.weight True
longformer.encoder.layer.1.attention.self.value_global.bias True
longformer.encoder.layer.1.attention.output.dense.weight True
longformer.encoder.layer.1.attention.output.dense.bias True
longformer.encoder.layer.1.attention.output.LayerNorm.weight True
longformer.encoder.layer.1.attention.output.LayerNorm.bias True
longformer.encoder.layer.1.intermediate.dense.weight True
longformer.encoder.layer.1.intermediate.dense.bias True
longformer.encoder.layer.1.output.dense.weight True
longformer.encoder.layer.1.output.dense.bias True
longformer.encoder.layer.1.output.LayerNorm.weight True
longformer.encoder.layer.1.output.LayerNorm.bias True
longformer.encoder.layer.2.attention.self.query.weight True
longformer.encoder.layer.2.attention.self.query.bias True
longformer.encoder.layer.2.attention.self.key.weight True
longformer.encoder.layer.2.attention.self.key.bias True
longformer.encoder.layer.2.attention.self.value.weight True
longformer.encoder.layer.2.attention.self.value.bias True
longformer.encoder.layer.2.attention.self.query_global.weight True
longformer.encoder.layer.2.attention.self.query_global.bias True
longformer.encoder.layer.2.attention.self.key_global.weight True
longformer.encoder.layer.2.attention.self.key_global.bias True
longformer.encoder.layer.2.attention.self.value_global.weight True
longformer.encoder.layer.2.attention.self.value_global.bias True
longformer.encoder.layer.2.attention.output.dense.weight True
longformer.encoder.layer.2.attention.output.dense.bias True
longformer.encoder.layer.2.attention.output.LayerNorm.weight True
longformer.encoder.layer.2.attention.output.LayerNorm.bias True
longformer.encoder.layer.2.intermediate.dense.weight True
longformer.encoder.layer.2.intermediate.dense.bias True
longformer.encoder.layer.2.output.dense.weight True
longformer.encoder.layer.2.output.dense.bias True
longformer.encoder.layer.2.output.LayerNorm.weight True
longformer.encoder.layer.2.output.LayerNorm.bias True
longformer.encoder.layer.3.attention.self.query.weight True
longformer.encoder.layer.3.attention.self.query.bias True
longformer.encoder.layer.3.attention.self.key.weight True
longformer.encoder.layer.3.attention.self.key.bias True
longformer.encoder.layer.3.attention.self.value.weight True
longformer.encoder.layer.3.attention.self.value.bias True
longformer.encoder.layer.3.attention.self.query_global.weight True
longformer.encoder.layer.3.attention.self.query_global.bias True
longformer.encoder.layer.3.attention.self.key_global.weight True
longformer.encoder.layer.3.attention.self.key_global.bias True
longformer.encoder.layer.3.attention.self.value_global.weight True
longformer.encoder.layer.3.attention.self.value_global.bias True
longformer.encoder.layer.3.attention.output.dense.weight True
longformer.encoder.layer.3.attention.output.dense.bias True
longformer.encoder.layer.3.attention.output.LayerNorm.weight True
longformer.encoder.layer.3.attention.output.LayerNorm.bias True
longformer.encoder.layer.3.intermediate.dense.weight True
longformer.encoder.layer.3.intermediate.dense.bias True
longformer.encoder.layer.3.output.dense.weight True
longformer.encoder.layer.3.output.dense.bias True
longformer.encoder.layer.3.output.LayerNorm.weight True
longformer.encoder.layer.3.output.LayerNorm.bias True
longformer.encoder.layer.4.attention.self.query.weight True
longformer.encoder.layer.4.attention.self.query.bias True
longformer.encoder.layer.4.attention.self.key.weight True
longformer.encoder.layer.4.attention.self.key.bias True
longformer.encoder.layer.4.attention.self.value.weight True
longformer.encoder.layer.4.attention.self.value.bias True
longformer.encoder.layer.4.attention.self.query_global.weight True
longformer.encoder.layer.4.attention.self.query_global.bias True
longformer.encoder.layer.4.attention.self.key_global.weight True
longformer.encoder.layer.4.attention.self.key_global.bias True
longformer.encoder.layer.4.attention.self.value_global.weight True
longformer.encoder.layer.4.attention.self.value_global.bias True
longformer.encoder.layer.4.attention.output.dense.weight True
longformer.encoder.layer.4.attention.output.dense.bias True
longformer.encoder.layer.4.attention.output.LayerNorm.weight True
longformer.encoder.layer.4.attention.output.LayerNorm.bias True
longformer.encoder.layer.4.intermediate.dense.weight True
longformer.encoder.layer.4.intermediate.dense.bias True
longformer.encoder.layer.4.output.dense.weight True
longformer.encoder.layer.4.output.dense.bias True
longformer.encoder.layer.4.output.LayerNorm.weight True
longformer.encoder.layer.4.output.LayerNorm.bias True
longformer.encoder.layer.5.attention.self.query.weight True
longformer.encoder.layer.5.attention.self.query.bias True
longformer.encoder.layer.5.attention.self.key.weight True
longformer.encoder.layer.5.attention.self.key.bias True
longformer.encoder.layer.5.attention.self.value.weight True
longformer.encoder.layer.5.attention.self.value.bias True
longformer.encoder.layer.5.attention.self.query_global.weight True
longformer.encoder.layer.5.attention.self.query_global.bias True
longformer.encoder.layer.5.attention.self.key_global.weight True
longformer.encoder.layer.5.attention.self.key_global.bias True
longformer.encoder.layer.5.attention.self.value_global.weight True
longformer.encoder.layer.5.attention.self.value_global.bias True
longformer.encoder.layer.5.attention.output.dense.weight True
longformer.encoder.layer.5.attention.output.dense.bias True
longformer.encoder.layer.5.attention.output.LayerNorm.weight True
longformer.encoder.layer.5.attention.output.LayerNorm.bias True
longformer.encoder.layer.5.intermediate.dense.weight True
longformer.encoder.layer.5.intermediate.dense.bias True
longformer.encoder.layer.5.output.dense.weight True
longformer.encoder.layer.5.output.dense.bias True
longformer.encoder.layer.5.output.LayerNorm.weight True
longformer.encoder.layer.5.output.LayerNorm.bias True
longformer.encoder.layer.6.attention.self.query.weight True
longformer.encoder.layer.6.attention.self.query.bias True
longformer.encoder.layer.6.attention.self.key.weight True
longformer.encoder.layer.6.attention.self.key.bias True
longformer.encoder.layer.6.attention.self.value.weight True
longformer.encoder.layer.6.attention.self.value.bias True
longformer.encoder.layer.6.attention.self.query_global.weight True
longformer.encoder.layer.6.attention.self.query_global.bias True
longformer.encoder.layer.6.attention.self.key_global.weight True
longformer.encoder.layer.6.attention.self.key_global.bias True
longformer.encoder.layer.6.attention.self.value_global.weight True
longformer.encoder.layer.6.attention.self.value_global.bias True
longformer.encoder.layer.6.attention.output.dense.weight True
longformer.encoder.layer.6.attention.output.dense.bias True
longformer.encoder.layer.6.attention.output.LayerNorm.weight True
longformer.encoder.layer.6.attention.output.LayerNorm.bias True
longformer.encoder.layer.6.intermediate.dense.weight True
longformer.encoder.layer.6.intermediate.dense.bias True
longformer.encoder.layer.6.output.dense.weight True
longformer.encoder.layer.6.output.dense.bias True
longformer.encoder.layer.6.output.LayerNorm.weight True
longformer.encoder.layer.6.output.LayerNorm.bias True
longformer.encoder.layer.7.attention.self.query.weight True
longformer.encoder.layer.7.attention.self.query.bias True
longformer.encoder.layer.7.attention.self.key.weight True
longformer.encoder.layer.7.attention.self.key.bias True
longformer.encoder.layer.7.attention.self.value.weight True
longformer.encoder.layer.7.attention.self.value.bias True
longformer.encoder.layer.7.attention.self.query_global.weight True
longformer.encoder.layer.7.attention.self.query_global.bias True
longformer.encoder.layer.7.attention.self.key_global.weight True
longformer.encoder.layer.7.attention.self.key_global.bias True
longformer.encoder.layer.7.attention.self.value_global.weight True
longformer.encoder.layer.7.attention.self.value_global.bias True
longformer.encoder.layer.7.attention.output.dense.weight True
longformer.encoder.layer.7.attention.output.dense.bias True
longformer.encoder.layer.7.attention.output.LayerNorm.weight True
longformer.encoder.layer.7.attention.output.LayerNorm.bias True
longformer.encoder.layer.7.intermediate.dense.weight True
longformer.encoder.layer.7.intermediate.dense.bias True
longformer.encoder.layer.7.output.dense.weight True
longformer.encoder.layer.7.output.dense.bias True
longformer.encoder.layer.7.output.LayerNorm.weight True
longformer.encoder.layer.7.output.LayerNorm.bias True
longformer.encoder.layer.8.attention.self.query.weight True
longformer.encoder.layer.8.attention.self.query.bias True
longformer.encoder.layer.8.attention.self.key.weight True
longformer.encoder.layer.8.attention.self.key.bias True
longformer.encoder.layer.8.attention.self.value.weight True
longformer.encoder.layer.8.attention.self.value.bias True
longformer.encoder.layer.8.attention.self.query_global.weight True
longformer.encoder.layer.8.attention.self.query_global.bias True
longformer.encoder.layer.8.attention.self.key_global.weight True
longformer.encoder.layer.8.attention.self.key_global.bias True
longformer.encoder.layer.8.attention.self.value_global.weight True
longformer.encoder.layer.8.attention.self.value_global.bias True
longformer.encoder.layer.8.attention.output.dense.weight True
longformer.encoder.layer.8.attention.output.dense.bias True
longformer.encoder.layer.8.attention.output.LayerNorm.weight True
longformer.encoder.layer.8.attention.output.LayerNorm.bias True
longformer.encoder.layer.8.intermediate.dense.weight True
longformer.encoder.layer.8.intermediate.dense.bias True
longformer.encoder.layer.8.output.dense.weight True
longformer.encoder.layer.8.output.dense.bias True
longformer.encoder.layer.8.output.LayerNorm.weight True
longformer.encoder.layer.8.output.LayerNorm.bias True
longformer.encoder.layer.9.attention.self.query.weight True
longformer.encoder.layer.9.attention.self.query.bias True
longformer.encoder.layer.9.attention.self.key.weight True
longformer.encoder.layer.9.attention.self.key.bias True
longformer.encoder.layer.9.attention.self.value.weight True
longformer.encoder.layer.9.attention.self.value.bias True
longformer.encoder.layer.9.attention.self.query_global.weight True
longformer.encoder.layer.9.attention.self.query_global.bias True
longformer.encoder.layer.9.attention.self.key_global.weight True
longformer.encoder.layer.9.attention.self.key_global.bias True
longformer.encoder.layer.9.attention.self.value_global.weight True
longformer.encoder.layer.9.attention.self.value_global.bias True
longformer.encoder.layer.9.attention.output.dense.weight True
longformer.encoder.layer.9.attention.output.dense.bias True
longformer.encoder.layer.9.attention.output.LayerNorm.weight True
longformer.encoder.layer.9.attention.output.LayerNorm.bias True
longformer.encoder.layer.9.intermediate.dense.weight True
longformer.encoder.layer.9.intermediate.dense.bias True
longformer.encoder.layer.9.output.dense.weight True
longformer.encoder.layer.9.output.dense.bias True
longformer.encoder.layer.9.output.LayerNorm.weight True
longformer.encoder.layer.9.output.LayerNorm.bias True
longformer.encoder.layer.10.attention.self.query.weight True
longformer.encoder.layer.10.attention.self.query.bias True
longformer.encoder.layer.10.attention.self.key.weight True
longformer.encoder.layer.10.attention.self.key.bias True
longformer.encoder.layer.10.attention.self.value.weight True
longformer.encoder.layer.10.attention.self.value.bias True
longformer.encoder.layer.10.attention.self.query_global.weight True
longformer.encoder.layer.10.attention.self.query_global.bias True
longformer.encoder.layer.10.attention.self.key_global.weight True
longformer.encoder.layer.10.attention.self.key_global.bias True
longformer.encoder.layer.10.attention.self.value_global.weight True
longformer.encoder.layer.10.attention.self.value_global.bias True
longformer.encoder.layer.10.attention.output.dense.weight True
longformer.encoder.layer.10.attention.output.dense.bias True
longformer.encoder.layer.10.attention.output.LayerNorm.weight True
longformer.encoder.layer.10.attention.output.LayerNorm.bias True
longformer.encoder.layer.10.intermediate.dense.weight True
longformer.encoder.layer.10.intermediate.dense.bias True
longformer.encoder.layer.10.output.dense.weight True
longformer.encoder.layer.10.output.dense.bias True
longformer.encoder.layer.10.output.LayerNorm.weight True
longformer.encoder.layer.10.output.LayerNorm.bias True
longformer.encoder.layer.11.attention.self.query.weight True
longformer.encoder.layer.11.attention.self.query.bias True
longformer.encoder.layer.11.attention.self.key.weight True
longformer.encoder.layer.11.attention.self.key.bias True
longformer.encoder.layer.11.attention.self.value.weight True
longformer.encoder.layer.11.attention.self.value.bias True
longformer.encoder.layer.11.attention.self.query_global.weight True
longformer.encoder.layer.11.attention.self.query_global.bias True
longformer.encoder.layer.11.attention.self.key_global.weight True
longformer.encoder.layer.11.attention.self.key_global.bias True
longformer.encoder.layer.11.attention.self.value_global.weight True
longformer.encoder.layer.11.attention.self.value_global.bias True
longformer.encoder.layer.11.attention.output.dense.weight True
longformer.encoder.layer.11.attention.output.dense.bias True
longformer.encoder.layer.11.attention.output.LayerNorm.weight True
longformer.encoder.layer.11.attention.output.LayerNorm.bias True
longformer.encoder.layer.11.intermediate.dense.weight True
longformer.encoder.layer.11.intermediate.dense.bias True
longformer.encoder.layer.11.output.dense.weight True
longformer.encoder.layer.11.output.dense.bias True
longformer.encoder.layer.11.output.LayerNorm.weight True
longformer.encoder.layer.11.output.LayerNorm.bias True
classifier.dense.weight True
classifier.dense.bias True
classifier.out_proj.weight True
classifier.out_proj.bias True
My questions:
- Why is param.requires_grad True for all layers? Shouldn't it be False, at least for the non-classifier layers? Aren't we leaving those untrained?
- Does param.requires_grad == True mean that a particular layer is frozen? The wording of requires_grad confuses me – does it mean frozen?
- If I want to freeze some of the earlier layers and train only the rest, as shown here, should I use the code below?
for name, param in model.named_parameters():
    if name.startswith("..."): # choose whatever you like here
        param.requires_grad = False
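A minimal sketch of that prefix check, runnable without torch or a model download – the names are copied from the output above, and the KEEP prefixes are an illustrative choice, not a recommendation. One pitfall worth noting: the trailing dot in the prefix matters.

```python
# Illustrative sketch: apply the same name-prefix logic to a few parameter
# names copied from the named_parameters() output above.
param_names = [
    "longformer.embeddings.word_embeddings.weight",
    "longformer.encoder.layer.0.attention.self.query.weight",
    "longformer.encoder.layer.1.output.dense.weight",
    "longformer.encoder.layer.10.output.dense.weight",
    "longformer.encoder.layer.11.output.dense.weight",
    "classifier.dense.weight",
]

# Hypothetical choice: keep only the last encoder layer and the head trainable.
# The trailing dots matter: "longformer.encoder.layer.1" (no dot) would also
# match layers 10 and 11.
KEEP = ("longformer.encoder.layer.11.", "classifier.")

def plan_requires_grad(names, keep_prefixes):
    """Map each parameter name to the requires_grad value we would set."""
    return {name: name.startswith(keep_prefixes) for name in names}

plan = plan_requires_grad(param_names, KEEP)
for name, trainable in plan.items():
    print(name, trainable)
```

With the real model, the same loop would set `param.requires_grad = plan` value per name instead of printing it.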
- Given that training takes a lot of time, is there any specific recommendation on which layers I should train? To begin with, I plan to train all layers whose names start with longformer.encoder.layer.11., plus:
`classifier.dense.weight`
`classifier.dense.bias`
`classifier.out_proj.weight`
`classifier.out_proj.bias`
- Do I need to add any extra layers such as dropout, or does LongformerForSequenceClassification.from_pretrained already handle that? I don't see any dropout layer in the output above, which is why I'm asking.
Update 1
After applying the code below, from the answer given by @joe32140, how can I tell which layers are frozen? My guess is that everything except the last four entries in the output shown in the original question is frozen – but is there an easier way to check?
for param in model.base_model.parameters():
    param.requires_grad = False
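One quick way to check is to group named_parameters() by the requires_grad flag. Sketched here on a tiny stand-in nn.Sequential rather than the real Longformer, to avoid the download; with the real model, run the same two comprehensions on `model`:

```python
import torch.nn as nn

# Tiny stand-in model; toy[0] plays the role of the frozen "base model".
toy = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
for p in toy[0].parameters():
    p.requires_grad = False

# Group parameter names by their requires_grad flag.
frozen    = [n for n, p in toy.named_parameters() if not p.requires_grad]
trainable = [n for n, p in toy.named_parameters() if p.requires_grad]
print("frozen:   ", frozen)
print("trainable:", trainable)
```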
Answers
Answer by joe32140 (accepted):
- requires_grad == True means that we will compute the gradient of this tensor, so the default is to train/finetune all layers.
- You can train only the output layers by freezing the encoder:
for param in model.base_model.parameters():
    param.requires_grad = False
- Yes, dropout is used in the Hugging Face implementation of the output layer. See here: https://github.com/huggingface/transformers/blob/198c335d219a5eb4d3f124fdd1ce1a9cd9f78a9b/src/transformers/models/longformer/modeling_longformer.py#L1938
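For intuition, the head in the linked source is roughly the following structure – a paraphrase in plain PyTorch, not the actual Hugging Face code. It also explains why no dropout shows up in the named_parameters() output: nn.Dropout has no parameters of its own.

```python
import torch
import torch.nn as nn

class ClassificationHeadSketch(nn.Module):
    """Rough paraphrase of the Longformer classification head:
    dropout -> dense -> tanh -> dropout -> out_proj.
    Dropout modules are parameter-free, so they never appear
    in named_parameters() output."""
    def __init__(self, hidden_size=768, num_labels=2, p=0.1):
        super().__init__()
        self.dropout = nn.Dropout(p)
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.out_proj = nn.Linear(hidden_size, num_labels)

    def forward(self, x):
        x = self.dropout(x)
        x = torch.tanh(self.dense(x))
        x = self.dropout(x)
        return self.out_proj(x)

head = ClassificationHeadSketch()
# Only dense.* and out_proj.* show up, matching the question's output.
names = [n for n, _ in head.named_parameters()]
print(names)
```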
- As for Update 1: yes, base_model refers to the layers excluding the output classification head. However, the head is actually two layers rather than four – each layer has one weight and one bias tensor.
2 years ago