Study Notes: RNN and LSTM in NLP

1. Recurrent neural network

[Figure: a recurrent neural network and its unfolding in time (Recurrent_neural_network_unfold.svg)]

1.1 Elman network

$h_t = \sigma_h(W_h x_t + U_h h_{t-1} + b_h)$

$y_t = \sigma_y(W_y h_t + b_y)$

1.2 Jordan network

$h_t = \sigma_h(W_h x_t + U_h y_{t-1} + b_h)$

$y_t = \sigma_y(W_y h_t + b_y)$

Variables and functions

  • $x_t$: input vector
  • $h_t$: hidden layer vector
  • $y_t$: output vector
  • $W$, $U$ and $b$: parameter matrices and vector
  • $\sigma_h$ and $\sigma_y$: activation functions
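To make the notation concrete, here is a minimal NumPy sketch of one Elman forward step. The function name, the toy dimensions, and the choice of tanh for $\sigma_h$ (with $\sigma_y$ left as the identity) are illustrative assumptions, not part of the definition.

```python
import numpy as np

def elman_step(x_t, h_prev, W_h, U_h, b_h, W_y, b_y):
    """One Elman RNN step: h_t = tanh(W_h x_t + U_h h_{t-1} + b_h), y_t = W_y h_t + b_y."""
    h_t = np.tanh(W_h @ x_t + U_h @ h_prev + b_h)  # hidden state update
    y_t = W_y @ h_t + b_y                          # linear readout (sigma_y = identity here)
    return h_t, y_t

# Toy dimensions (assumed): 4 input features, 3 hidden units, 2 outputs
d, h, o = 4, 3, 2
rng = np.random.default_rng(0)
W_h, U_h, b_h = rng.normal(size=(h, d)), rng.normal(size=(h, h)), np.zeros(h)
W_y, b_y = rng.normal(size=(o, h)), np.zeros(o)

h_t = np.zeros(h)                    # h_0 = 0
for x_t in rng.normal(size=(5, d)):  # a sequence of 5 input vectors
    h_t, y_t = elman_step(x_t, h_t, W_h, U_h, b_h, W_y, b_y)
```

The Jordan variant differs only in that the recurrent term feeds back $y_{t-1}$ instead of $h_{t-1}$.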

1.3 Bidirectional RNN

2. Long short-term memory

2.1 LSTM with a forget gate

The compact forms of the equations for the forward pass of an LSTM cell with a forget gate are:
$$
\begin{aligned}
f_t &= \sigma_g(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma_g(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma_g(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \sigma_c(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \circ c_{t-1} + i_t \circ \tilde{c}_t \\
h_t &= o_t \circ \sigma_h(c_t)
\end{aligned}
$$
where the initial values are $c_0 = 0$ and $h_0 = 0$, and the operator $\circ$ denotes the Hadamard product (element-wise product). The subscript $t$ indexes the time step.
Variables

  • $x_t \in \mathbb{R}^d$: input vector to the LSTM unit
  • $f_t \in (0,1)^h$: forget gate's activation vector
  • $i_t \in (0,1)^h$: input/update gate's activation vector
  • $o_t \in (0,1)^h$: output gate's activation vector
  • $h_t \in (-1,1)^h$: hidden state vector, also known as the output vector of the LSTM unit
  • $\tilde{c}_t \in (-1,1)^h$: cell input activation vector
  • $c_t \in \mathbb{R}^h$: cell state vector
  • $W \in \mathbb{R}^{h \times d}$, $U \in \mathbb{R}^{h \times h}$ and $b \in \mathbb{R}^h$: weight matrices and bias vector parameters which need to be learned during training, where the superscripts $d$ and $h$ refer to the number of input features and the number of hidden units, respectively.
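Below is a minimal NumPy sketch of one forward step of the LSTM cell with a forget gate, following the equations above. The function name, the dictionary layout of the parameters, and the toy dimensions are my own assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step with a forget gate.

    W: (h x d) input weights, U: (h x h) recurrent weights, b: (h,) biases,
    stored in dicts keyed by 'f', 'i', 'o', 'c'.
    """
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])      # forget gate
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])      # input/update gate
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])      # output gate
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # cell input activation
    c_t = f_t * c_prev + i_t * c_tilde   # Hadamard products give the new cell state
    h_t = o_t * np.tanh(c_t)             # hidden state / output of the LSTM unit
    return h_t, c_t

# Toy dimensions (assumed): d input features, h hidden units
d, h = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(h, d)) for k in 'fioc'}
U = {k: rng.normal(size=(h, h)) for k in 'fioc'}
b = {k: np.zeros(h) for k in 'fioc'}

h_t, c_t = np.zeros(h), np.zeros(h)   # h_0 = 0, c_0 = 0
for x_t in rng.normal(size=(5, d)):
    h_t, c_t = lstm_step(x_t, h_t, c_t, W, U, b)
```

Note how the cell state $c_t$ is updated additively (forget-scaled old state plus gated new input), which is exactly the property exploited in Section 3.2.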

2.2 Peephole LSTM

$$
\begin{aligned}
f_t &= \sigma_g(W_f x_t + U_f c_{t-1} + b_f) \\
i_t &= \sigma_g(W_i x_t + U_i c_{t-1} + b_i) \\
o_t &= \sigma_g(W_o x_t + U_o c_{t-1} + b_o) \\
c_t &= f_t \circ c_{t-1} + i_t \circ \sigma_c(W_c x_t + b_c) \\
h_t &= o_t \circ \sigma_h(c_t)
\end{aligned}
$$
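The only structural change from the previous cell is that the gates "peek" at the cell state $c_{t-1}$ instead of the hidden state $h_{t-1}$, and the cell input has no recurrent term. A hedged sketch of that variant (function name, activations, and parameter layout assumed as before):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def peephole_lstm_step(x_t, c_prev, W, U, b):
    """One peephole LSTM step: gates read the cell state c_{t-1} rather than h_{t-1}."""
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ c_prev + b['f'])
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ c_prev + b['i'])
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ c_prev + b['o'])
    c_t = f_t * c_prev + i_t * np.tanh(W['c'] @ x_t + b['c'])  # no recurrent term on the cell input
    h_t = o_t * np.tanh(c_t)   # some formulations take sigma_h to be the identity here
    return h_t, c_t
```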

3. Training RNNs

3.1 Problem

RNN: the error surface is either very flat or very steep → gradient vanishing/exploding

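A tiny numerical example, in the spirit of the cited lecture, shows why. With a single recurrent weight $w$ applied over 1000 time steps, the output depends on $w^{1000}$, so the surface is nearly flat for $w$ slightly below 1 and extremely steep for $w$ slightly above 1 (the 1000-step horizon is an assumption for illustration):

```python
# Toy illustration of the ill-conditioned RNN error surface:
# the same weight w is multiplied in at every time step, so the
# final output behaves like w**1000.
for w in (1.0, 1.01, 0.99):
    print(w, w ** 1000)
# 1.00 -> 1.0
# 1.01 -> ~2.1e4   (explodes: huge gradient w.r.t. w)
# 0.99 -> ~4.3e-5  (vanishes: almost flat surface)
```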

3.2 Techniques

  • Clipping the gradients (see the sketch after this list)
  • Advanced optimization techniques
    • NAG
    • RMSprop
  • Try LSTM (or other simpler variants)
    • Can deal with gradient vanishing (not gradient explosion)
    • Memory and input are added (in a vanilla RNN, the memory is reset at every input)
    • The influence never disappears unless the forget gate is closed (no gradient vanishing as long as the forget gate stays open)
  • Better initialization
    • Vanilla RNN initialized with the identity matrix + ReLU activation function [Quoc V. Le, arXiv'15]
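As promised above, here is a minimal sketch of the first technique, gradient clipping by global norm; the threshold value and the plain-NumPy setting are assumptions for illustration.

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their global L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm        # shrink all gradients by the same factor
        grads = [g * scale for g in grads]
    return grads

# Example: an exploding gradient gets rescaled before the parameter update.
grads = [np.full((3, 3), 100.0), np.full(3, 100.0)]
clipped = clip_gradients(grads, max_norm=5.0)
```

Clipping keeps the update direction but bounds its size, so a single step on a very steep part of the error surface cannot throw the parameters far away.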

References

[1] Recurrent neural network – Wikipedia

[2] Long short-term memory – Wikipedia

[3] Bidirectional Recurrent Neural Networks – Dive into Deep Learning

[4] Machine Learning course lectures, 李宏毅 (Hung-yi Lee)
