Implementation difference between TensorFlow LSTMBlockFusedCell and PyTorch LSTM

I am trying to convert a TensorFlow LSTMBlockFusedCell model to a PyTorch LSTM, but I am not getting the same output from the two models given the same input and weights. I believe this is due to how the weights are set for the torch model: in the code snippet below, TensorFlow's weight has shape (164, 400), while PyTorch has two weights of shapes (400, 64) and (400, 100), torch_lstm.weight_ih_l0 and torch_lstm.weight_hh_l0 respectively. I address this mismatch by using the first 64 rows as weight_ih_l0 and the following 100 rows as weight_hh_l0. According to this post, TensorFlow uses right-multiplication where PyTorch uses left-multiplication, which is why the weights need to be transposed. Additionally, I set the bias to 0 (making it a no-op) for debugging.
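
Before the full snippet, here is a minimal standalone sketch of just the split-and-transpose step, using a random stand-in for the TF fused kernel (whose shape is (input_size + num_units, 4 * num_units)):

import numpy as np

input_size, num_units = 64, 100
kernel = np.random.randn(input_size + num_units, 4 * num_units)  # random stand-in for the TF fused kernel, (164, 400)

w_ih = kernel[:input_size, :].T  # (400, 64): candidate for torch_lstm.weight_ih_l0
w_hh = kernel[input_size:, :].T  # (400, 100): candidate for torch_lstm.weight_hh_l0

assert w_ih.shape == (4 * num_units, input_size)
assert w_hh.shape == (4 * num_units, num_units)

(As the reply below shows, this split alone is not sufficient: the order of the gate blocks within the 400 columns also differs between the two frameworks.)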

import tensorflow as tf
import numpy as np
import torch

time_len, batch_size, input_size, num_units = 50, 1, 64, 100 # L, N, Hin, Hout with torch semantics

# setup tensorflow LSTM
tf_lstm = tf.contrib.rnn.LSTMBlockFusedCell(num_units=num_units)
inp = tf.placeholder(tf.float32, shape=(time_len, batch_size, input_size))
out, c = tf_lstm(inp, dtype=tf.float32)
tf_weight = tf_lstm.weights[0]
tf_bias = tf_lstm.weights[1]

# initialize weights
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

# tf forward pass
a = np.random.randn(time_len, batch_size, input_size).astype(np.float32) # input
b = np.zeros(tf_bias.shape) # set lstm bias to zero
tf_out, lstm_weight, lstm_bias = sess.run([out, tf_weight, tf_bias], {inp: a, tf_bias: b})
assert (lstm_bias == 0).all() # make sure lstm bias was 0

# setup pytorch LSTM
torch_lstm = torch.nn.LSTM(input_size=input_size, hidden_size=num_units, num_layers=1, bias=False)

# set torch weights same as tensorflow weights (is this correct?)
w1 = lstm_weight[:input_size, :] # first 64 rows
w2 = lstm_weight[input_size:, :] # following 100 rows
torch_lstm.weight_ih_l0.data = torch.tensor(w1.T) # transpose and set first weight
torch_lstm.weight_hh_l0.data = torch.tensor(w2.T) # transpose and set second weight

# torch forward pass
torch_out, (hn, cn) = torch_lstm(torch.tensor(a))
torch_out = torch_out.detach().numpy() # convert to numpy for compatibility

# compare
assert torch_out.shape == tf_out.shape
print("np.allclose(torch_out, tf_out) = ", np.allclose(torch_out, tf_out))
print("normalized difference: ", np.linalg.norm(torch_out - tf_out))

Output:

np.allclose(torch_out, tf_out) = False
normalized difference: 10.741002

Expected output:

np.allclose(torch_out, tf_out) = True
normalized difference: ~0.0

I am running on CPU with the following dependencies:

numpy==1.21.5
tensorflow-gpu==1.14.0
torch==1.11.0

I am running TensorFlow v1; the CPU-only build should work just as well, and a Python wheel for it is available here for Python <= 3.7.

Any help is appreciated.

Original question: https://stackoverflow.com/questions/71540771/implementation-difference-between-tensorflow-lstmblockfusedcell-and-pytorch-lstm

Replies

  • Kevin:

    I believe I solved it by changing the order of the weights associated with each gate, and by setting forget_bias=0.0 in LSTMBlockFusedCell (PyTorch's LSTM has no equivalent of forget_bias, which TensorFlow adds to the forget gate's pre-activation and which defaults to 1.0):

    import tensorflow as tf
    import numpy as np
    import torch
    import itertools as it
    
    time_len, batch_size, input_size, num_units = 50, 1, 64, 100 # L, N, Hin, Hout with torch semantics
    
    # setup tensorflow LSTM
    tf_lstm = tf.contrib.rnn.LSTMBlockFusedCell(num_units=num_units, forget_bias=0.0, dtype=tf.float32)
    inp = tf.placeholder(tf.float32, shape=(time_len, batch_size, input_size))
    out, c = tf_lstm(inp, dtype=tf.float32)
    tf_weight = tf_lstm.weights[0]
    tf_bias = tf_lstm.weights[1]
    
    # initialize weights
    sess = tf.Session()
    init = tf.global_variables_initializer()
    sess.run(init)
    
    # tf forward pass
    a = np.random.randn(*inp.shape).astype(np.float32) # input
    b = np.zeros(tf_bias.shape) # set lstm bias to zero
    tf_out, lstm_weight, lstm_bias = sess.run([out, tf_weight, tf_bias], {inp: a, tf_bias: b})
    assert (lstm_bias == 0).all() # make sure lstm bias was 0
    
    # setup pytorch LSTM
    torch_lstm = torch.nn.LSTM(input_size=input_size, hidden_size=num_units, num_layers=1, bias=False)
    
    # slice the fused (164, 400) kernel into four 100-column gate blocks,
    # tagging each with a provisional label; the loop below tries every ordering
    i = lstm_weight[:, 0:100].copy(), 'i'
    f = lstm_weight[:, 100:200].copy(), 'f'
    o = lstm_weight[:, 200:300].copy(), 'o'
    g = lstm_weight[:, 300:400].copy(), 'g'
    
    for i,f,o,g in it.permutations([i,f,o,g], 4):
        print(*[x[1] for x in (i,f,o,g)])  # print the candidate gate order
        i,f,o,g = (x[0] for x in (i,f,o,g))  # drop the labels, keep the weight blocks
        lstm_weight = np.concatenate([i,f,o,g], axis=1)  # rebuild the kernel in this order
    
        # set torch weights same as tensorflow weights
        w1 = lstm_weight[:input_size, :] # first 64 rows (input weights)
        w2 = lstm_weight[input_size:, :] # following 100 rows (hidden weights)
        torch_lstm.weight_ih_l0.data = torch.tensor(w1.T) # transpose and set first weight
        torch_lstm.weight_hh_l0.data = torch.tensor(w2.T) # transpose and set second weight
    
        # torch forward pass
        torch_out, (hn, cn) = torch_lstm(torch.tensor(a))
        torch_out = torch_out.detach().numpy() # convert to numpy for compatibility
    
        # compare
        assert torch_out.shape == tf_out.shape
        print("np.allclose(torch_out, tf_out) = ", np.allclose(torch_out, tf_out))
        print("normalized difference: ", np.linalg.norm(torch_out - tf_out))
    

    This prints the difference for every permutation of the gate weights; the combination i o f g gives a difference of 1.7814435e-06, which is close enough.
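
    Follow-up sketch: the winning permutation i o f g corresponds to reordering the four column blocks as [0, 2, 1, 3]. That matches the block cell storing its gates in i, c, f, o order while torch.nn.LSTM expects i, f, g, o, so the conversion can also be written directly instead of searching all 24 permutations. A minimal sketch under that gate-order assumption, where kernel is a hypothetical name for the (164, 400) weight as originally fetched from the session:

    # kernel: the (164, 400) fused weight fetched from the TF session (hypothetical name)
    i_blk, c_blk, f_blk, o_blk = np.split(kernel, 4, axis=1)  # assumed TF gate order: i, c, f, o
    reordered = np.concatenate([i_blk, f_blk, c_blk, o_blk], axis=1)  # torch gate order: i, f, g, o
    torch_lstm.weight_ih_l0.data = torch.tensor(reordered[:input_size, :].T)  # (400, 64)
    torch_lstm.weight_hh_l0.data = torch.tensor(reordered[input_size:, :].T)  # (400, 100)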
