How to do one-hot encoding in TensorFlow using hardcoded values

xiaoxingxing, 2 years ago, tensorflow, 237

Original title: How to do one hot encoding in tensorflow using hardcoded values

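I want to apply one-hot encoding to my categorical features. I know how to do this with tf.one_hot, but one_hot accepts indices, so I need to map tokens to indices first. However, all the examples I have found compute the vocabulary over the whole dataset. I don't want to do that, because I have a hardcoded dictionary of the possible values. Something like: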

CATEG = {
    'feature1': ['a', 'b', 'c'],
    'feature2': ['foo', 'bar']
}

I just need the preprocessing_fn to map tokens to indices and then run tf.one_hot on the result. How can I do this?
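The mapping described above (token to index, then tf.one_hot) could be sketched in plain TensorFlow with a static lookup table. This is a minimal sketch, not the asker's or the answerer's code: the helper name make_one_hot_fn is hypothetical, using tf.lookup.StaticHashTable is an assumption about what fits the use case, and whether it can be dropped unchanged into a tf.Transform preprocessing_fn is not verified here.

```python
import tensorflow as tf

CATEG = {
    'feature1': ['a', 'b', 'c'],
    'feature2': ['foo', 'bar'],
}


def make_one_hot_fn(vocab):
    """Hypothetical helper: build a token -> one-hot function for a fixed vocab."""
    # Static table mapping each token to its position in the hardcoded list.
    table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(
            keys=tf.constant(vocab),
            values=tf.range(len(vocab), dtype=tf.int64),
        ),
        default_value=-1,  # tokens outside the vocabulary map to -1
    )

    def one_hot(tokens):
        indices = table.lookup(tokens)
        # tf.one_hot emits an all-zero row for out-of-range indices such as -1.
        return tf.one_hot(indices, depth=len(vocab))

    return one_hot


encode_feature2 = make_one_hot_fn(CATEG['feature2'])
result = encode_feature2(tf.constant(['foo', 'foo', 'bar', 'none']))
print(result.numpy())
```

Here each input gets one row of length len(vocab), and the unknown token 'none' becomes a row of zeros. No pass over the dataset is needed because the table is built from the hardcoded list alone.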

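For example, tft.apply_vocabulary sounds like what I need, but I see that it requires a deferred_vocab_filename_tensor of type common_types.TemporaryAnalyzerOutputType. The description says:

The deferred vocab filename tensor as returned by tft.vocabulary, as long as the frequencies were not stored.

And I see that tft.vocabulary is, again, computing a vocabulary:

Computes the unique values taken by x, which can be a Tensor or CompositeTensor of any size. The unique values will be aggregated over all dimensions of x and all instances.

Why doesn't something this simple exist?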

Original link: https://stackoverflow.com//questions/71505664/how-to-do-one-hot-encoding-in-tft-using-hardcoded-values


Alexey Tochin commented

The simplest option is probably to use tf.equal, as follows:

import tensorflow as tf
CATEG = {
    'feature1': ['a', 'b', 'c'],
    'feature2': ['foo', 'bar']
}
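tokens = tf.constant(CATEG['feature2'])
inputs = tf.constant(["foo", "foo", "bar", "none"])
# Compare every token (rows) with every input (columns): shape [num_tokens, num_inputs].
# An input that matches no token, such as "none", yields an all-zero column.
onehot = tf.cast(tf.equal(tf.expand_dims(tokens, 1), tf.expand_dims(inputs, 0)), dtype=tf.float32)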
print(onehot)
# [[1., 1., 0., 0.],
#  [0., 0., 1., 0.]]

Add a batch dimension if needed.
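One possible reading of that remark is to put the batch dimension first, so that each input gets its own one-hot row. The sketch below is an assumption about what was meant, not part of the original answer; the function name one_hot_by_equal is hypothetical.

```python
import tensorflow as tf


def one_hot_by_equal(inputs, vocab):
    """Hypothetical helper: one-hot encode string inputs against a fixed vocab."""
    tokens = tf.constant(vocab)
    # inputs[..., None] has shape [..., 1]; tokens[None, :] has shape [1, vocab].
    # Broadcasting the comparison gives shape [..., vocab].
    matches = tf.equal(tf.expand_dims(inputs, -1), tf.expand_dims(tokens, 0))
    return tf.cast(matches, tf.float32)


batch = tf.constant(["foo", "foo", "bar", "none"])
onehot_rows = one_hot_by_equal(batch, ['foo', 'bar'])
print(onehot_rows.numpy())
```

With this layout the result has shape [4, 2], one row per input, and the unknown value "none" still maps to a row of zeros.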

2 years ago, 0 comments
