sklearn.preprocessing.OneHotEncoder and the way to read it

青葱年少, 2 years ago, python, 185


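I have been using one-hot encoding in all of my data preprocessing pipelines.

But I have now run into a problem: I am trying to automatically preprocess new data on a Flask server that runs the model.

TL;DR: I want to look up new data for a specific date, region, and type, and run .predict on it.

The problem arises after I look up a specific data point: I have to convert its columns from object dtype into one-hot encoded columns.

My question is: how do I know which encoded column corresponds to which category of which feature? After one-hot encoding I have about 240 columns.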

Original link: https://stackoverflow.com//questions/71555321/sklearn-preprocessing-onehotencoder-and-the-way-to-read-it


Corralien commented:

IIUC, use get_feature_names_out():

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({'A': [0, 1, 2], 'B': [3, 1, 0],
                   'C': [0, 2, 2], 'D': [0, 1, 1]})

# Fit the encoder and transform in one step (the result is a sparse matrix)
ohe = OneHotEncoder()
data = ohe.fit_transform(df)
# Label each encoded column as <feature>_<category>
df1 = pd.DataFrame(data.toarray(), columns=ohe.get_feature_names_out(), dtype=int)

Output:

>>> df
   A  B  C  D
0  0  3  0  0
1  1  1  2  1
2  2  0  2  1


>>> df1
   A_0  A_1  A_2  B_0  B_1  B_3  C_0  C_2  D_0  D_1
0    1    0    0    0    0    1    1    0    1    0
1    0    1    0    0    1    0    0    1    0    1
2    0    0    1    1    0    0    0    1    0    1

>>> pd.Series(ohe.get_feature_names_out()).str.rsplit('_', n=1).str[0]
0    A
1    A
2    A
3    B
4    B
5    B
6    C
7    C
8    D
9    D
dtype: object
