What is the easiest way in Pandas to find what proportion of rows have a particular label?



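In a table of demographic data for a population, I want to find the proportion of people whose native country is Germany. I'd like to know whether Pandas has a function that reports how many rows carry a given label or, in this case, how many rows have "Germany" in the "native-country" column.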

The data comes from mlcourse.ai: https://raw.githubusercontent.com/Yorko/mlcourse.ai/master/data/

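I tried the value_counts function to see whether I could print the number of rows that have "Germany" in the "native-country" column. With normalize set to True, I would only need to multiply the result by 100 to get the percentage of people whose native country is Germany.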

data[data["native-country"]=="Germany"].value_counts(normalize=True)

Output:

age  workclass         fnlwgt  education   education-num  marital-status      occupation       relationship   race                sex     capital-gain  capital-loss  hours-per-week  native-country  salary
18   ?                 85154   12th        8              Never-married       ?                Own-child      Asian-Pac-Islander  Female  0             0             24              Germany         <=50K     0.007299
46   Private           35961   Assoc-acdm  12             Divorced            Sales            Not-in-family  White               Female  0             0             25              Germany         <=50K     0.007299
45   Private           161954  Bachelors   13             Never-married       Prof-specialty   Not-in-family  White               Female  0             0             40              Germany         <=50K     0.007299
                       174794  Bachelors   13             Separated           Prof-specialty   Unmarried      White               Female  0             0             56              Germany         <=50K     0.007299
                       204057  Bachelors   13             Divorced            Adm-clerical     Unmarried      White               Female  0             0             40              Germany         <=50K     0.007299
                                                                                                                                                                                                                  ...   
30   Private           318749  Assoc-voc   11             Married-civ-spouse  Tech-support     Wife           White               Female  0             0             35              Germany         <=50K     0.007299
                       116508  HS-grad     9              Married-civ-spouse  Craft-repair     Husband        White               Male    0             0             40              Germany         <=50K     0.007299
                       111415  HS-grad     9              Married-civ-spouse  Other-service    Husband        White               Male    0             0             55              Germany         <=50K     0.007299
                       77143   Bachelors   13             Never-married       Exec-managerial  Own-child      Black               Male    0             0             40              Germany         <=50K     0.007299
74   Self-emp-not-inc  199136  Bachelors   13             Widowed             Craft-repair     Not-in-family  White               Male    15831         0             8               Germany         >50K      0.007299
Length: 137, dtype: float64

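This doesn't seem to work: it returns the part of the table made up of people whose native country is Germany. I could use that to get my answer, but I'm looking for a simpler way, if there is one.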

Original link: https://stackoverflow.com//questions/71686103/what-is-the-easiest-way-in-pandas-to-find-what-proportion-of-rows-have-a-particu


Rayan Hatout answered

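What is happening is that you filter for the rows where data["native-country"] == "Germany" and then run value_counts on the entire resulting DataFrame. Every row is unique once all columns are considered, so each row gets a count of 1, which normalize=True turns into 1/137 ≈ 0.007299 for each of the 137 rows.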
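The behaviour described above can be reproduced on a small frame. This is a minimal sketch using made-up toy data (the `toy` frame and its values are assumptions, not rows from the real dataset):

```python
import pandas as pd

# Hypothetical toy data: every row is unique across all columns.
toy = pd.DataFrame({
    "age": [18, 46, 45, 30],
    "native-country": ["Germany", "Germany", "Germany", "France"],
})

german_rows = toy[toy["native-country"] == "Germany"]

# DataFrame.value_counts counts distinct rows (all columns combined),
# so each unique row gets the same share: 1 / len(german_rows).
row_shares = german_rows.value_counts(normalize=True)
print(row_shares)
```

Each of the three filtered rows ends up with a share of one third, which says nothing about how common "Germany" is in the full table.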

What you should do instead is isolate the native-country column and count the values in that column alone.

The code looks like this:

native_countries = data["native-country"]
native_countries_count = native_countries.value_counts(normalize=True)
print(native_countries_count["Germany"])

I created the following toy dataset to demonstrate:

import pandas as pd

df = pd.DataFrame({
    "age": [12, 23, 34, 45],
    "native-country": ["Germany", "Germany", "Germany", "France"]})

print(df)
#    age native-country
# 0   12        Germany
# 1   23        Germany
# 2   34        Germany
# 3   45         France

native_countries = df["native-country"]

print(native_countries)
# 0    Germany
# 1    Germany
# 2    Germany
# 3     France
# Name: native-country, dtype: object

native_countries_count = native_countries.value_counts(normalize=True)
print(native_countries_count)
# Germany    0.75
# France     0.25
# Name: native-country, dtype: float64

print(native_countries_count["Germany"])
# 0.75
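One caveat with indexing the result directly: if the label never occurs in the column, `native_countries_count["Germany"]` raises a KeyError. A possible workaround is sketched below; `label_share` is a hypothetical helper name introduced here for illustration, not something from the answer or from pandas.

```python
import pandas as pd

def label_share(series, label):
    """Hypothetical helper: fraction of entries in `series` equal to `label`.

    Series.get returns the default (0.0) when the label is absent,
    instead of raising a KeyError.
    """
    return series.value_counts(normalize=True).get(label, 0.0)

countries = pd.Series(["Germany", "Germany", "Germany", "France"])

print(label_share(countries, "Germany"))  # 0.75
print(label_share(countries, "Spain"))    # 0.0
```

Note that value_counts drops NaN entries by default, so the shares are taken over the non-missing values only.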


constantstranger answered

Try this:

import pandas as pd
data = pd.DataFrame(
    {'native-country':['France', 'Australia', 'South Africa', 'Germany', 'France', 'Australia', 'South Africa', 'Germany', 'France', 'Australia', 'South Africa', 'Germany', 'France', 'Australia', 'South Africa', 'Germany'],
    'age':[21,22,23,24,25,26,27,28,29,30,29,28,27,26,25,24]})

print(data[data["native-country"]=="Germany"].shape[0] / data.shape[0])

Output:

0.25
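A shorter variant of the same idea, offered as an alternative sketch rather than as part of the answer above: the comparison yields a boolean Series, and the mean of a boolean Series is the fraction of True values. The frame below is an assumed stand-in with the same country distribution.

```python
import pandas as pd

# Assumed stand-in data: four countries repeated four times (16 rows).
data = pd.DataFrame({
    "native-country": ["France", "Australia", "South Africa", "Germany"] * 4,
})

# True where the row's native country is Germany, False elsewhere.
is_german = data["native-country"] == "Germany"

# The mean of booleans is the share of True values.
proportion = is_german.mean()
print(proportion)           # 0.25
print(f"{proportion:.1%}")  # 25.0%
```

This avoids building the filtered DataFrame and works unchanged when the label is absent, in which case the result is 0.0.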

