What is the easiest way in Pandas to find what proportion of rows have a particular label?
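In a table of demographic data for some population, I want to find the proportion of German citizens. I would like to know whether Pandas has a function that finds how many rows have a certain label, or in this case, how many rows have "Germany" in the "native-country" column.
The data comes from mlcourse.ai: https://raw.githubusercontent.com/Yorko/mlcourse.ai/master/data/
I tried the value_counts function to see whether I could print the number of rows that have "Germany" in the "native-country" column. With normalize set to True, I would only need to multiply the result by 100 to get the percentage of people whose native country is Germany.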
data[data["native-country"]=="Germany"].value_counts(normalize=True)
Output:
age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country salary
18 ? 85154 12th 8 Never-married ? Own-child Asian-Pac-Islander Female 0 0 24 Germany <=50K 0.007299
46 Private 35961 Assoc-acdm 12 Divorced Sales Not-in-family White Female 0 0 25 Germany <=50K 0.007299
45 Private 161954 Bachelors 13 Never-married Prof-specialty Not-in-family White Female 0 0 40 Germany <=50K 0.007299
174794 Bachelors 13 Separated Prof-specialty Unmarried White Female 0 0 56 Germany <=50K 0.007299
204057 Bachelors 13 Divorced Adm-clerical Unmarried White Female 0 0 40 Germany <=50K 0.007299
...
30 Private 318749 Assoc-voc 11 Married-civ-spouse Tech-support Wife White Female 0 0 35 Germany <=50K 0.007299
116508 HS-grad 9 Married-civ-spouse Craft-repair Husband White Male 0 0 40 Germany <=50K 0.007299
111415 HS-grad 9 Married-civ-spouse Other-service Husband White Male 0 0 55 Germany <=50K 0.007299
77143 Bachelors 13 Never-married Exec-managerial Own-child Black Male 0 0 40 Germany <=50K 0.007299
74 Self-emp-not-inc 199136 Bachelors 13 Widowed Craft-repair Not-in-family White Male 15831 0 8 Germany >50K 0.007299
Length: 137, dtype: float64
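This does not seem to work: it returns the part of the table made up of people whose native country is Germany. I could use that to get my answer, but I am looking for a simpler way, if there is one.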
Answers
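Rayan Hatout commented:
What is happening is that you are currently filtering for the rows where ["native-country"]=="Germany" and then running value_counts on the entire resulting DataFrame. That gives you a count of 1 for every row, because each row is unique once all attributes are taken into account. With normalize=True, each of the 137 filtered rows therefore gets the same share, 1/137 ≈ 0.007299, which is the value repeated in your output. What you should do instead is isolate the native-country column and count on that. The code looks like this: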
native_countries = data["native-country"]
native_countries_count = native_countries.value_counts(normalize=True)
print(native_countries_count["Germany"])
I created the following toy dataset to demonstrate:
import pandas as pd

df = pd.DataFrame({
    "age": [12, 23, 34, 45],
    "native-country": ["Germany", "Germany", "Germany", "France"]})
print(df)
#    age native-country
# 0   12        Germany
# 1   23        Germany
# 2   34        Germany
# 3   45         France

# Isolate the column of interest
native_countries = df["native-country"]
print(native_countries)
# 0    Germany
# 1    Germany
# 2    Germany
# 3     France
# Name: native-country, dtype: object

# Share of rows for each distinct country
native_countries_count = native_countries.value_counts(normalize=True)
print(native_countries_count)
# Germany    0.75
# France     0.25
# Name: native-country, dtype: float64

print(native_countries_count["Germany"])
# 0.75
2 years ago
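The explanation above can be checked on a small frame. The following is a minimal sketch, assuming a made-up four-row table (the variable names toy, per_row and per_country are hypothetical and not from the thread). It contrasts the attempt from the question, which counts unique whole rows after filtering, with counting on the single column.

```python
import pandas as pd

# Hypothetical toy data, not the mlcourse.ai dataset.
toy = pd.DataFrame({
    "age": [12, 23, 34, 45],
    "native-country": ["Germany", "Germany", "Germany", "France"],
})

# The attempt from the question: filter first, then count whole rows.
# Each remaining row is a unique (age, native-country) combination,
# so every row gets the same share: 1 / (number of filtered rows).
per_row = toy[toy["native-country"] == "Germany"].value_counts(normalize=True)

# Counting on the column alone gives each country's share of all rows.
per_country = toy["native-country"].value_counts(normalize=True)
germany_share = per_country["Germany"]

print(per_row.tolist())   # three equal shares of one third each
print(germany_share)      # 0.75
```

Here the filtered frame has three rows, so each gets one third, which mirrors the 1/137 seen in the question. Only the column-level count answers "what share of all rows is Germany".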
constantstranger commented:
Try this:
import pandas as pd

data = pd.DataFrame({
    'native-country': ['France', 'Australia', 'South Africa', 'Germany',
                       'France', 'Australia', 'South Africa', 'Germany',
                       'France', 'Australia', 'South Africa', 'Germany',
                       'France', 'Australia', 'South Africa', 'Germany'],
    'age': [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 29, 28, 27, 26, 25, 24]})
print(data[data["native-country"]=="Germany"].shape[0] / data.shape[0])
Output:
0.25
2 years ago
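A possibly shorter variant of the row-count ratio above, offered as a sketch and not as something proposed in the thread: comparing the column to the label gives a boolean Series, and its mean is the fraction of True values. The data below is made-up toy data, and the names is_german, proportion and percentage are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
data = pd.DataFrame({
    "native-country": ["France", "Australia", "South Africa", "Germany"] * 4,
    "age": [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 29, 28, 27, 26, 25, 24],
})

# Boolean mask: True where the row's label is "Germany".
is_german = data["native-country"] == "Germany"

# True counts as 1 and False as 0, so the mean is the share of matching rows.
proportion = is_german.mean()

# Multiply by 100 for a percentage, as described in the question.
percentage = proportion * 100

print(proportion)   # 0.25
print(percentage)   # 25.0
```

This avoids building the filtered DataFrame and should give the same number as dividing the two shape[0] values. Note that rows with a missing native-country compare as False and still count in the denominator.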