Calculating percentage of counts of a table in pyspark
I want to calculate threshold limits on a table's row count. For example, my table count is 100, as shown below –
spark.sql("""select count(*) from dev.my_table_metrics""").show(10,False)
+--------+
|count(1)|
+--------+
|100 |
+--------+
I want to derive a result like the one below, where lower_limit is the count minus 5% and upper_limit is the count plus 5% –
+--------+------------+-----------+
|count(1)|upper_limit |lower_limit|
+--------+------------+-----------+
|100 |105 |95 |
+--------+------------+-----------+
I tried using the percentile(100,5) function, but I ran into the following error: "cannot resolve 'percentile(100, CAST(5 AS DOUBLE), 1L)' due to data type mismatch"
Can someone help me resolve this?
Thanks in advance.
Replies
bunnylorr commented:
spark.sql("""select count(*),
       count(*)-(count(*)*(5/100)) as lower_limit,
       count(*)+(count(*)*(5/100)) as upper_limit
       from dev.my_table_metrics""").show(10,False)

+---------+-----------+-----------+
|count(1) |lower_limit|upper_limit|
+---------+-----------+-----------+
|100      |95         |105        |
+---------+-----------+-----------+
2 years ago
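For reference, the same ±5% bands can be computed with the DataFrame API instead of raw SQL. A minimal sketch, assuming the table dev.my_table_metrics exists; the pct variable is a hypothetical parameter standing in for the 5 that the reply above hard-codes:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

pct = 5  # hypothetical threshold parameter; the reply above hard-codes 5%

# Aggregate the row count once, then derive the +/- pct bands
# with plain arithmetic on that single aggregate column.
limits = (
    spark.table("dev.my_table_metrics")
    .agg(F.count("*").alias("count"))
    .withColumn("lower_limit", F.col("count") - F.col("count") * pct / 100)
    .withColumn("upper_limit", F.col("count") + F.col("count") * pct / 100)
)
limits.show(truncate=False)

This also explains the error in the question: percentile is an aggregate that computes a percentile of a column's values, with the percentage given as a fraction between 0 and 1, not a function for scaling a number by a percentage, so the limits have to be plain arithmetic as above.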