Calculating percentage of counts of a table in pyspark



I want to compute threshold limits on a table's row count. For example, my table count is 100, as shown below –

spark.sql("""select count(*) from dev.my_table_metrics""").show(10,False)
+--------+                                                                      
|count(1)|
+--------+
|100     |
+--------+

I want to produce a result like the one below, where lower_limit is the count minus 5% and upper_limit is the count plus 5% –

+--------+------------+-----------+                                                                      
|count(1)|upper_limit |lower_limit|
+--------+------------+-----------+
|100     |105         |95         |
+--------+------------+-----------+

I tried the percentile(100, 5) function, but ran into the following error: "cannot resolve 'percentile(100, CAST(5 AS DOUBLE), 1L)' due to data type mismatch"
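
For reference, percentile in Spark SQL is an aggregate function: it takes a numeric column plus a fraction between 0.0 and 1.0, which is why percentile(100, 5) is rejected with a type-mismatch error; it is not meant for computing a ±5% band around a count. A minimal sketch of its intended use, where value_col is a hypothetical numeric column used purely for illustration:

# percentile() aggregates a numeric column; the percentage argument must
# lie in [0.0, 1.0]. "value_col" is a hypothetical column name.
spark.sql("""select percentile(value_col, 0.95) as p95 from dev.my_table_metrics""").show(10,False)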

Can someone help me resolve this?

Thanks in advance.

Original link: https://stackoverflow.com//questions/71463592/calculating-percentage-of-counts-of-a-table-in-pyspark

Replies

  • bunnylorr replied:
    spark.sql("""
        select count(*),
               count(*) - (count(*) * (5/100)) as lower_limit,
               count(*) + (count(*) * (5/100)) as upper_limit
        from dev.my_table_metrics
    """).show(10, False)
    +---------+-----------+-----------+                                              
    |count(1) |lower_limit|upper_limit|
    +---------+-----------+-----------+
    |100      |95         |105        |
    +---------+-----------+-----------+
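
    One caveat: in Spark SQL, 5/100 is floating-point division and evaluates to 0.05, so lower_limit and upper_limit actually come back as doubles (95.0 and 105.0) unless cast. A sketch of the same arithmetic with the DataFrame API, assuming an active SparkSession named spark, with the bounds cast back to whole numbers to match the output above:

    from pyspark.sql import functions as F

    counts = (
        spark.table("dev.my_table_metrics")
        .agg(F.count(F.lit(1)).alias("cnt"))  # count(lit(1)) counts every row, like count(*)
        .withColumn("lower_limit", (F.col("cnt") * 0.95).cast("bigint"))  # count - 5%
        .withColumn("upper_limit", (F.col("cnt") * 1.05).cast("bigint"))  # count + 5%
    )
    counts.show(10, False)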
    
    2 years ago, 0 comments