[GitHub] [spark] hagerf commented on issue #26762: [SPARK-30131] add array_median function

2019-12-05 Thread GitBox
hagerf commented on issue #26762: [SPARK-30131] add array_median function
URL: https://github.com/apache/spark/pull/26762#issuecomment-562107475
@srowen Ok, I see. If it's really that restrictive then users can use other functions for this, even though I think it could be a popular additio

[GitHub] [spark] hagerf commented on issue #26762: [SPARK-30131] add array_median function

2019-12-05 Thread GitBox
hagerf commented on issue #26762: [SPARK-30131] add array_median function
URL: https://github.com/apache/spark/pull/26762#issuecomment-562097725
Yes, of course. But we have the prefix `approx` because calculating an exact median over a whole dataset is difficult to do efficiently. So users wh

[GitHub] [spark] hagerf commented on issue #26762: [SPARK-30131] add array_median function

2019-12-05 Thread GitBox
hagerf commented on issue #26762: [SPARK-30131] add array_median function
URL: https://github.com/apache/spark/pull/26762#issuecomment-562059403
@HyukjinKwon I added some links, I think they should be relevant. We already have `approxQuantile` but then this would be an exact function,
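
For readers skimming the thread, the distinction drawn above (an exact median over a single array value, as opposed to an approximate quantile over a whole dataset) can be illustrated with a minimal plain-Python sketch. This is not the code from the pull request and not a Spark API: the helper name `array_median` is borrowed from the PR title only for illustration, and the choices to skip `None` entries and to average the two middle values for even lengths are assumptions.

```python
from typing import Optional, Sequence


def array_median(values: Sequence[Optional[float]]) -> Optional[float]:
    """Exact median of one array (hypothetical helper, not Spark's API).

    Assumptions for this sketch: None entries are skipped, an empty
    input yields None, and even-length inputs average the two middle
    values.
    """
    # Sorting one array is cheap compared with sorting a whole dataset,
    # which is why an exact per-array median is feasible.
    present = sorted(v for v in values if v is not None)
    if not present:
        return None
    mid = len(present) // 2
    if len(present) % 2 == 1:
        return float(present[mid])
    return (present[mid - 1] + present[mid]) / 2.0


if __name__ == "__main__":
    print(array_median([3, 1, 2]))
    print(array_median([4, 1, 3, 2]))
```

Because the whole array is held in memory and sorted, the result is exact; an approximate method only becomes attractive when the values are spread across a dataset too large to sort cheaply.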

[GitHub] [spark] hagerf commented on issue #26762: [SPARK-30131] add array_median function

2019-12-04 Thread GitBox
hagerf commented on issue #26762: [SPARK-30131] add array_median function
URL: https://github.com/apache/spark/pull/26762#issuecomment-561898965
@srowen From a quick googling, I see it in AWS Redshift and in IBM DB2 as aggregate functions. I've seen several tickets in Spark requesting medi