I've finally come to the same conclusion, but isn't there any way to call
these Hive UDAFs from agg("percentile(key,0.5)")?

On Tue, Jun 2, 2015 at 15:37, Yana Kadiyska <yana.kadiy...@gmail.com>
wrote:

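> Like this (sqlContext should be a HiveContext instance):
> import sqlContext.implicits._  // needed for .toDF outside spark-shell
> case class KeyValue(key: Int, value: String)
> val df = sc.parallelize(1 to 50).map(i => KeyValue(i, i.toString)).toDF
> df.registerTempTable("table")
> // percentile is a Hive UDAF, resolved through the HiveContext
> sqlContext.sql("select percentile(key,0.5) from table").show()
>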
> On Tue, Jun 2, 2015 at 8:07 AM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>> Hi everyone,
>> Is there any way to compute a median on a column using Spark's DataFrame?
>> I know you can use stats in an RDD, but I'd rather stay within a DataFrame.
>> Hive seems to imply that using ntile one can compute percentiles,
>> quartiles and therefore a median.
>> Does anyone have experience with this?
>> Regards,
>> Olivier.
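
One possible way to stay in the DataFrame API without registering a temp
table is selectExpr, which takes SQL expression strings. The sketch below
is an assumption, not something confirmed in this thread: it assumes Spark
1.3 or later built with Hive support, and the object name MedianSketch,
the app name and the local master setting are made up for illustration.
Whether agg() itself accepts such a string is not addressed here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Same shape as the case class used earlier in the thread.
case class KeyValue(key: Int, value: String)

// Hypothetical driver object, for illustration only.
object MedianSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("median-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // A HiveContext is assumed so that Hive UDAFs such as percentile resolve.
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    val df = sc.parallelize(1 to 50).map(i => KeyValue(i, i.toString)).toDF()

    // selectExpr parses the string as a SQL expression, so the Hive UDAF
    // is called without going through registerTempTable and sql().
    df.selectExpr("percentile(key, 0.5) AS median").show()

    sc.stop()
  }
}
```

The expression string is the same one used in the sql() call above, so
other percentiles only need a different second argument.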
