I've finally come to the same conclusion, but isn't there any way to call these Hive UDAFs from agg(), e.g. agg("percentile(key,0.5)")?
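A minimal sketch of one way this can work: DataFrame.agg() does not take a raw SQL string, but functions.expr can parse the percentile call into a Column that agg() accepts. This assumes Spark 2.x or later (SparkSession, and percentile resolvable from a SQL expression string); on 1.x the same expr(...) call would presumably still need a HiveContext, as in the reply quoted here. The object and method names (MedianViaAgg, median) are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object MedianViaAgg {
  // agg() takes Column objects (or column -> function-name pairs), not raw SQL
  // strings, so the percentile call is parsed into a Column with functions.expr.
  def median(df: DataFrame, column: String): Double =
    df.agg(expr(s"percentile($column, 0.5)")).head().getDouble(0)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("median-via-agg")
      .getOrCreate()
    import spark.implicits._

    // Same data as the quoted example: keys 1 to 50.
    val df = (1 to 50).map(i => (i, i.toString)).toDF("key", "value")
    println(median(df, "key"))

    // Alternative without agg(): selectExpr also accepts SQL expression strings.
    df.selectExpr("percentile(key, 0.5) AS median").show()

    spark.stop()
  }
}
```

The expr() route keeps everything in the DataFrame API, so the median can sit next to other aggregates in the same agg() call.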
On Tue, Jun 2, 2015 at 15:37, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:
> Like this... sqlContext should be a HiveContext instance:
>
> import sqlContext.implicits._ // needed for .toDF
> case class KeyValue(key: Int, value: String)
> val df = sc.parallelize(1 to 50).map(i => KeyValue(i, i.toString)).toDF
> df.registerTempTable("table")
> sqlContext.sql("select percentile(key, 0.5) from table").show()
>
> On Tue, Jun 2, 2015 at 8:07 AM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>
>> Hi everyone,
>> Is there any way to compute a median on a column using Spark's DataFrame?
>> I know you can use stats on an RDD, but I'd rather stay within a DataFrame.
>> Hive seems to imply that using ntile one can compute percentiles,
>> quartiles and therefore a median.
>> Does anyone have experience with this?
>>
>> Regards,
>>
>> Olivier.
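To make concrete what an exact percentile(key, p) returns, here is a plain-Scala sketch (no Spark). It assumes the linear-interpolation rule used by Hive's percentile UDAF as I understand it: sort the values, take position = (n - 1) * p, and interpolate between the two neighbouring ranks. The names ExactPercentile, percentile and median are hypothetical.

```scala
object ExactPercentile {
  // Assumed rule: position = (n - 1) * p over the sorted values (0-based),
  // linearly interpolated between the floor and ceiling ranks.
  def percentile(values: Seq[Double], p: Double): Double = {
    require(values.nonEmpty, "percentile of an empty column is undefined")
    require(p >= 0.0 && p <= 1.0, "p must be in [0, 1]")
    val sorted = values.sorted
    val position = (sorted.length - 1) * p
    val lower = math.floor(position).toInt
    val higher = math.ceil(position).toInt
    if (lower == higher) sorted(lower)
    else (higher - position) * sorted(lower) + (position - lower) * sorted(higher)
  }

  // The median is the 0.5 percentile.
  def median(values: Seq[Double]): Double = percentile(values, 0.5)

  def main(args: Array[String]): Unit = {
    val keys = (1 to 50).map(_.toDouble)
    println(median(keys))           // 25.5
    println(percentile(keys, 0.25)) // 13.25
  }
}
```

With keys 1 to 50 there is an even number of values, so the median falls halfway between 25 and 26; quartiles come from the same function with p = 0.25 and 0.75.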