Re: Compute Median in Spark Dataframe

Olivier Girardot Tue, 02 Jun 2015 08:29:33 -0700

I've finally come to the same conclusion, but isn't there any way to call
these Hive UDAFs via agg("percentile(key,0.5)")?
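For reference, here is a plain-Scala sketch (no Spark required) of the value that `percentile(key,0.5)` computes. It assumes Hive's `percentile` UDAF returns an exact percentile by linear interpolation at position `(n - 1) * p` over the sorted values. That is my reading of Hive's behaviour, not code taken from Hive or Spark, and `PercentileSketch` is a hypothetical name.

```scala
object PercentileSketch {
  // Exact p-th percentile, interpolating linearly between the two nearest
  // ranks at position (n - 1) * p. Assumed to mirror Hive's percentile UDAF.
  def percentile(values: Seq[Long], p: Double): Double = {
    require(values.nonEmpty, "percentile of an empty column is undefined")
    require(p >= 0.0 && p <= 1.0, "p must be in [0, 1]")
    val sorted = values.sorted
    val pos = (sorted.length - 1) * p
    val lower = math.floor(pos).toInt
    val upper = math.ceil(pos).toInt
    val frac = pos - lower
    // When pos is a whole number, lower == upper and frac == 0.
    sorted(lower) + (sorted(upper) - sorted(lower)) * frac
  }

  def main(args: Array[String]): Unit = {
    // Same keys as in Yana's example: 1 to 50.
    println(percentile(1L to 50L, 0.5))
  }
}
```

With the keys 1 to 50 the position is 49 * 0.5 = 24.5, so the result interpolates between 25 and 26 and the program prints 25.5.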


On Tue, Jun 2, 2015 at 15:37, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:

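> Like this (sqlContext must be a HiveContext instance):
>
> // sc and sqlContext as provided by spark-shell; percentile is a Hive UDAF,
> // so it only resolves through a HiveContext
> import sqlContext.implicits._ // brings .toDF() into scope
>
> case class KeyValue(key: Int, value: String)
> val df = sc.parallelize(1 to 50).map(i => KeyValue(i, i.toString)).toDF()
> df.registerTempTable("table")
> sqlContext.sql("select percentile(key, 0.5) from table").show()
>
> On Tue, Jun 2, 2015 at 8:07 AM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote: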
>
>> Hi everyone,
>> Is there any way to compute the median of a column using Spark's DataFrame API?
>> I know you can use stats on an RDD, but I'd rather stay within a DataFrame.
>> Hive seems to imply that one can compute percentiles, quartiles and
>> therefore a median using ntile.
>> Does anyone have experience with this?
>>
>> Regards,
>>
>> Olivier.
>>
>
>
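The ntile idea from the original question can also be sketched without Spark. This hypothetical `NtileMedianSketch` assumes NTILE(n) follows the SQL-standard bucketing (bucket sizes differ by at most one, with the first `rows % n` buckets taking the extra rows) and approximates the median as the largest value in bucket 1 of NTILE(2). That is exact for an odd row count but gives the lower median for an even one, so it can differ from `percentile(key, 0.5)`.

```scala
object NtileMedianSketch {
  // Assign 1-based bucket numbers to already-sorted rows, assuming SQL-standard
  // NTILE(n) semantics: sizes differ by at most one, earlier buckets get extras.
  def ntile[A](sorted: Seq[A], n: Int): Seq[(A, Int)] = {
    require(n > 0, "n must be positive")
    val base = sorted.length / n
    val extra = sorted.length % n
    // Bucket b (1-based) holds base + 1 rows if b <= extra, else base rows.
    val sizes = (1 to n).map(b => if (b <= extra) base + 1 else base)
    val labels = sizes.zipWithIndex.flatMap { case (size, i) => Seq.fill(size)(i + 1) }
    sorted.zip(labels)
  }

  // Approximate median: the largest value in bucket 1 of NTILE(2).
  // Exact for an odd row count, the lower median for an even one.
  def ntileMedian(values: Seq[Long]): Long = {
    require(values.nonEmpty, "median of an empty column is undefined")
    ntile(values.sorted, 2).collect { case (v, 1) => v }.max
  }

  def main(args: Array[String]): Unit = {
    println(ntileMedian(1L to 50L))                 // even row count: lower median
    println(ntileMedian(Seq(5L, 1L, 3L, 2L, 4L)))   // odd row count: true median
  }
}
```

For the keys 1 to 50 this yields 25, whereas interpolating at position 24.5 gives 25.5; for an odd-sized column the two approaches agree.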
