Nice to hear from you, Holden! I ended up trying exactly that (Column),
but I may have done it wrong:

In [5]: g.agg(Column("percentile(value, 0.5)"))
Py4JError: An error occurred while calling o97.agg. Trace:
py4j.Py4JException: Method agg([class java.lang.String, class
scala.collection.immutable.Nil$]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)

Any idea?

Olivier.
On Tue, Jun 2, 2015 at 18:02, Holden Karau <hol...@pigscanfly.ca> wrote:

> Not super easily. The GroupedData class uses a strToExpr function which
> has a pretty limited set of functions, so we can't pass in the name of an
> arbitrary Hive UDAF (unless I'm missing something). We can instead
> construct a column with the expression you want and then pass it in to
> agg() that way (although then you need to call the Hive UDAF there). There
> are some classes in hiveUdfs.scala which expose Hive UDAFs as Spark
> SQL AggregateExpressions, but they are private.
>
> On Tue, Jun 2, 2015 at 8:28 AM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> I've finally come to the same conclusion, but isn't there any way to call
>> these Hive UDAFs from agg("percentile(key,0.5)")?
>>
>> On Tue, Jun 2, 2015 at 15:37, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:
>>
>>> Like this... sqlContext should be a HiveContext instance:
>>>
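>>> // Brings .toDF into scope (spark-shell imports this automatically)
>>> import sqlContext.implicits._
>>>
>>> case class KeyValue(key: Int, value: String)
>>> val df = sc.parallelize(1 to 50).map(i => KeyValue(i, i.toString)).toDF()
>>> df.registerTempTable("table")
>>> sqlContext.sql("select percentile(key, 0.5) from table").show()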
>>>
>>>
>>> On Tue, Jun 2, 2015 at 8:07 AM, Olivier Girardot <
>>> o.girar...@lateral-thoughts.com> wrote:
>>>
>>>> Hi everyone,
>>>> Is there any way to compute a median on a column using Spark's
>>>> DataFrame? I know you can use stats on an RDD, but I'd rather stay within a
>>>> DataFrame.
>>>> Hive seems to imply that using ntile one can compute percentiles,
>>>> quartiles and therefore a median.
>>>> Does anyone have experience with this?
>>>>
>>>> Regards,
>>>>
>>>> Olivier.
>>>>
>>>
>>>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
> Linked In: https://www.linkedin.com/in/holdenkarau
>

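For reference, the value that a query like percentile(key, 0.5) over the
keys 1 to 50 should produce can be checked without Spark. The sketch below
is plain Python and rests on one assumption: that Hive's exact percentile
interpolates linearly between the two nearest ranks, which is my
understanding of its behaviour on integral columns but is not confirmed in
this thread. The function name and its arguments are illustrative only and
are not part of any Spark or Hive API.

```python
import math


def percentile(values, p):
    """Exact percentile with linear interpolation between neighbouring ranks.

    Assumption: this mirrors how Hive's percentile() treats integral columns.
    It is an illustration, not Spark or Hive code.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")

    ordered = sorted(values)
    # Fractional zero-based rank of the requested percentile.
    position = (len(ordered) - 1) * p
    lower = math.floor(position)
    upper = math.ceil(position)

    if lower == upper:
        # The rank falls exactly on one element.
        return float(ordered[lower])

    # Weight the two neighbours by their distance to the fractional rank.
    return (upper - position) * ordered[lower] + (position - lower) * ordered[upper]


# Same data as the Scala example in the thread: keys 1 to 50.
keys = list(range(1, 51))
median = percentile(keys, 0.5)
print(median)
```

With an even number of rows the median falls between the two middle keys,
so the result is their average rather than one of the stored values.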