Nice to hear from you, Holden! I ended up trying exactly that (Column), but I may have done it wrong:

In [5]: g.agg(Column("percentile(value, 0.5)"))
Py4JError: An error occurred while calling o97.agg. Trace:
py4j.Py4JException: Method agg([class java.lang.String, class scala.collection.immutable.Nil$]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)

Any idea?

Olivier.

On Tue, Jun 2, 2015 at 18:02, Holden Karau <hol...@pigscanfly.ca> wrote:

> Not super easily. The GroupedData class uses a strToExpr function which
> has a pretty limited set of functions, so we can't pass in the name of an
> arbitrary Hive UDAF (unless I'm missing something). We can instead
> construct a column with the expression you want and then pass it in to
> agg() that way (although then you need to call the Hive UDAF there). There
> are some private classes in hiveUdfs.scala which expose Hive UDAFs as Spark
> SQL AggregateExpressions, but they are private.
>
> On Tue, Jun 2, 2015 at 8:28 AM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> I've finally come to the same conclusion, but isn't there any way to call
>> these Hive UDAFs from agg("percentile(key,0.5)")?
>>
>> On Tue, Jun 2, 2015 at 15:37, Yana Kadiyska <yana.kadiy...@gmail.com>
>> wrote:
>>
>>> Like this... sqlContext should be a HiveContext instance
>>>
>>> case class KeyValue(key: Int, value: String)
>>> val df=sc.parallelize(1 to 50).map(i=>KeyValue(i, i.toString)).toDF
>>> df.registerTempTable("table")
>>> sqlContext.sql("select percentile(key,0.5) from table").show()
>>>
>>> On Tue, Jun 2, 2015 at 8:07 AM, Olivier Girardot <
>>> o.girar...@lateral-thoughts.com> wrote:
>>>
>>>> Hi everyone,
>>>> Is there any way to compute a median on a column using Spark's
>>>> DataFrame? I know you can use stats in an RDD, but I'd rather stay
>>>> within a DataFrame.
>>>> Hive seems to imply that using ntile one can compute percentiles,
>>>> quartiles and therefore a median.
>>>> Does anyone have experience with this?
>>>>
>>>> Regards,
>>>>
>>>> Olivier.
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
> Linked In: https://www.linkedin.com/in/holdenkarau
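
For anyone who wants to sanity-check the number a query like percentile(key, 0.5) gives on the keys 1 to 50 from the example in this thread, here is a minimal pure-Python sketch of a median computed as the 0.5 percentile. It assumes linear interpolation between the two closest ranks. The `percentile` helper and the `keys` list are hypothetical names used only for illustration, and this is a local cross-check, not how Hive or Spark compute the value internally.

```python
import math


def percentile(values, p):
    """Exact percentile with linear interpolation between the closest ranks.

    Assumption: the target position is (n - 1) * p over the sorted values.
    This is an illustrative helper, not Hive's or Spark's implementation.
    """
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be within [0, 1]")

    ordered = sorted(values)
    position = (len(ordered) - 1) * p
    lower = math.floor(position)
    upper = math.ceil(position)

    # The position falls exactly on an element: no interpolation needed.
    if lower == upper:
        return float(ordered[lower])

    # Otherwise interpolate between the two neighbouring elements.
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Same keys as the 1 to 50 example in the thread.
keys = list(range(1, 51))
median = percentile(keys, 0.5)
print(median)
```

With 50 keys the position falls halfway between the 25th and 26th values, so under the interpolation assumption above the median is 25.5. A different percentile definition (for example nearest-rank) could give 25 or 26 instead.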