Many thanks, will look into this. I don't particularly want to reuse the custom Hive UDAF I have; I would prefer writing a new one if that is cleaner. I am just using the JVM.
On 5 June 2015 at 00:03, Holden Karau <hol...@pigscanfly.ca> wrote:

> My current example doesn't use a Hive UDAF, but you would do something
> pretty similar (it calls a new user-defined UDAF, and there are wrappers to
> make Spark SQL UDAFs from Hive UDAFs, but they are private). So this is
> doable, but since it pokes at internals it will likely break between
> versions of Spark. If you want to see the WIP PR I have with Sparkling
> Pandas, it's at
> https://github.com/sparklingpandas/sparklingpandas/pull/90/files . If
> you're doing this in the JVM and just want to know how to wrap the Hive
> UDAF, you can grep/look in sql/hive/ in Spark, but I'd encourage you to see
> if there is another way to accomplish what you want (since poking at the
> internals is kind of dangerous).
>
> On Thu, Jun 4, 2015 at 6:28 AM, Deenar Toraskar <deenar.toras...@gmail.com> wrote:
>
>> Hi Holden, Olivier
>>
>> "So for Column you need to pass in a Java function, I have some sample
>> code which does this but it does terrible things to access Spark internals."
>>
>> I also need to call a Hive UDAF in a DataFrame agg function. Are there
>> any examples of what Column expects?
>>
>> Deenar
>>
>> On 2 June 2015 at 21:13, Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>>> So for Column you need to pass in a Java function. I have some sample
>>> code which does this, but it does terrible things to access Spark internals.
>>>
>>> On Tuesday, June 2, 2015, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>>>
>>>> Nice to hear from you, Holden! I ended up trying exactly that (Column),
>>>> but I may have done it wrong:
>>>>
>>>> In [5]: g.agg(Column("percentile(value, 0.5)"))
>>>> Py4JError: An error occurred while calling o97.agg. Trace:
>>>> py4j.Py4JException: Method agg([class java.lang.String, class
>>>> scala.collection.immutable.Nil$]) does not exist
>>>> at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>>>>
>>>> Any idea?
>>>>
>>>> Olivier.
>>>> On Tue, 2 June 2015 at 18:02, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>
>>>>> Not super easily: the GroupedData class uses a strToExpr function
>>>>> which has a pretty limited set of functions, so we can't pass in the
>>>>> name of an arbitrary Hive UDAF (unless I'm missing something). We can
>>>>> instead construct a Column with the expression you want and then pass
>>>>> it in to agg() that way (although then you need to call the Hive UDAF
>>>>> there). There are some private classes in hiveUdfs.scala which expose
>>>>> Hive UDAFs as Spark SQL AggregateExpressions, but they are private.
>>>>>
>>>>> On Tue, Jun 2, 2015 at 8:28 AM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>>>>>
>>>>>> I've finally come to the same conclusion, but isn't there any way to
>>>>>> call these Hive UDAFs from agg("percentile(key,0.5)")?
>>>>>>
>>>>>> On Tue, 2 June 2015 at 15:37, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:
>>>>>>
>>>>>>> Like this... sqlContext should be a HiveContext instance:
>>>>>>>
>>>>>>> case class KeyValue(key: Int, value: String)
>>>>>>> val df = sc.parallelize(1 to 50).map(i => KeyValue(i, i.toString)).toDF
>>>>>>> df.registerTempTable("table")
>>>>>>> sqlContext.sql("select percentile(key,0.5) from table").show()
>>>>>>>
>>>>>>> On Tue, Jun 2, 2015 at 8:07 AM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>> Is there any way to compute a median on a column using Spark's
>>>>>>>> DataFrame? I know you can use stats on an RDD, but I'd rather stay
>>>>>>>> within a DataFrame.
>>>>>>>> Hive seems to imply that using ntile one can compute percentiles,
>>>>>>>> quartiles and therefore a median.
>>>>>>>> Does anyone have experience with this?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Olivier.
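For reference, Hive's percentile(col, p) returns the value at fractional rank p in the sorted column, linearly interpolating between the two closest ranks. A minimal plain-Python sketch of that behaviour (the helper name hive_percentile is illustrative, not a real API; this is a sketch of the semantics, not Hive's implementation):

```python
def hive_percentile(values, p):
    """Sketch of Hive's percentile(col, p) for exact (integer) columns:
    linear interpolation between the two closest ranks."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    xs = sorted(values)
    idx = p * (len(xs) - 1)   # fractional rank into the sorted values
    lo = int(idx)             # lower neighbouring rank
    frac = idx - lo           # how far we are between the two ranks
    if frac == 0:
        return float(xs[lo])
    return xs[lo] + frac * (xs[lo + 1] - xs[lo])

# Matches the thread's example: median of the keys 1..50
print(hive_percentile(range(1, 51), 0.5))  # 25.5
```

This is why Yana's `select percentile(key,0.5)` over keys 1..50 yields 25.5: with an even row count, the median interpolates halfway between the 25th and 26th values.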
>>>>>
>>>>> --
>>>>> Cell: 425-233-8271
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>> LinkedIn: https://www.linkedin.com/in/holdenkarau
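On the ntile route Olivier mentions: NTILE(n) only buckets rows into n roughly equal groups, so reading the median off the boundary between the 2nd and 3rd buckets of NTILE(4) is approximate. A plain-Python sketch (the ntile helper here is a hypothetical stand-in mimicking the SQL window function, not a Spark or Hive API):

```python
def ntile(values, n):
    """Mimic SQL's NTILE(n): split sorted rows into n buckets whose
    sizes differ by at most one, larger buckets first."""
    xs = sorted(values)
    size, extra = divmod(len(xs), n)
    tiles, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        tiles.append(xs[start:end])
        start = end
    return tiles

quartiles = ntile(range(1, 51), 4)
# Bucket sizes for 50 rows: 13, 13, 12, 12 (not perfectly even),
# so the tile-2/tile-3 boundary misses the true median (25.5):
approx_median = (quartiles[1][-1] + quartiles[2][0]) / 2
print(approx_median)  # 26.5
```

When the row count divides evenly by n the boundary does land on the true median; otherwise it drifts by up to a rank, which is why the thread converges on the percentile UDAF rather than ntile for an exact answer.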