Many thanks, will look into this. I don't particularly want to reuse the
custom Hive UDAF I have; I would prefer writing a new one if that is
cleaner. I am just using the JVM.
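For reference, the contract a new UDAF has to implement is essentially an
initialize/iterate/merge/terminate lifecycle (iterate over rows, merge
partial buffers from different partitions, then produce the result). Here
is a minimal plain-Python sketch of that lifecycle for a median aggregate;
the class and method names are illustrative only, not Hive's or Spark's
actual API:

```python
import statistics

class MedianAgg:
    """Illustrative median aggregator following the usual UDAF
    lifecycle: initialize, iterate over input rows, merge partial
    buffers, then terminate with the final value."""

    def __init__(self):
        # The buffer keeps all values; an exact median cannot be
        # computed from a fixed-size summary.
        self.buffer = []

    def iterate(self, value):
        # Called once per input row.
        if value is not None:
            self.buffer.append(value)

    def merge(self, other):
        # Called to combine partial aggregates from two partitions.
        self.buffer.extend(other.buffer)

    def terminate(self):
        # Final result: the 0.5 percentile, i.e. the median.
        return statistics.median(self.buffer)

# Mimic two partitions holding the keys 1..50.
left, right = MedianAgg(), MedianAgg()
for i in range(1, 26):
    left.iterate(i)
for i in range(26, 51):
    right.iterate(i)
left.merge(right)
print(left.terminate())  # 25.5
```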

On 5 June 2015 at 00:03, Holden Karau <hol...@pigscanfly.ca> wrote:

> My current example doesn't use a Hive UDAF, but you would do something
> pretty similar (it calls a new user-defined UDAF, and there are wrappers
> to make Spark SQL UDAFs from Hive UDAFs, but they are private). So this
> is doable, but since it pokes at internals it will likely break between
> versions of Spark. If you want to see the WIP PR I have with Sparkling
> Pandas, it's at
> https://github.com/sparklingpandas/sparklingpandas/pull/90/files . If
> you're doing this on the JVM and just want to know how to wrap the Hive
> UDAF, you can grep/look in sql/hive/ in Spark, but I'd encourage you to
> see if there is another way to accomplish what you want (since poking at
> the internals is kind of dangerous).
>
> On Thu, Jun 4, 2015 at 6:28 AM, Deenar Toraskar <deenar.toras...@gmail.com
> > wrote:
>
>> Hi Holden, Olivier
>>
>>
>> >>So for Column you need to pass in a Java function. I have some sample
>> code which does this, but it does terrible things to access Spark
>> internals.
>> I also need to call a Hive UDAF in a DataFrame agg function. Are there
>> any examples of what Column expects?
>>
>> Deenar
>>
>> On 2 June 2015 at 21:13, Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>>> So for Column you need to pass in a Java function. I have some sample
>>> code which does this, but it does terrible things to access Spark
>>> internals.
>>>
>>>
>>> On Tuesday, June 2, 2015, Olivier Girardot <
>>> o.girar...@lateral-thoughts.com> wrote:
>>>
>>>> Nice to hear from you Holden! I ended up trying exactly that (Column),
>>>> but I may have done it wrong:
>>>>
>>>> In [5]: g.agg(Column("percentile(value, 0.5)"))
>>>> Py4JError: An error occurred while calling o97.agg. Trace:
>>>> py4j.Py4JException: Method agg([class java.lang.String, class
>>>> scala.collection.immutable.Nil$]) does not exist
>>>> at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>>>>
>>>> Any idea?
>>>>
>>>> Olivier.
>>>> On Tue, Jun 2, 2015 at 18:02, Holden Karau <hol...@pigscanfly.ca>
>>>> wrote:
>>>>
>>>>> Not super easily: the GroupedData class uses a strToExpr function
>>>>> which has a pretty limited set of functions, so we can't pass in the
>>>>> name of an arbitrary Hive UDAF (unless I'm missing something). We can
>>>>> instead construct a Column with the expression you want and then pass
>>>>> it in to agg() that way (although then you need to call the Hive UDAF
>>>>> there). There are some private classes in hiveUdfs.scala which expose
>>>>> Hive UDAFs as Spark SQL AggregateExpressions, but they are private.
>>>>>
>>>>> On Tue, Jun 2, 2015 at 8:28 AM, Olivier Girardot <
>>>>> o.girar...@lateral-thoughts.com> wrote:
>>>>>
>>>>>> I've finally come to the same conclusion, but isn't there any way to
>>>>>> call these Hive UDAFs from agg("percentile(key,0.5)")?
>>>>>>
>>>>>> On Tue, Jun 2, 2015 at 15:37, Yana Kadiyska <yana.kadiy...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Like this...sqlContext should be a HiveContext instance
>>>>>>>
>>>>>>> case class KeyValue(key: Int, value: String)
>>>>>>> val df = sc.parallelize(1 to 50).map(i => KeyValue(i, i.toString)).toDF
>>>>>>> df.registerTempTable("table")
>>>>>>> sqlContext.sql("select percentile(key,0.5) from table").show()
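As a sanity check on what that query should return over the 1..50 keys
(assuming Hive's percentile UDAF uses linear interpolation between the
closest ranks, which is my understanding for integer input), the same
computation in plain Python; the function is a hand-rolled sketch, not
anything from Spark or Hive:

```python
def percentile(values, p):
    """Percentile with linear interpolation between closest ranks."""
    xs = sorted(values)
    # Fractional rank into the sorted list (0-based).
    rank = p * (len(xs) - 1)
    lo = int(rank)
    frac = rank - lo
    if frac == 0:
        return float(xs[lo])
    # Interpolate between the two neighbouring order statistics.
    return xs[lo] + frac * (xs[lo + 1] - xs[lo])

print(percentile(range(1, 51), 0.5))  # 25.5
```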
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 2, 2015 at 8:07 AM, Olivier Girardot <
>>>>>>> o.girar...@lateral-thoughts.com> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>> Is there any way to compute a median on a column using Spark's
>>>>>>>> DataFrame? I know you can use stats on an RDD, but I'd rather stay
>>>>>>>> within a DataFrame.
>>>>>>>> Hive seems to imply that using ntile one can compute percentiles,
>>>>>>>> quartiles, and therefore a median.
>>>>>>>> Does anyone have experience with this?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Olivier.
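The ntile approach mentioned above works by assigning each row, in rank
order, to one of n roughly equal buckets; the median then falls on the
boundary between buckets 2 and 3 of 4. A minimal plain-Python sketch of
that bucketing (the function name and shape are mine, not Hive's
implementation):

```python
def ntile(values, n):
    """Assign each value (by rank) to one of n buckets, 1-based,
    the way SQL's ntile window function distributes rows: earlier
    buckets absorb the remainder when len(values) % n != 0."""
    xs = sorted(values)
    size, extra = divmod(len(xs), n)
    buckets, start = {}, 0
    for b in range(1, n + 1):
        # Buckets 1..extra each get one additional row.
        end = start + size + (1 if b <= extra else 0)
        for v in xs[start:end]:
            buckets[v] = b
        start = end
    return buckets

print(ntile(range(1, 9), 4))
# {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4}
```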
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Cell : 425-233-8271
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>> Linked In: https://www.linkedin.com/in/holdenkarau
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>
