Hello,
Riccardo, I was able to make it run. The problem is that HiveContext no longer
exists in Spark 2.0.2, as far as I can see, but SparkSession has an
enableHiveSupport method that adds the Hive functionality. To enable this,
the spark-hive_2.11 dependency is needed.
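
For anyone hitting the same issue, this is roughly what worked for me (the artifact coordinates and app name below are just an example; they assume Spark 2.0.2 built for Scala 2.11, so adjust to your build):

```scala
// build.sbt: the Hive integration lives in a separate artifact
// libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.2"

import org.apache.spark.sql.SparkSession

// Replaces the old HiveContext: build a SparkSession with Hive support enabled
val spark = SparkSession.builder()
  .appName("percentile-approx-example")
  .enableHiveSupport() // requires spark-hive on the classpath
  .getOrCreate()

import spark.implicits._

// With Hive support enabled, percentile_approx resolves in SQL expressions
val df = Seq(1.0, 8.0).toDF("a")
df.selectExpr("percentile_approx(a, 0.5)").show()
```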

This is not well explained in the Spark API docs, which only say that SQLContext
and HiveContext are now part of SparkSession:

"SparkSession is now the new entry point of Spark that replaces the old
SQLContext and HiveContext. Note that the old SQLContext and HiveContext
are kept for backward compatibility. A new catalog interface is accessible
from SparkSession - existing API on databases and tables access such as
listTables, createExternalTable, dropTempView, cacheTable are moved here."

I think it would be a good idea to document enableHiveSupport as well.

Thanks,

On Wed, Jun 14, 2017 at 5:13 AM, Takeshi Yamamuro <linguin....@gmail.com>
wrote:

> You can use the function without Hive; you can try:
>
> scala> Seq(1.0, 8.0).toDF("a").selectExpr("percentile_approx(a,
> 0.5)").show
>
> +------------------------------------------------+
>
> |percentile_approx(a, CAST(0.5 AS DOUBLE), 10000)|
>
> +------------------------------------------------+
>
> |                                             8.0|
>
> +------------------------------------------------+
>
>
> // maropu
>
>
>
> On Wed, Jun 14, 2017 at 5:04 PM, Riccardo Ferrari <ferra...@gmail.com>
> wrote:
>
>> Hi Andres,
>>
>> I can't find the reference, but the last time I searched for this I found that
>> 'percentile_approx' is only available via the Hive context. You should register
>> a temp table and use it from there.
>>
>> Best,
>>
>> On Tue, Jun 13, 2017 at 8:52 PM, Andrés Ivaldi <iaiva...@gmail.com>
>> wrote:
>>
>>> Hello, I'm trying to use percentile_approx in my SQL query, but it's
>>> like the Spark context can't find the function.
>>>
>>> I'm using it like this
>>> import org.apache.spark.sql.functions._
>>> import org.apache.spark.sql.DataFrameStatFunctions
>>>
>>> val e = expr("percentile_approx(Cantidadcon0234514)")
>>> df.agg(e).show()
>>>
>>> and exception is
>>>
>>> org.apache.spark.sql.AnalysisException: Undefined function:
>>> 'percentile_approx'. This function is neither a registered temporary
>>> function nor a permanent function registered
>>>
>>> I've also tried with callUDF.
>>>
>>> Regards.
>>>
>>> --
>>> Ing. Ivaldi Andres
>>>
>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>



-- 
Ing. Ivaldi Andres
