Hi Spark folks,

Is there any plan to support a richer UDF API in Spark, comparable to what Hive offers? Hive's GenericUDF API provides lifecycle methods such as initialize() and configure() (invoked once during setup rather than per row), which many of our users rely on. We now have a large number of Hive UDFs that use these methods. We would like to migrate them to native Spark UDFs but are blocked by the lack of similar lifecycle hooks. Are there plans to address this, or do people usually adopt some sort of workaround?
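One workaround I have seen discussed (a sketch, not an official Spark API) is to hang any expensive one-time setup off a lazy val in an object referenced by the UDF body. Since each executor JVM loads the object once, the setup runs at most once per JVM on first invocation, approximating initialize()/configure(). The names below (LazyInitUdfDemo, Resource, translate) are hypothetical, and the "dictionary" stands in for whatever heavy resource a real UDF would load:

```scala
object LazyInitUdfDemo {
  // Hypothetical expensive resource; in practice this might be a model,
  // a dictionary file, or a connection pool loaded in initialize().
  object Resource {
    var initCount = 0 // for illustration: counts how many times init ran
    lazy val dict: Map[String, String] = {
      initCount += 1 // runs only on first access, once per JVM
      Map("hi" -> "hello")
    }
  }

  // The UDF body: referencing the lazy val defers the heavy setup to the
  // first row processed, instead of paying it per row or at driver time.
  def translate(word: String): String =
    Resource.dict.getOrElse(word, word)

  def main(args: Array[String]): Unit = {
    assert(Resource.initCount == 0) // nothing loaded yet
    println(translate("hi"))        // first call triggers the one-time init
    println(translate("bye"))       // subsequent calls reuse the resource
    assert(Resource.initCount == 1) // initialized exactly once
  }
}
```

In Spark one would wrap translate with functions.udf and the same once-per-executor behavior falls out of JVM object initialization; note this gives no equivalent of a close()/teardown hook.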
If we use the Hive UDFs directly in Spark, we pay a performance penalty. As I understand it, Spark converts InternalRow to Row and back to InternalRow for native Spark UDFs, and InternalRow to Hive objects and back to InternalRow for Hive UDFs, yet somehow the conversion path for native UDFs is more performant.

Best,
R.