Hi Spark folks,

Is there any plan to support a richer UDF API in Spark, comparable to what Hive offers? Hive's GenericUDF API provides lifecycle methods such as initialize() and configure() (invoked once during setup rather than per row), which many of our users rely on. We now have a large number of Hive UDFs that use these methods. We would like to migrate them to native Spark UDFs but are blocked by the lack of similar lifecycle hooks. Are there plans to address this, or do people usually adopt some sort of workaround?
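One workaround I have seen discussed (a sketch, not an official Spark API) is to hang any expensive one-time setup off a lazy val in an object referenced by the UDF body. Since each executor JVM loads the object once, the setup runs at most once per JVM on first invocation, approximating initialize()/configure(). The names below (LazyInitUdfDemo, Resource, translate) are hypothetical, and the "dictionary" stands in for whatever heavy resource a real UDF would load:

```scala
object LazyInitUdfDemo {
  // Hypothetical expensive resource; in practice this might be a model,
  // a dictionary file, or a connection pool loaded in initialize().
  object Resource {
    var initCount = 0 // for illustration: counts how many times init ran
    lazy val dict: Map[String, String] = {
      initCount += 1 // runs only on first access, once per JVM
      Map("hi" -> "hello")
    }
  }

  // The UDF body: referencing the lazy val defers the heavy setup to the
  // first row processed, instead of paying it per row or at driver time.
  def translate(word: String): String =
    Resource.dict.getOrElse(word, word)

  def main(args: Array[String]): Unit = {
    assert(Resource.initCount == 0) // nothing loaded yet
    println(translate("hi"))        // first call triggers the one-time init
    println(translate("bye"))       // subsequent calls reuse the resource
    assert(Resource.initCount == 1) // initialized exactly once
  }
}
```

In Spark one would wrap translate with functions.udf and the same once-per-executor behavior falls out of JVM object initialization; note this gives no equivalent of a close()/teardown hook.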
If we use the Hive UDFs directly in Spark, we pay a performance penalty. As I understand it, Spark converts InternalRow to Row and back to InternalRow for native Spark UDFs, and InternalRow to Hive objects and back to InternalRow for Hive UDFs, yet somehow the conversion path for native UDFs is more performant.

Best,
R.