On Sat, Dec 5, 2015 at 3:27 PM, Deenar Toraskar <deenar.toras...@gmail.com> wrote:
> On a similar note, what is involved in getting native support for some
> user-defined functions, so that they are as efficient as native Spark SQL
> expressions? I had one particular one - an arraySum (element-wise sum) that
> is heavily used in a lot of risk analytics.
To get the best performance you have to implement a Catalyst expression with codegen. This, however, is necessarily an internal (unstable) interface, since we are constantly making breaking changes to improve performance. So if it's a common enough operation we should bake it into the engine.

That said, the code-generated encoders that we created for Datasets should lower the cost of calling into external functions as we start using them in more and more places (i.e. https://issues.apache.org/jira/browse/SPARK-11593).
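For reference, here is a minimal sketch of the arraySum from the question as an ordinary Scala function - this is the kind of logic you would wrap with org.apache.spark.sql.functions.udf to get the external-function path discussed above (the exact UDF registration is omitted, so this compiles without a Spark dependency):

```scala
// Element-wise sum of two numeric arrays, as used in risk analytics.
// When registered as a Spark SQL UDF, this goes through the external
// function-call path rather than codegen'd Catalyst expressions.
def arraySum(a: Seq[Double], b: Seq[Double]): Seq[Double] = {
  require(a.length == b.length, "arrays must have the same length")
  a.zip(b).map { case (x, y) => x + y }
}

// arraySum(Seq(1.0, 2.0), Seq(3.0, 4.0)) == Seq(4.0, 6.0)
```

A native Catalyst implementation of the same operation would instead generate Java code inline, avoiding the per-row serialization and virtual function call overhead of a UDF.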