On Sat, Dec 5, 2015 at 3:27 PM, Deenar Toraskar <deenar.toras...@gmail.com> wrote:
> On a similar note, what is involved in getting native support for some
> user-defined functions, so that they are as efficient as native Spark SQL
> expressions? I had one particular one - an arraySum (element-wise sum) that
> is heavily used in a lot of risk analytics.
To get the best performance you have to implement a Catalyst expression with codegen. This, however, is necessarily an internal (unstable) interface, since we are constantly making breaking changes to improve performance. So if it's a common enough operation we should bake it into the engine.

That said, the code-generated encoders that we created for Datasets should lower the cost of calling into external functions as we start using them in more and more places (i.e. https://issues.apache.org/jira/browse/SPARK-11593).
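For reference, here is a minimal sketch of the arraySum from the question as an ordinary Scala function - this is the kind of logic you would wrap with org.apache.spark.sql.functions.udf to get the external-function path discussed above (the exact UDF registration is omitted, so this compiles without a Spark dependency):

```scala
// Element-wise sum of two numeric arrays, as used in risk analytics.
// When registered as a Spark SQL UDF, this goes through the external
// function-call path rather than codegen'd Catalyst expressions.
def arraySum(a: Seq[Double], b: Seq[Double]): Seq[Double] = {
  require(a.length == b.length, "arrays must have the same length")
  a.zip(b).map { case (x, y) => x + y }
}

// arraySum(Seq(1.0, 2.0), Seq(3.0, 4.0)) == Seq(4.0, 6.0)
```

A native Catalyst implementation of the same operation would instead generate Java code inline, avoiding the per-row serialization and virtual function call overhead of a UDF.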