Michael Having VectorUnionSumUDAF implemented would be great. This is quite generic, it does element-wise sum of arrays and maps https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/udf/timeseries/VectorUnionSumUDAF.java and would be massive benefit for a lot of risk analytics.
In general most of the brickhouse UDFs are quite useful https://github.com/klout/brickhouse. Happy to help out. On another note what would be involved to have arrays backed by a sparse Array (I am assuming the current implementation is dense), sort of native support for http://spark.apache.org/docs/latest/mllib-data-types.html Regards Deenar Regards Deenar On 7 December 2015 at 20:21, Michael Armbrust <mich...@databricks.com> wrote: > On Sat, Dec 5, 2015 at 3:27 PM, Deenar Toraskar <deenar.toras...@gmail.com > > wrote: >> >> On a similar note, what is involved in getting native support for some >> user defined functions, so that they are as efficient as native Spark SQL >> expressions? I had one particular one - an arraySum (element wise sum) that >> is heavily used in a lot of risk analytics. >> > > To get the best performance you have to implement a catalyst expression > with codegen. This however is necessarily an internal (unstable) interface > since we are constantly making breaking changes to improve performance. So > if its a common enough operation we should bake it into the engine. > > That said, the code generated encoders that we created for datasets should > lower the cost of calling into external functions as we start using them in > more and more places (i.e. > https://issues.apache.org/jira/browse/SPARK-11593) >