Hi All,

I have been working on a use case with a DataFrame of 25 columns: 24 columns 
are of type string and the last column is an array of doubles. For a given set 
of columns I have to group by and sum the arrays of doubles element-wise. I 
have implemented a UDAF which works fine, but it is expensive. While tuning the 
solution I came across Aggregators, which can be implemented and used with the 
agg function. My question is: how can we implement an Aggregator that takes an 
array of doubles as input and returns an array of doubles?

I learned that it is not possible to implement such an Aggregator in Java, only 
in Scala. How can I define an Aggregator that takes an array of doubles as 
input? Note that my input is a Parquet file.
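For what it is worth, here is a minimal sketch of what I have in mind (not tested at scale): an `Aggregator[Array[Double], Array[Double], Array[Double]]` that sums arrays element-wise. The object name `SumArray` is my own, and it assumes every array within a group has the same length; `ExpressionEncoder` is an internal Spark class that is commonly used for array encoders, but that choice is an assumption on my part.

```scala
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator

// Sketch: element-wise sum of Array[Double] values within a group.
// Assumes all arrays in a group have the same length.
object SumArray extends Aggregator[Array[Double], Array[Double], Array[Double]] {

  // Empty array as the neutral element; merge treats it as "no data yet".
  def zero: Array[Double] = Array.emptyDoubleArray

  // Fold one input row into the buffer.
  def reduce(buffer: Array[Double], row: Array[Double]): Array[Double] =
    merge(buffer, row)

  // Combine two partial buffers element-wise.
  def merge(a: Array[Double], b: Array[Double]): Array[Double] =
    if (a.isEmpty) b
    else if (b.isEmpty) a
    else a.zip(b).map { case (x, y) => x + y }

  // The buffer is already the final result.
  def finish(reduction: Array[Double]): Array[Double] = reduction

  def bufferEncoder: Encoder[Array[Double]] = ExpressionEncoder()
  def outputEncoder: Encoder[Array[Double]] = ExpressionEncoder()
}
```

On a typed Dataset I believe this could be applied as `ds.groupByKey(...).agg(SumArray.toColumn)`; I am not sure how best to use it from the untyped DataFrame `agg` in my version of Spark.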

Any pointers are highly appreciated. I have read that Spark UDAFs are slow and 
Aggregators are the way to go.

Best Regards,
Anil Langote
+1-425-633-9747
