Use Spark Aggregator in PySpark

2023-04-23 Thread Thomas Wang
Hi Spark Community, I have implemented a custom Spark Aggregator (a subclass of org.apache.spark.sql.expressions.Aggregator). Now I'm trying to use it in a PySpark application, but for some reason I'm not able to trigger the function. Here is what I'm doing; could someone help me take a look?

Re: Spark Aggregator with ARRAY input and ARRAY output

2023-04-23 Thread Thomas Wang
Thanks Raghavendra. Could you be more specific about how I can use ExpressionEncoder()? More specifically, how can I conform to the return type of Encoder>? Thomas On Sun, Apr 23, 2023 at 9:42 AM Raghavendra Ganesh wrote: > For simple array types setting encoder to ExpressionEncoder() should

Re: Spark Aggregator with ARRAY input and ARRAY output

2023-04-23 Thread Raghavendra Ganesh
For simple array types, setting the encoder to ExpressionEncoder() should work. -- Raghavendra On Sun, Apr 23, 2023 at 9:20 PM Thomas Wang wrote: > Hi Spark Community, > > I'm trying to implement a custom Spark Aggregator (a subclass of > org.apache.spark.sql.expressions.Aggregator). Correct me if

Spark Aggregator with ARRAY input and ARRAY output

2023-04-23 Thread Thomas Wang
Hi Spark Community, I'm trying to implement a custom Spark Aggregator (a subclass of org.apache.spark.sql.expressions.Aggregator). Correct me if I'm wrong, but I'm assuming I will be able to use it as an aggregation function like SUM. What I'm trying to do is that I have a column of ARRAY and I
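
The snippet above is cut off before it says what the aggregation should compute, so the following is only a sketch under an assumption: an element-wise sum over equal-length integer arrays. It mirrors the zero / reduce / merge / finish contract of org.apache.spark.sql.expressions.Aggregator in plain Python, without Spark, to show how an ARRAY-in, ARRAY-out aggregator combines per-partition buffers. The names ArraySumAggregator and aggregate are hypothetical and are not part of any Spark API.

```python
from functools import reduce as fold
from typing import List, Optional

# None stands for "no rows seen yet" (the empty buffer).
Buffer = Optional[List[int]]


class ArraySumAggregator:
    """Hypothetical plain-Python mirror of the Aggregator contract.

    Assumption: the aggregation is an element-wise sum and all input
    arrays have the same length (zip would silently truncate otherwise).
    """

    def zero(self) -> Buffer:
        # Initial buffer for a partition.
        return None

    def reduce(self, buffer: Buffer, row: List[int]) -> Buffer:
        # Fold one input array into the running buffer.
        if buffer is None:
            return list(row)
        return [a + b for a, b in zip(buffer, row)]

    def merge(self, left: Buffer, right: Buffer) -> Buffer:
        # Combine two partial buffers from different partitions.
        if left is None:
            return right
        if right is None:
            return left
        return [a + b for a, b in zip(left, right)]

    def finish(self, buffer: Buffer) -> List[int]:
        # Turn the final buffer into the output array.
        return [] if buffer is None else buffer


def aggregate(agg: ArraySumAggregator, partitions: List[List[List[int]]]) -> List[int]:
    # Reduce each partition independently, then merge the partial results.
    partials = [fold(agg.reduce, rows, agg.zero()) for rows in partitions]
    return agg.finish(fold(agg.merge, partials, agg.zero()))


# Three partitions, one of them empty.
result = aggregate(
    ArraySumAggregator(),
    [[[1, 2, 3], [4, 5, 6]], [[10, 20, 30]], []],
)
print(result)
```

In an actual Scala Aggregator the same four methods would be accompanied by bufferEncoder and outputEncoder, which is where the ExpressionEncoder() suggestion in the reply applies.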

State of GraphX and GraphFrames

2023-04-23 Thread g
Hello, I am currently doing my Master's thesis on data provenance on Apache Spark and would like to extend the provenance capabilities to include GraphX/GraphFrames. I am curious what the current status of both GraphX and GraphFrames is. It seems that GraphX is no longer being updated (but still