Hi,
registerJavaFunction only registers scalar Java UDFs, which is why Spark complains that the class doesn't implement any UDF interface. For an aggregating UDF, use spark.udf.registerJavaUDAF(name, className) instead.
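As a minimal sketch of that call (assuming the jar containing my.package.MyAggrator is already on the Spark classpath, and that the class exposes an aggregate-function interface Spark can register; the DataFrame below is just illustrative):

```python
from pyspark.sql import SparkSession, functions

spark = SparkSession.builder.getOrCreate()

# Register the Java aggregate under a SQL-callable name. Unlike
# registerJavaFunction, registerJavaUDAF takes no returnType argument --
# Spark reads the result schema from the Java class itself.
spark.udf.registerJavaUDAF("MyAggrator", "my.package.MyAggrator")

# The aggregate can then be invoked exactly as in the original attempt:
df = spark.createDataFrame([(1,), (2,)], ["input"])
out = df.groupBy().agg(functions.expr("MyAggrator(input)").alias("output"))
```

Whether this works for a typed Aggregator subclass directly depends on the Spark version and how the class is exposed to SQL; treat the snippet as the shape of the call, not a guarantee for that particular jar.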
Enrico
On 23.04.23 at 23:42, Thomas Wang wrote:
Hi Spark Community,
I have implemented a custom Spark Aggregator (a subclass of
org.apache.spark.sql.expressions.Aggregator). Now I'm trying to use
it in a PySpark application, but for some reason I'm not able to
trigger the function. Here is what I'm doing; could someone help me
take a look? Thanks.
spark = self._gen_spark_session()
spark.udf.registerJavaFunction(
    name="MyAggrator",
    javaClassName="my.package.MyAggrator",
    returnType=ArrayType(elementType=LongType()),
)
The above code runs successfully. However, to call it, I assume I
should do something like the following.
df = df.groupBy().agg(
    functions.expr("MyAggrator(input)").alias("output"),
)
But this one gives me the following error:
pyspark.sql.utils.AnalysisException: UDF class my.package.MyAggrator doesn't
implement any UDF interface
My question is: how can I use a Spark Aggregator defined in a jar
file from PySpark? Thanks.
Thomas