[ https://issues.apache.org/jira/browse/SPARK-28978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng resolved SPARK-28978.
-----------------------------------
    Fix Version/s: 3.0.0
         Assignee: Bago Amirbekian
       Resolution: Fixed

> PySpark: Can't pass more than 256 arguments to a UDF
> ----------------------------------------------------
>
>                 Key: SPARK-28978
>                 URL: https://issues.apache.org/jira/browse/SPARK-28978
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.2, 2.4.0, 2.4.4
>            Reporter: Jim Fulton
>            Assignee: Bago Amirbekian
>            Priority: Major
>              Labels: koalas, mlflow, pyspark
>             Fix For: 3.0.0
>
> This code:
> [https://github.com/apache/spark/blob/712874fa0937f0784f47740b127c3bab20da8569/python/pyspark/worker.py#L367-L379]
> creates Python lambdas that call UDF functions passing arguments singly,
> rather than using varargs. For example: `lambda a: f(a[0], a[1], ...)`.
> This fails when there are more than 256 arguments.
> mlflow, when generating model predictions, uses an argument for each feature
> column. I have a model with more than 500 features.
> I was able to easily hack around this by changing the generated lambdas to
> use varargs, as in `lambda a: f(*a)`.
> I don't know why these lambdas were created the way they were. Using varargs
> is much simpler and works fine in my testing.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
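The difference the report describes can be illustrated with a small standalone sketch. It is not PySpark's actual `worker.py` code: the helper names `make_wrapper_explicit`, `make_wrapper_varargs` and `total` are hypothetical, and building the lambda from a source string with `eval` is an assumption about how such lambdas can be generated. One wrapper spells out every argument, as in `lambda a: f(a[0], a[1], ...)`; the other unpacks the sequence with `f(*a)`. CPython before 3.7 rejects a call with more than 255 explicit arguments at compile time (`SyntaxError: more than 255 arguments`), a limit removed in Python 3.7, whereas star-unpacking has no such limit on any version.

```python
# Hypothetical sketch, not PySpark's worker.py: contrasts a wrapper built
# from generated source that spells out each argument with one that uses
# varargs unpacking, as described in the report.

def make_wrapper_explicit(f, num_args):
    # Generate "lambda a: f(a[0], a[1], ..., a[n-1])" and compile it.
    # On CPython < 3.7, eval raises SyntaxError ("more than 255 arguments")
    # once num_args exceeds 255.
    arg_list = ", ".join("a[%d]" % i for i in range(num_args))
    source = "lambda a: f(%s)" % arg_list
    return eval(source, {"f": f})


def make_wrapper_varargs(f):
    # Unpack the whole argument sequence; no per-call argument limit.
    return lambda a: f(*a)


def total(*cols):
    # Stand-in for a UDF that takes one argument per feature column.
    return sum(cols)


row = list(range(500))  # e.g. 500 feature columns
print(make_wrapper_varargs(total)(row))  # 124750

try:
    # Works on Python 3.7+; fails to compile on older interpreters.
    print(make_wrapper_explicit(total, len(row))(row))
except SyntaxError as err:
    print("SyntaxError:", err)
```

The varargs wrapper's source does not grow with the number of columns, which is why the reporter's `lambda a: f(*a)` workaround sidesteps the limit regardless of interpreter version.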