You have to use SQL to call it (you will be able to do it with DataFrames
in Spark 2.0, thanks to a better parser).  You need to construct a
struct(*) and then pass that to your function, since a function must have a
fixed number of arguments.

Here is an example
<https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/2457334174245122/2840265927289860/b29d1ad2aa.html>

On Fri, Mar 4, 2016 at 6:41 AM, Nisrina Luthfiyati <
nisrina.luthfiy...@gmail.com> wrote:

> Hi all,
> I'm using spark sql in python and want to write a udf that takes an entire
> Row as the argument.
> I tried something like:
>
> def functionName(row):
>     ...
>     return a_string
>
> udfFunctionName=udf(functionName, StringType())
> df.withColumn('columnName', udfFunctionName('*'))
>
> but this gives an error message:
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File
> "/home/nina/Downloads/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/dataframe.py",
> line 1311, in withColumn
>     return DataFrame(self._jdf.withColumn(colName, col._jc), self.sql_ctx)
>   File
> "/home/nina/Downloads/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
> line 813, in __call__
>   File
> "/home/nina/Downloads/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/utils.py",
> line 51, in deco
>     raise AnalysisException(s.split(': ', 1)[1], stackTrace)
> pyspark.sql.utils.AnalysisException: u"unresolved operator 'Project
> [address#0,name#1,PythonUDF#functionName(*) AS columnName#26];"
>
> Does anyone know how this can be done or whether this is possible?
>
> Thank you,
> Nisrina.
>
>
