You have to use SQL to call it (though in Spark 2.0 you will be able to do this with DataFrames as well, thanks to a better parser). You need to construct a struct(*) and pass that to your function, since a UDF must take a fixed number of arguments.
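A minimal sketch of that approach is below. It uses the Spark 2.x SparkSession API for brevity (on 1.6 the equivalents are sqlContext.registerFunction and df.registerTempTable); the table name, column names, and UDF name are made up for illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.master("local[1]").appName("row-udf").getOrCreate()

    df = spark.createDataFrame([("123 Main St", "Alice")], ["address", "name"])

    def row_to_string(row):
        # struct(*) arrives here as a Row, so fields are accessible by name
        return row["name"] + " @ " + row["address"]

    spark.udf.register("rowToString", row_to_string, StringType())
    df.createOrReplaceTempView("people")

    # struct(*) packs every column of the input into a single struct argument
    result = spark.sql(
        "SELECT address, name, rowToString(struct(*)) AS columnName FROM people")
    result.show(truncate=False)

The key point is that the UDF still receives exactly one argument (the struct); struct(*) is what lets that single argument carry the whole row.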
Here is an example: <https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/2457334174245122/2840265927289860/b29d1ad2aa.html>

On Fri, Mar 4, 2016 at 6:41 AM, Nisrina Luthfiyati <nisrina.luthfiy...@gmail.com> wrote:
> Hi all,
> I'm using Spark SQL in Python and want to write a UDF that takes an entire
> Row as the argument. I tried something like:
>
>     def functionName(row):
>         ...
>         return a_string
>
>     udfFunctionName = udf(functionName, StringType())
>     df.withColumn('columnName', udfFunctionName('*'))
>
> but this gives an error message:
>
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>       File "/home/nina/Downloads/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/dataframe.py", line 1311, in withColumn
>         return DataFrame(self._jdf.withColumn(colName, col._jc), self.sql_ctx)
>       File "/home/nina/Downloads/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
>       File "/home/nina/Downloads/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/utils.py", line 51, in deco
>         raise AnalysisException(s.split(': ', 1)[1], stackTrace)
>     pyspark.sql.utils.AnalysisException: u"unresolved operator 'Project
>     [address#0,name#1,PythonUDF#functionName(*) AS columnName#26];"
>
> Does anyone know how this can be done or whether this is possible?
>
> Thank you,
> Nisrina.