Yes, but I don't want to use it in a select() call. I'd rather use either selectExpr() or spark.sql(), with the UDF being called inside a SQL string.
Now I got it to work using:

    sqlContext.registerFunction('encodeOneHot_udf', encodeOneHot, VectorUDT())

But this sqlContext approach will disappear, right? So I'm curious what to use instead.

> On Aug 4, 2016, at 3:54 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>
> Have you looked at pyspark.sql.functions.udf and the associated examples?
>
> On Thu, Aug 4, 2016 at 9:10 AM, Ben Teeuwen <bteeu...@gmail.com> wrote:
>
> Hi,
>
> I'd like to use a UDF in pyspark 2.0. As in:
> ________
>
> def squareIt(x):
>     return x * x
>
> # register the function and define return type
> ....
>
> spark.sql("""select myUdf(adgroupid, 'extra_string_parameter') as
> function_result from df""")
>
> ________
>
> How can I register the function? I only see registerFunction in the
> deprecated sqlContext at
> http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html.
> As the 'spark' object unifies hiveContext and sqlContext, what is the new way
> to go?
>
> Ben
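For what it's worth, here is a minimal sketch of what I believe the SparkSession-based equivalent looks like: spark.udf.register() on the unified 'spark' object, which registers the function under a name usable inside SQL strings (and selectExpr). The app name, the sample DataFrame, and the squareIt function are just illustrative; the try/except is only there so the plain-Python part runs even without a local Spark runtime.

```python
def squareIt(x):
    # plain Python function to be exposed as a SQL UDF
    return x * x

try:
    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.appName("udf-demo").getOrCreate()

    # spark.udf.register is the SparkSession counterpart of
    # sqlContext.registerFunction: (name, function, return type)
    spark.udf.register("squareIt", squareIt, IntegerType())

    # illustrative DataFrame registered as a temp view named "df"
    df = spark.createDataFrame([(2,), (3,)], ["adgroupid"])
    df.createOrReplaceTempView("df")

    # the registered name can now be called inside a SQL string
    spark.sql(
        "SELECT squareIt(adgroupid) AS function_result FROM df"
    ).show()
except Exception:
    # pyspark or a local Spark/Java runtime may not be available here;
    # the UDF logic itself is plain Python either way
    pass
```

The same registered name also works via df.selectExpr("squareIt(adgroupid)"), since selectExpr parses SQL expressions.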