Dear Spark users,

Given a DataFrame df with a column named "foo bar" (note the space), I can
call a Spark SQL built-in function on it like so:

df.select(functions.max(df("foo bar")))
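This needs nothing beyond the functions import; because functions.max takes
a Column rather than SQL text, the space in the column name never goes
through the SQL parser, so no escaping is involved:

import org.apache.spark.sql.functions

// max takes a Column, so the space in "foo bar" never reaches the SQL
// parser and nothing needs to be escaped.
df.select(functions.max(df("foo bar")))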

However, if I want to apply a Hive UDF named myCustomFunction, I need to
write

df.selectExpr("myCustomFunction(`foo bar`)")

which forces me to escape the column name myself so that it can be embedded
in a well-formed SQL expression. Is there a programmatic way to invoke a
Hive function by name, so that I don’t have to worry about escaping?
Ideally, I’d like to do something like

val myCustomFunction = functions.udf("myCustomFunction")
df.select(myCustomFunction(df("foo bar")))

… but I couldn’t find any such API.
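To make the question concrete, the escaping dance I'd like to avoid looks
roughly like this (quoteIdentifier and callHiveUdf are just names I made up
for illustration, not Spark APIs, and I'm assuming that doubling embedded
backticks is the right quoting rule for Spark SQL identifiers):

import org.apache.spark.sql.DataFrame

// Hypothetical helpers, only to show the escaping I have to do by hand.
// Assumes backticks inside an identifier are escaped by doubling them.
def quoteIdentifier(name: String): String =
  "`" + name.replace("`", "``") + "`"

def callHiveUdf(df: DataFrame, udfName: String, columnName: String): DataFrame =
  df.selectExpr(s"$udfName(${quoteIdentifier(columnName)})")

callHiveUdf(df, "myCustomFunction", "foo bar")

But that still routes everything through SQL text, which is exactly what I'd
like to avoid.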

Regards,

Punya
