Given a DataFrame df, I could run df.map(operation1).map(operation2), or run a single df.map with a function that contains the logic for both operations. Alternatively, I could run df.map(operation3), where operation3 would be:
return operation2(operation1())

Similarly, with UDFs, I could build one UDF that does both things, or build two separate UDFs and call them sequentially.

Are there any performance differences between the two approaches (for example, from casting back and forth from Tungsten)? Or should I focus more on separation of concerns than on performance in this case?

Thank you.
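To make the variants concrete, here is a minimal sketch in plain Python. It uses an ordinary list and the built-in map as a stand-in for df.map, so it is not Spark code and says nothing about performance. The bodies of operation1 and operation2 are hypothetical placeholders. It only shows that chaining two maps and mapping once over the composed function give the same result.

```python
# Hypothetical stand-ins for the two operations in the question.
def operation1(x):
    return x + 1


def operation2(x):
    return x * 2


# The composed variant: one function that applies both operations in order.
def operation3(x):
    return operation2(operation1(x))


# A plain list stands in for the rows of df (assumption for illustration).
rows = [1, 2, 3]

# Two chained maps, one per operation.
chained = list(map(operation2, map(operation1, rows)))

# A single map over the composed function.
fused = list(map(operation3, rows))

print(chained == fused)
```

Whether Spark treats the two forms differently at execution time is exactly the open question above; the sketch only pins down what "equivalent" means here.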