It’s because the class in which you have defined the UDF is not serializable. Declare the UDF in a class and make that class serializable.
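As a minimal sketch of that advice (names like `MyUdfs` are hypothetical, and the `spark.udf.register` call is shown as a comment since it assumes a live SparkSession): putting the function in a serializable object means the closure Spark ships to executors does not drag along an outer class, such as the Zeppelin interpreter wrapper, that cannot be serialized.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream, NotSerializableException}

// Hypothetical container for UDFs: a top-level object marked Serializable,
// so the function value captures no non-serializable enclosing instance.
object MyUdfs extends Serializable {
  val fn1: String => Int = (res: String) => 100
}

// In the notebook you would then register it roughly like this
// (assumes a SparkSession named `spark` is in scope):
//   spark.udf.register("fn1", MyUdfs.fn1)

// Quick local check that the function value itself serializes cleanly,
// i.e. it would survive being shipped inside a Spark task:
def isSerializable(obj: AnyRef): Boolean =
  try {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    out.writeObject(obj)
    out.close()
    true
  } catch {
    case _: NotSerializableException => false
  }
```

The key point is that `fn1` is a function value on a standalone object, not a method on the notebook's implicitly generated wrapper class.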
From: shyla deshpande [mailto:deshpandesh...@gmail.com]
Sent: Thursday, June 01, 2017 10:08 AM
To: user
Subject: Spark sql with Zeppelin, Task not serializable error when I try to cache the spark sql table

Hello all,

I am using Zeppelin 0.7.1 with Spark 2.1.0. I am getting org.apache.spark.SparkException: Task not serializable error when I try to cache the spark sql table. I am using a UDF on a column of the table and want to cache the resultant table. I can execute the paragraph successfully when there is no caching. Please help!

Thanks

-----------Following is my code--------

UDF:

    def fn1(res: String): Int = {
      100
    }

    spark.udf.register("fn1", fn1(_: String): Int)

    spark
      .read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "k", "table" -> "t"))
      .load
      .createOrReplaceTempView("t1")

    val df1 = spark.sql("SELECT col1, col2, fn1(col3) from t1")
    df1.createOrReplaceTempView("t2")
    spark.catalog.cacheTable("t2")