It’s because the class in which you have defined the UDF is not serializable. When the query runs, Spark ships the UDF’s closure to the executors, and that closure captures a reference to its enclosing class.
Declare the UDF in a class (or object) and make that class serializable.
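For example, a minimal sketch of the fix (assuming a `SparkSession` named `spark`, as in the code below; the object name `Udfs` is illustrative) is to move the function into a standalone serializable object so the closure no longer drags in the Zeppelin interpreter's wrapper class:

```scala
// Define the UDF in a top-level serializable object so the closure
// Spark serializes captures only this object, not the enclosing
// (non-serializable) interpreter class.
object Udfs extends Serializable {
  def fn1(res: String): Int = 100
}

// Register from the object; eta-expansion turns fn1 into a Function1.
spark.udf.register("fn1", Udfs.fn1 _)
```

After registering it this way, the rest of the code (including `spark.catalog.cacheTable`) should run unchanged.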

From: shyla deshpande [mailto:deshpandesh...@gmail.com]
Sent: Thursday, June 01, 2017 10:08 AM
To: user
Subject: Spark sql with Zeppelin, Task not serializable error when I try to 
cache the spark sql table

Hello all,

I am using Zeppelin 0.7.1 with Spark 2.1.0

I am getting an org.apache.spark.SparkException: Task not serializable error when
I try to cache the spark sql table. I am using a UDF on a column of the table and
want to cache the resulting table. I can execute the paragraph successfully
when there is no caching.

Please help! Thanks
-----------Following is my code--------
UDF :
def fn1(res: String): Int = {
  100
}

spark.udf.register("fn1", fn1(_: String): Int)


spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "k", "table" -> "t"))
  .load
  .createOrReplaceTempView("t1")

val df1 = spark.sql("SELECT col1, col2, fn1(col3) from t1")

df1.createOrReplaceTempView("t2")

spark.catalog.cacheTable("t2")

