Thanks for the help.
On Aug 21, 2014, at 10:56, Yin Huai <huaiyin....@gmail.com> wrote:

> If you want to filter the table name, you can use
>
> hc.sql("show tables").filter(row => !"test".equals(row.getString(0)))
>
> It seems that making functionRegistry transient can fix the error.
>
>
> On Wed, Aug 20, 2014 at 8:53 PM, Vida Ha <v...@databricks.com> wrote:
> Hi,
>
> I doubt that the broadcast variable is your problem, since you are seeing:
>
> org.apache.spark.SparkException: Task not serializable
> Caused by: java.io.NotSerializableException:
> org.apache.spark.sql.hive.HiveContext$$anon$3
>
> We have a knowledge base article that explains why this happens - it's a
> very common error I see users triggering on the mailing list:
>
> https://github.com/databricks/spark-knowledgebase/blob/master/troubleshooting/javaionotserializableexception.md
>
> Are you using the HiveContext within a transformation that is called on an
> RDD? That will definitely create a problem.
>
> -Vida
>
>
> On Wed, Aug 20, 2014 at 1:20 AM, tianyi <tia...@asiainfo.com> wrote:
> Thanks for the help.
>
> I ran this script again with "bin/spark-shell --conf
> spark.serializer=org.apache.spark.serializer.KryoSerializer"
>
> In the console, I can see:
>
> scala> sc.getConf.getAll.foreach(println)
> (spark.tachyonStore.folderName,spark-eaabe986-03cb-41bd-bde5-993c7db3f048)
> (spark.driver.host,10.1.51.127)
> (spark.executor.extraJavaOptions,-Dsun.io.serialization.extendedDebugInfo=true)
> (spark.serializer,org.apache.spark.serializer.KryoSerializer)
> (spark.repl.class.uri,http://10.1.51.127:51319)
> (spark.app.name,Spark shell)
> (spark.driver.extraJavaOptions,-Dsun.io.serialization.extendedDebugInfo=true)
> (spark.fileserver.uri,http://10.1.51.127:51322)
> (spark.jars,)
> (spark.driver.port,51320)
> (spark.master,local[*])
>
> But it fails again with the same error.
>
>
> On Aug 20, 2014, at 15:59, Fengyun RAO <raofeng...@gmail.com> wrote:
>
>> Try:
>>
>> sparkConf.set("spark.serializer",
>>   "org.apache.spark.serializer.KryoSerializer")
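For reference, the anti-pattern Vida describes above looks roughly like the following sketch (the table names and the query are hypothetical; the point is what the closure captures, not the exact SQL):

// Sketch only: `sc` is the existing SparkContext, as in spark-shell.
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
val rdd = sc.parallelize(Seq("t1", "t2"))   // hypothetical table names

// BAD: the closure mentions `hc`, so Spark must serialize the
// HiveContext along with the task, and HiveContext is not serializable.
// rdd.map(name => hc.sql("select count(*) from " + name))

// OK: run Hive queries on the driver, then ship only plain,
// serializable values (here a Set[String]) into the closure.
val existing = hc.sql("show tables").collect().map(_.getString(0)).toSet
val known = rdd.filter(name => existing.contains(name))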
>> 2014-08-20 14:27 GMT+08:00 田毅 <tia...@asiainfo.com>:
>>
>> Hi everyone!
>>
>> I got an exception when I ran my script with spark-shell. I added
>>
>> SPARK_JAVA_OPTS="-Dsun.io.serialization.extendedDebugInfo=true"
>>
>> in spark-env.sh to show the following stack:
>>
>> org.apache.spark.SparkException: Task not serializable
>> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
>> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
>> at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
>> at org.apache.spark.rdd.RDD.filter(RDD.scala:282)
>> at org.apache.spark.sql.SchemaRDD.filter(SchemaRDD.scala:460)
>> at $iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
>> at $iwC$$iwC$$iwC.<init>(<console>:23)
>> at $iwC$$iwC.<init>(<console>:25)
>> at $iwC.<init>(<console>:27)
>> at <init>(<console>:29)
>> at .<init>(<console>:33)
>> at .<clinit>(<console>)
>> at .<init>(<console>:7)
>> at .<clinit>(<console>)
>> at $print(<console>)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:601)
>> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
>> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
>> ……
>> Caused by: java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext$$anon$3
>> - field (class "org.apache.spark.sql.hive.HiveContext", name: "functionRegistry", type: "class org.apache.spark.sql.hive.HiveFunctionRegistry")
>> - object (class "org.apache.spark.sql.hive.HiveContext", org.apache.spark.sql.hive.HiveContext@4648e685)
>> - field (class "$iwC$$iwC$$iwC$$iwC", name: "hc", type: "class org.apache.spark.sql.hive.HiveContext")
>> - object (class "$iwC$$iwC$$iwC$$iwC", $iwC$$iwC$$iwC$$iwC@23d652ef)
>> - field (class "$iwC$$iwC$$iwC", name: "$iw", type: "class $iwC$$iwC$$iwC$$iwC")
>> - object (class "$iwC$$iwC$$iwC", $iwC$$iwC$$iwC@71cc14f1)
>> - field (class "$iwC$$iwC", name: "$iw", type: "class $iwC$$iwC$$iwC")
>> - object (class "$iwC$$iwC", $iwC$$iwC@74eca89e)
>> - field (class "$iwC", name: "$iw", type: "class $iwC$$iwC")
>> - object (class "$iwC", $iwC@685c4cc4)
>> - field (class "$line9.$read", name: "$iw", type: "class $iwC")
>> - object (class "$line9.$read", $line9.$read@519f9aae)
>> - field (class "$iwC$$iwC$$iwC", name: "$VAL7", type: "class $line9.$read")
>> - object (class "$iwC$$iwC$$iwC", $iwC$$iwC$$iwC@4b996858)
>> - field (class "$iwC$$iwC$$iwC$$iwC", name: "$outer", type: "class $iwC$$iwC$$iwC")
>> - object (class "$iwC$$iwC$$iwC$$iwC", $iwC$$iwC$$iwC$$iwC@31d646d4)
>> - field (class "$iwC$$iwC$$iwC$$iwC$$anonfun$1", name: "$outer", type: "class $iwC$$iwC$$iwC$$iwC")
>> - root object (class "$iwC$$iwC$$iwC$$iwC$$anonfun$1", <function1>)
>> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>> at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
>>
>> I wrote some simple scripts to reproduce the problem.
>>
>> case 1:
>> val barr1 = sc.broadcast("test")
>> val sret = sc.parallelize(1 to 10, 2)
>> val ret = sret.filter(row => !barr1.equals("test"))
>> ret.collect.foreach(println)
>>
>> This works fine in both local mode and yarn-client mode.
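A side note on case 1, not raised in the thread: barr1 is a Broadcast[String], so barr1.equals("test") compares the broadcast wrapper object itself and is always false, which means the filter keeps every element. If the intent was to compare the broadcast value, the closure presumably wants barr1.value; a hedged correction:

val barr1 = sc.broadcast("test")
val sret = sc.parallelize(1 to 10, 2)
// .value dereferences the broadcast on the executor; the Broadcast
// handle itself is serializable, so this closure still ships cleanly.
val ret = sret.filter(row => !barr1.value.equals("test"))
ret.collect.foreach(println)   // prints nothing: every element now matches "test"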
>> case 2:
>> val barr1 = sc.broadcast("test")
>> val hc = new org.apache.spark.sql.hive.HiveContext(sc)
>> val sret = hc.sql("show tables")
>> val ret = sret.filter(row => !barr1.equals("test"))
>> ret.collect.foreach(println)
>>
>> This throws java.io.NotSerializableException:
>> org.apache.spark.sql.hive.HiveContext
>> in both local mode and yarn-client mode.
>>
>> But it works fine if I write the same code in a Scala file and run it in
>> IntelliJ IDEA:
>>
>> import org.apache.spark.{SparkConf, SparkContext}
>>
>> object TestBroadcast2 {
>>   def main(args: Array[String]) {
>>     val sparkConf = new SparkConf().setAppName("Broadcast Test").setMaster("local[3]")
>>     val sc = new SparkContext(sparkConf)
>>     val barr1 = sc.broadcast("test")
>>     val hc = new org.apache.spark.sql.hive.HiveContext(sc)
>>     val sret = hc.sql("show tables")
>>     val ret = sret.filter(row => !barr1.equals("test"))
>>     ret.collect.foreach(println)
>>   }
>> }
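Tying the stack trace to the two cases: in spark-shell the REPL wraps each line in nested wrapper objects (the $iwC chain above), so a closure that mentions barr1 also drags in the wrapper instance whose hc field holds the HiveContext. In the standalone program barr1 is a plain local variable, which is why the same code runs fine from IntelliJ IDEA. Yin's one-liner above sidesteps the problem entirely because its closure references nothing outside itself. If the closure does need the broadcast, one shell-side workaround, until functionRegistry is made transient as Yin suggests, is to rebind the broadcast to a block-local val so the closure captures only that val rather than the REPL wrapper. A minimal sketch, assuming this capture chain is indeed the cause:

val hc = new org.apache.spark.sql.hive.HiveContext(sc)
val barr1 = sc.broadcast("test")
val sret = hc.sql("show tables")
val ret = {
  // The closure refers only to the block-local `b`, so serializing it
  // should not pull in the REPL wrapper that also references `hc`.
  val b = barr1
  sret.filter(row => !b.value.equals(row.getString(0)))
}
ret.collect.foreach(println)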