Thanks for the help.
On Aug 21, 2014, at 10:56, Yin Huai <huaiyin....@gmail.com> wrote:

> If you want to filter the table name, you can use
>
> hc.sql("show tables").filter(row => !"test".equals(row.getString(0)))
>
> It seems that making functionRegistry transient can fix the error.
>
>
> On Wed, Aug 20, 2014 at 8:53 PM, Vida Ha <v...@databricks.com> wrote:
> Hi,
>
> I doubt that the broadcast variable is your problem, since you are seeing:
>
> org.apache.spark.SparkException: Task not serializable
> Caused by: java.io.NotSerializableException:
> org.apache.spark.sql.hive.HiveContext$$anon$3
>
> We have a knowledge base article that explains why this happens - it's a
> very common error I see users triggering on the mailing list:
>
> https://github.com/databricks/spark-knowledgebase/blob/master/troubleshooting/javaionotserializableexception.md
>
> Are you using the HiveContext within a transformation that is called on an
> RDD? That will definitely create a problem.
>
> -Vida
>
>
> On Wed, Aug 20, 2014 at 1:20 AM, tianyi <tia...@asiainfo.com> wrote:
> Thanks for the help.
>
> I ran this script again with "bin/spark-shell --conf
> spark.serializer=org.apache.spark.serializer.KryoSerializer"
>
> In the console, I can see:
>
> scala> sc.getConf.getAll.foreach(println)
> (spark.tachyonStore.folderName,spark-eaabe986-03cb-41bd-bde5-993c7db3f048)
> (spark.driver.host,10.1.51.127)
> (spark.executor.extraJavaOptions,-Dsun.io.serialization.extendedDebugInfo=true)
> (spark.serializer,org.apache.spark.serializer.KryoSerializer)
> (spark.repl.class.uri,http://10.1.51.127:51319)
> (spark.app.name,Spark shell)
> (spark.driver.extraJavaOptions,-Dsun.io.serialization.extendedDebugInfo=true)
> (spark.fileserver.uri,http://10.1.51.127:51322)
> (spark.jars,)
> (spark.driver.port,51320)
> (spark.master,local[*])
>
> But it fails again with the same error.
>
>
> On Aug 20, 2014, at 15:59, Fengyun RAO <raofeng...@gmail.com> wrote:
>
>> Try:
>>
>> sparkConf.set("spark.serializer",
>>   "org.apache.spark.serializer.KryoSerializer")
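For reference, the anti-pattern Vida describes above looks roughly like the following sketch (the table names and the query are hypothetical; the point is what the closure captures, not the exact SQL):

// Sketch only: `sc` is the existing SparkContext, as in spark-shell.
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
val rdd = sc.parallelize(Seq("t1", "t2"))   // hypothetical table names

// BAD: the closure mentions `hc`, so Spark must serialize the
// HiveContext along with the task, and HiveContext is not serializable.
// rdd.map(name => hc.sql("select count(*) from " + name))

// OK: run Hive queries on the driver, then ship only plain,
// serializable values (here a Set[String]) into the closure.
val existing = hc.sql("show tables").collect().map(_.getString(0)).toSet
val known = rdd.filter(name => existing.contains(name))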
>> 2014-08-20 14:27 GMT+08:00 田毅 <tia...@asiainfo.com>:
>>
>> Hi everyone!
>>
>> I got an exception when I ran my script with spark-shell. I added
>>
>> SPARK_JAVA_OPTS="-Dsun.io.serialization.extendedDebugInfo=true"
>>
>> in spark-env.sh to show the following stack:
>>
>> org.apache.spark.SparkException: Task not serializable
>> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
>> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
>> at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
>> at org.apache.spark.rdd.RDD.filter(RDD.scala:282)
>> at org.apache.spark.sql.SchemaRDD.filter(SchemaRDD.scala:460)
>> at $iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
>> at $iwC$$iwC$$iwC.<init>(<console>:23)
>> at $iwC$$iwC.<init>(<console>:25)
>> at $iwC.<init>(<console>:27)
>> at <init>(<console>:29)
>> at .<init>(<console>:33)
>> at .<clinit>(<console>)
>> at .<init>(<console>:7)
>> at .<clinit>(<console>)
>> at $print(<console>)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:601)
>> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
>> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
>> ……
>> Caused by: java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext$$anon$3
>> - field (class "org.apache.spark.sql.hive.HiveContext", name: "functionRegistry", type: "class org.apache.spark.sql.hive.HiveFunctionRegistry")
>> - object (class "org.apache.spark.sql.hive.HiveContext", org.apache.spark.sql.hive.HiveContext@4648e685)
>> - field (class "$iwC$$iwC$$iwC$$iwC", name: "hc", type: "class org.apache.spark.sql.hive.HiveContext")
>> - object (class "$iwC$$iwC$$iwC$$iwC", $iwC$$iwC$$iwC$$iwC@23d652ef)
>> - field (class "$iwC$$iwC$$iwC", name: "$iw", type: "class $iwC$$iwC$$iwC$$iwC")
>> - object (class "$iwC$$iwC$$iwC", $iwC$$iwC$$iwC@71cc14f1)
>> - field (class "$iwC$$iwC", name: "$iw", type: "class $iwC$$iwC$$iwC")
>> - object (class "$iwC$$iwC", $iwC$$iwC@74eca89e)
>> - field (class "$iwC", name: "$iw", type: "class $iwC$$iwC")
>> - object (class "$iwC", $iwC@685c4cc4)
>> - field (class "$line9.$read", name: "$iw", type: "class $iwC")
>> - object (class "$line9.$read", $line9.$read@519f9aae)
>> - field (class "$iwC$$iwC$$iwC", name: "$VAL7", type: "class $line9.$read")
>> - object (class "$iwC$$iwC$$iwC", $iwC$$iwC$$iwC@4b996858)
>> - field (class "$iwC$$iwC$$iwC$$iwC", name: "$outer", type: "class $iwC$$iwC$$iwC")
>> - object (class "$iwC$$iwC$$iwC$$iwC", $iwC$$iwC$$iwC$$iwC@31d646d4)
>> - field (class "$iwC$$iwC$$iwC$$iwC$$anonfun$1", name: "$outer", type: "class $iwC$$iwC$$iwC$$iwC")
>> - root object (class "$iwC$$iwC$$iwC$$iwC$$anonfun$1", <function1>)
>> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>> at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
>>
>> I wrote some simple scripts to reproduce the problem.
>>
>> case 1:
>> val barr1 = sc.broadcast("test")
>> val sret = sc.parallelize(1 to 10, 2)
>> val ret = sret.filter(row => !barr1.equals("test"))
>> ret.collect.foreach(println)
>>
>> This works fine in both local mode and yarn-client mode.
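A side note on case 1, not raised in the thread: barr1 is a Broadcast[String], so barr1.equals("test") compares the broadcast wrapper object itself and is always false, which means the filter keeps every element. If the intent was to compare the broadcast value, the closure presumably wants barr1.value; a hedged correction:

val barr1 = sc.broadcast("test")
val sret = sc.parallelize(1 to 10, 2)
// .value dereferences the broadcast on the executor; the Broadcast
// handle itself is serializable, so this closure still ships cleanly.
val ret = sret.filter(row => !barr1.value.equals("test"))
ret.collect.foreach(println)   // prints nothing: every element now matches "test"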
>> case 2:
>> val barr1 = sc.broadcast("test")
>> val hc = new org.apache.spark.sql.hive.HiveContext(sc)
>> val sret = hc.sql("show tables")
>> val ret = sret.filter(row => !barr1.equals("test"))
>> ret.collect.foreach(println)
>>
>> This throws java.io.NotSerializableException:
>> org.apache.spark.sql.hive.HiveContext
>> in both local mode and yarn-client mode.
>>
>> But it works fine if I write the same code in a Scala file and run it in
>> IntelliJ IDEA:
>>
>> import org.apache.spark.{SparkConf, SparkContext}
>>
>> object TestBroadcast2 {
>>   def main(args: Array[String]) {
>>     val sparkConf = new SparkConf().setAppName("Broadcast Test").setMaster("local[3]")
>>     val sc = new SparkContext(sparkConf)
>>     val barr1 = sc.broadcast("test")
>>     val hc = new org.apache.spark.sql.hive.HiveContext(sc)
>>     val sret = hc.sql("show tables")
>>     val ret = sret.filter(row => !barr1.equals("test"))
>>     ret.collect.foreach(println)
>>   }
>> }
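Tying the stack trace to the two cases: in spark-shell the REPL wraps each line in nested wrapper objects (the $iwC chain above), so a closure that mentions barr1 also drags in the wrapper instance whose hc field holds the HiveContext. In the standalone program barr1 is a plain local variable, which is why the same code runs fine from IntelliJ IDEA. Yin's one-liner above sidesteps the problem entirely because its closure references nothing outside itself. If the closure does need the broadcast, one shell-side workaround, until functionRegistry is made transient as Yin suggests, is to rebind the broadcast to a block-local val so the closure captures only that val rather than the REPL wrapper. A minimal sketch, assuming this capture chain is indeed the cause:

val hc = new org.apache.spark.sql.hive.HiveContext(sc)
val barr1 = sc.broadcast("test")
val sret = hc.sql("show tables")
val ret = {
  // The closure refers only to the block-local `b`, so serializing it
  // should not pull in the REPL wrapper that also references `hc`.
  val b = barr1
  sret.filter(row => !b.value.equals(row.getString(0)))
}
ret.collect.foreach(println)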