You're using "hadoopConf", a Configuration object, in your closure. That type is not serializable.
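For example, something like this (an untested sketch; it assumes you are
using the conf to get a FileSystem inside the function, and "process" is
just a stand-in for your per-line logic) avoids capturing hadoopConf in
the closure by creating the Configuration on the executors:

  rdd.mapPartitions { iter =>
    // Runs on the executor, so nothing here needs to be serialized.
    val conf = new org.apache.hadoop.conf.Configuration()
    val fs = org.apache.hadoop.fs.FileSystem.get(conf)
    iter.map(line => process(fs, line))
  }

If you need the driver's exact settings, you can instead wrap the conf
in Spark's SerializableWritable (Configuration implements Writable) and
broadcast it:

  val confBc = sc.broadcast(
    new org.apache.spark.SerializableWritable(hadoopConf))
  rdd.mapPartitions { iter =>
    val conf = confBc.value.value  // unwrap the broadcast Configuration
    val fs = org.apache.hadoop.fs.FileSystem.get(conf)
    iter.map(line => process(fs, line))
  }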
You can use "-Dsun.io.serialization.extendedDebugInfo=true" to debug
serialization issues.

On Wed, Sep 10, 2014 at 8:23 AM, Sarath Chandra
<sarathchandra.jos...@algofusiontech.com> wrote:
> Thanks Sean.
> Please find attached my code. Let me know your suggestions/ideas.
>
> Regards,
> Sarath
>
> On Wed, Sep 10, 2014 at 8:05 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> You mention that you are creating a UserGroupInformation inside your
>> function, but something is still serializing it. You should show your
>> code, since it may not be doing what you think.
>>
>> If you instantiate an object, it happens every time your function is
>> called: map() is called once per data element, mapPartitions() once
>> per partition. It depends.
>>
>> On Wed, Sep 10, 2014 at 3:25 PM, Sarath Chandra
>> <sarathchandra.jos...@algofusiontech.com> wrote:
>> > Hi Sean,
>> >
>> > The solution of instantiating the non-serializable class inside the
>> > map is working fine, but I hit a roadblock. The solution is not
>> > working for singleton classes like UserGroupInformation.
>> >
>> > In my logic, as part of processing an HDFS file, I need to refer to
>> > some reference files which are also available in HDFS. So inside
>> > the map method I'm trying to instantiate UserGroupInformation to
>> > get an instance of FileSystem. Then, using this FileSystem
>> > instance, I read those reference files and use that data in my
>> > processing logic.
>> >
>> > This is throwing "task not serializable" exceptions for the
>> > 'UserGroupInformation' and 'FileSystem' classes. I also tried using
>> > 'SparkHadoopUtil' instead of 'UserGroupInformation', but it didn't
>> > resolve the issue.
>> >
>> > Request you to provide some pointers in this regard.
>> >
>> > Also I have a query - when we instantiate a class inside the map
>> > method, does it create a new instance for every record of the RDD
>> > being processed?
>> >
>> > Thanks & Regards,
>> > Sarath
>> >
>> > On Sat, Sep 6, 2014 at 4:32 PM, Sean Owen <so...@cloudera.com> wrote:
>> >>
>> >> I disagree that the generally right change is to try to make the
>> >> classes serializable. Usually, classes that are not serializable
>> >> are not supposed to be serialized. You're using them in a way
>> >> that's causing them to be serialized, and that's probably not
>> >> desired.
>> >>
>> >> For example, this is wrong:
>> >>
>> >> val foo: SomeUnserializableManagerClass = ...
>> >> rdd.map(d => foo.bar(d))
>> >>
>> >> This is right:
>> >>
>> >> rdd.map { d =>
>> >>   val foo: SomeUnserializableManagerClass = ...
>> >>   foo.bar(d)
>> >> }
>> >>
>> >> In the first instance, you create the object on the driver and try
>> >> to serialize and copy it to the workers. In the second, you're
>> >> creating SomeUnserializableManagerClass in the function and
>> >> therefore on the worker.
>> >>
>> >> mapPartitions is better if this creation is expensive.
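>> >>
>> >> Roughly, that looks like this (a sketch, untested):
>> >>
>> >> rdd.mapPartitions { iter =>
>> >>   val foo: SomeUnserializableManagerClass = ...
>> >>   iter.map(d => foo.bar(d))
>> >> }
>> >>
>> >> This way foo is created once per partition on the worker, rather
>> >> than once per element.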
>> >>
>> >> On Fri, Sep 5, 2014 at 3:06 PM, Sarath Chandra
>> >> <sarathchandra.jos...@algofusiontech.com> wrote:
>> >> > Hi,
>> >> >
>> >> > I'm trying to migrate a map-reduce program to work with Spark. I
>> >> > migrated the program from Java to Scala. The map-reduce program
>> >> > basically loads an HDFS file, and for each line in the file it
>> >> > applies several transformation functions available in various
>> >> > external libraries.
>> >> >
>> >> > When I execute this over Spark, it throws "Task not serializable"
>> >> > exceptions for each and every class being used from these
>> >> > external libraries. I added serialization to a few classes which
>> >> > are in my scope, but there are several other classes which are
>> >> > out of my scope, like org.apache.hadoop.io.Text.
>> >> >
>> >> > How can I overcome these exceptions?
>> >> >
>> >> > ~Sarath.

--
Marcelo