You're using "hadoopConf", a Configuration object, in your closure. That type is not serializable.
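For example, something like this (an untested sketch; it assumes you are
using the conf to get a FileSystem inside the function, and "process" is
just a stand-in for your per-line logic) avoids capturing hadoopConf in
the closure by creating the Configuration on the executors:

  rdd.mapPartitions { iter =>
    // Runs on the executor, so nothing here needs to be serialized.
    val conf = new org.apache.hadoop.conf.Configuration()
    val fs = org.apache.hadoop.fs.FileSystem.get(conf)
    iter.map(line => process(fs, line))
  }

If you need the driver's exact settings, you can instead wrap the conf
in Spark's SerializableWritable (Configuration implements Writable) and
broadcast it:

  val confBc = sc.broadcast(
    new org.apache.spark.SerializableWritable(hadoopConf))
  rdd.mapPartitions { iter =>
    val conf = confBc.value.value  // unwrap the broadcast Configuration
    val fs = org.apache.hadoop.fs.FileSystem.get(conf)
    iter.map(line => process(fs, line))
  }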
You can use "-Dsun.io.serialization.extendedDebugInfo=true" to debug
serialization issues.

On Wed, Sep 10, 2014 at 8:23 AM, Sarath Chandra
<sarathchandra.jos...@algofusiontech.com> wrote:
> Thanks Sean.
> Please find attached my code. Let me know your suggestions/ideas.
>
> Regards,
> Sarath
>
> On Wed, Sep 10, 2014 at 8:05 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> You mention that you are creating a UserGroupInformation inside your
>> function, but something is still serializing it. You should show your
>> code, since it may not be doing what you think.
>>
>> If you instantiate an object, it happens every time your function is
>> called: map() is called once per data element, mapPartitions() once
>> per partition. It depends.
>>
>> On Wed, Sep 10, 2014 at 3:25 PM, Sarath Chandra
>> <sarathchandra.jos...@algofusiontech.com> wrote:
>> > Hi Sean,
>> >
>> > The solution of instantiating the non-serializable class inside the
>> > map is working fine, but I hit a roadblock. The solution is not
>> > working for singleton classes like UserGroupInformation.
>> >
>> > In my logic, as part of processing an HDFS file, I need to refer to
>> > some reference files which are also available in HDFS. So inside
>> > the map method I'm trying to instantiate UserGroupInformation to
>> > get an instance of FileSystem. Then, using this FileSystem
>> > instance, I read those reference files and use that data in my
>> > processing logic.
>> >
>> > This is throwing "task not serializable" exceptions for the
>> > 'UserGroupInformation' and 'FileSystem' classes. I also tried using
>> > 'SparkHadoopUtil' instead of 'UserGroupInformation', but it didn't
>> > resolve the issue.
>> >
>> > Request you to provide some pointers in this regard.
>> >
>> > Also I have a query - when we instantiate a class inside the map
>> > method, does it create a new instance for every record of the RDD
>> > being processed?
>> >
>> > Thanks & Regards,
>> > Sarath
>> >
>> > On Sat, Sep 6, 2014 at 4:32 PM, Sean Owen <so...@cloudera.com> wrote:
>> >>
>> >> I disagree that the generally right change is to try to make the
>> >> classes serializable. Usually, classes that are not serializable
>> >> are not supposed to be serialized. You're using them in a way
>> >> that's causing them to be serialized, and that's probably not
>> >> desired.
>> >>
>> >> For example, this is wrong:
>> >>
>> >> val foo: SomeUnserializableManagerClass = ...
>> >> rdd.map(d => foo.bar(d))
>> >>
>> >> This is right:
>> >>
>> >> rdd.map { d =>
>> >>   val foo: SomeUnserializableManagerClass = ...
>> >>   foo.bar(d)
>> >> }
>> >>
>> >> In the first instance, you create the object on the driver and try
>> >> to serialize and copy it to the workers. In the second, you're
>> >> creating SomeUnserializableManagerClass in the function and
>> >> therefore on the worker.
>> >>
>> >> mapPartitions is better if this creation is expensive.
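>> >>
>> >> Roughly, that looks like this (a sketch, untested):
>> >>
>> >> rdd.mapPartitions { iter =>
>> >>   val foo: SomeUnserializableManagerClass = ...
>> >>   iter.map(d => foo.bar(d))
>> >> }
>> >>
>> >> This way foo is created once per partition on the worker, rather
>> >> than once per element.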
>> >>
>> >> On Fri, Sep 5, 2014 at 3:06 PM, Sarath Chandra
>> >> <sarathchandra.jos...@algofusiontech.com> wrote:
>> >> > Hi,
>> >> >
>> >> > I'm trying to migrate a map-reduce program to work with Spark. I
>> >> > migrated the program from Java to Scala. The map-reduce program
>> >> > basically loads an HDFS file, and for each line in the file it
>> >> > applies several transformation functions available in various
>> >> > external libraries.
>> >> >
>> >> > When I execute this over Spark, it throws "Task not serializable"
>> >> > exceptions for each and every class being used from these
>> >> > external libraries. I added serialization to a few classes which
>> >> > are in my scope, but there are several other classes which are
>> >> > out of my scope, like org.apache.hadoop.io.Text.
>> >> >
>> >> > How can I overcome these exceptions?
>> >> >
>> >> > ~Sarath.

--
Marcelo