Hello Spark fellows :)
I'm a new user of Spark and Scala and have been using both for 6 months without
too many problems. I'm looking for best practices for using non-serializable
classes inside closures. I'm using Spark 0.9.0-incubating with Hadoop 2.2.
Suppose I am using an OpenCSV parser ...
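A minimal sketch of the pattern I mean (assuming OpenCSV's `CSVParser`; the separator, file path, and app name are made up):

```scala
import au.com.bytecode.opencsv.CSVParser
import org.apache.spark.{SparkConf, SparkContext}

object CsvJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("csv-job"))
    val parser = new CSVParser(';') // lives on the driver

    // The closure below captures `parser`, so Spark must serialize it
    // to ship the task -- and CSVParser is not Serializable, hence a
    // NotSerializableException at runtime.
    val fields = sc.textFile("data.csv").map(line => parser.parseLine(line))
    fields.count()
  }
}
```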
In your original version, the object is referenced by the function but
it's on the driver, and so has to be serialized. This leads to an
error since it's not serializable. Instead, you want to recreate the
object locally on each of the remote machines.
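For example, one way to recreate it locally is to build the parser once per partition with `mapPartitions`, so the closure captures nothing non-serializable from the driver (a sketch; the parser and path are placeholders):

```scala
// The parser is constructed on the executor, once per partition,
// so nothing non-serializable is captured from the driver.
val fields = sc.textFile("data.csv").mapPartitions { lines =>
  val parser = new CSVParser(';')
  lines.map(parser.parseLine)
}
```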
In your third version you are holding the parser ...
In the third case the object does not get shipped around. Each executor
will create its own instance. I got bitten by this here:
http://apache-spark-user-list.1001560.n3.nabble.com/Help-with-object-access-from-mapper-simple-question-tt8125.html
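For reference, the third-version pattern might look like this (a sketch; `ParserHolder` is a hypothetical name):

```scala
// Each executor JVM initializes this object independently the first
// time a task touches it; the instance itself is never serialized.
object ParserHolder {
  lazy val parser = new CSVParser(';')
}

val fields = sc.textFile("data.csv")
  .map(line => ParserHolder.parser.parseLine(line))
```

Note that the one instance is shared by all tasks running in that executor JVM, so this only works if the object is thread-safe or the sharing is intentional — which is exactly the kind of surprise described in the thread linked above.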
On Thu, Sep 4, 2014 at 9:29 AM, Andrianasolo Fanilo wrote:
...singleton object to share data within an executor, sadly...
Thanks for the input
Fanilo
-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Thursday, September 4, 2014 3:36 PM
To: Andrianasolo Fanilo
Cc: user@spark.apache.org
Subject: Re: Object serialisation inside closures