Object serialisation inside closures

2014-09-04 Thread Andrianasolo Fanilo
Hello Spark fellows :) I'm a new user of Spark and Scala and have been using both for 6 months without too many problems. I'm looking for best practices for using non-serializable classes inside closures. I'm using Spark-0.9.0-incubating here with Hadoop 2.2. Suppose I am using an OpenCSV parser…
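
A minimal sketch of the situation being asked about, assuming OpenCSV's older au.com.bytecode.opencsv.CSVParser (which is not Serializable) and a hypothetical input file: the parser is built on the driver and captured by the closure, which is what triggers the serialisation error.

import org.apache.spark.SparkContext
import au.com.bytecode.opencsv.CSVParser

object DriverSideParser {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "closure-serialisation")
    val parser = new CSVParser(',')                 // created on the driver

    // The closure below captures `parser`, so Spark tries to serialise it
    // and fails with "Task not serializable" / NotSerializableException.
    val fields = sc.textFile("data.csv").map(line => parser.parseLine(line))
    fields.count()
    sc.stop()
  }
}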

Re: Object serialisation inside closures

2014-09-04 Thread Sean Owen
In your original version, the object is referenced by the function but it's on the driver, and so has to be serialized. This leads to an error since it's not serializable. Instead, you want to recreate the object locally on each of the remote machines. In your third version you are holding the parser…
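
A hedged sketch of the fix Sean describes: create the parser on the remote side rather than on the driver. mapPartitions builds one CSVParser per partition inside the executor, so the non-serialisable object never has to cross the wire (same hypothetical input file and OpenCSV class as above).

import org.apache.spark.SparkContext
import au.com.bytecode.opencsv.CSVParser

object ExecutorSideParser {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "closure-serialisation")

    val fields = sc.textFile("data.csv").mapPartitions { iter =>
      val parser = new CSVParser(',')               // built remotely, never serialised
      iter.map(line => parser.parseLine(line))
    }
    fields.count()
    sc.stop()
  }
}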

Re: Object serialisation inside closures

2014-09-04 Thread Yana Kadiyska
In the third case the object does not get shipped around. Each executor will create its own instance. I got bitten by this here: http://apache-spark-user-list.1001560.n3.nabble.com/Help-with-object-access-from-mapper-simple-question-tt8125.html
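
A small sketch of the gotcha Yana points to, with hypothetical names: a Scala object is never shipped from the driver, so each executor JVM initialises its own copy, and state set on the driver is not seen inside tasks on a real cluster (in local mode everything shares one JVM, which is why the bug can hide during testing).

import org.apache.spark.SparkContext

object SharedConfig {
  var delimiter: Char = ','                         // re-initialised in every JVM
}

object SingletonGotcha {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "closure-serialisation")
    SharedConfig.delimiter = ';'                    // only mutates the driver's copy

    // On a cluster each executor re-creates SharedConfig and sees ',' again,
    // because the singleton is referenced by name, not serialised and shipped.
    val seen = sc.parallelize(1 to 4).map(_ => SharedConfig.delimiter).collect()
    println(seen.mkString(" "))
    sc.stop()
  }
}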

RE: Object serialisation inside closures

2014-09-04 Thread Andrianasolo Fanilo
…singleton object to share data within an executor, sadly. Thanks for the input. Fanilo -----Original Message----- From: Sean Owen [mailto:so...@cloudera.com] Sent: Thursday, 4 September 2014 15:36 To: Andrianasolo Fanilo Cc: user@spark.apache.org Subject: Re: Object serialisation inside closures
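
A hedged sketch of the per-executor singleton Fanilo mentions, with hypothetical names: a lazy val inside a Scala object is initialised on first use in each executor JVM and then shared by every task running there, which gives one parser per executor without serialising anything.

import org.apache.spark.SparkContext
import au.com.bytecode.opencsv.CSVParser

object ParserHolder {
  // Created lazily inside each executor; never serialised or shipped.
  lazy val parser: CSVParser = new CSVParser(',')
}

object PerExecutorSingleton {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "closure-serialisation")
    val fields = sc.textFile("data.csv").map(line => ParserHolder.parser.parseLine(line))
    fields.count()
    sc.stop()
  }
}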

Re: Object serialisation inside closures

2014-09-04 Thread Mohit Jaggi
> -----Original Message----- > From: Sean Owen [mailto:so...@cloudera.com] > Sent: Thursday, 4 September 2014 15:36 > To: Andrianasolo Fanilo > Cc: user@spark.apache.org > Subject: Re: Object serialisation inside closures > > In your original version, the object is referenced by the function but…