In the third case the object does not get shipped around. Each executor
will create it's own instance. I got bitten by this here:

http://apache-spark-user-list.1001560.n3.nabble.com/Help-with-object-access-from-mapper-simple-question-tt8125.html


On Thu, Sep 4, 2014 at 9:29 AM, Andrianasolo Fanilo <
fanilo.andrianas...@worldline.com> wrote:

>  Hello Spark fellows J
>
>
>
> I’m a new user of Spark and Scala and have been using both for 6 months
> without too many problems.
>
> Here I’m looking for best practices for using non-serializable classes
> inside closure. I’m using Spark-0.9.0-incubating here with Hadoop 2.2.
>
>
>
> Suppose I am using OpenCSV parser to parse an input file. So inside my
> main :
>
>
>
> val sc = new SparkContext("local[2]", "App")
>
> val heyRDD = sc.textFile("…")
>
>
>
> val csvparser = new CSVParser(';')
>
> val heyMap = heyRDD.map { line =>
>
>       val temp = csvparser.parseLine(line)
>
>       (temp(1), temp(4))
>
> }
>
>
>
>
>
> This gives me a java.io.NotSerializableException:
> au.com.bytecode.opencsv.CSVParser, which seems reasonable.
>
>
>
> From here I could see 3 solutions :
>
> 1/ Extending CSVParser with Serialisable properties, which adds a lot of
> boilerplate code if you ask me
>
> 2/ Using Kryo Serialization (still need to define a serializer)
>
> 3/ Creating an object with an instance of the class I want to use,
> typically :
>
>
>
> object CSVParserPlus {
>
>
>
>   val csvParser = new CSVParser(';')
>
>
>
>   def parse(line: String) = {
>
>     csvParser.parseLine(line)
>
>   }
>
> }
>
>
>
>
>
>     val heyMap = heyRDD.map { line =>
>
>       val temp = CSVParserPlus.parse(line)
>
>       (temp(1), temp(4))
>
>     }
>
>
>
> Third solution works and I don’t get how, so I was wondering how worked
> the closure system inside Spark to be able to serialize an object with a
> non-serializable instance. How does that work ? Does it hinder performance
> ? Is it a good solution ? How do you manage this problem ?
>
>
>
> Any input would be greatly appreciated
>
>
>
> Best regards,
>
> Fanilo
>
> ------------------------------
>
> Ce message et les pièces jointes sont confidentiels et réservés à l'usage
> exclusif de ses destinataires. Il peut également être protégé par le secret
> professionnel. Si vous recevez ce message par erreur, merci d'en avertir
> immédiatement l'expéditeur et de le détruire. L'intégrité du message ne
> pouvant être assurée sur Internet, la responsabilité de Worldline ne pourra
> être recherchée quant au contenu de ce message. Bien que les meilleurs
> efforts soient faits pour maintenir cette transmission exempte de tout
> virus, l'expéditeur ne donne aucune garantie à cet égard et sa
> responsabilité ne saurait être recherchée pour tout dommage résultant d'un
> virus transmis.
>
> This e-mail and the documents attached are confidential and intended
> solely for the addressee; it may also be privileged. If you receive this
> e-mail in error, please notify the sender immediately and destroy it. As
> its integrity cannot be secured on the Internet, the Worldline liability
> cannot be triggered for the message content. Although the sender endeavours
> to maintain a computer virus-free network, the sender does not warrant that
> this transmission is virus-free and will not be liable for any damages
> resulting from any virus transmitted.
>

Reply via email to