In the third case the object does not get shipped around. Each executor will create it's own instance. I got bitten by this here:
http://apache-spark-user-list.1001560.n3.nabble.com/Help-with-object-access-from-mapper-simple-question-tt8125.html On Thu, Sep 4, 2014 at 9:29 AM, Andrianasolo Fanilo < fanilo.andrianas...@worldline.com> wrote: > Hello Spark fellows J > > > > I’m a new user of Spark and Scala and have been using both for 6 months > without too many problems. > > Here I’m looking for best practices for using non-serializable classes > inside closure. I’m using Spark-0.9.0-incubating here with Hadoop 2.2. > > > > Suppose I am using OpenCSV parser to parse an input file. So inside my > main : > > > > val sc = new SparkContext("local[2]", "App") > > val heyRDD = sc.textFile("…") > > > > val csvparser = new CSVParser(';') > > val heyMap = heyRDD.map { line => > > val temp = csvparser.parseLine(line) > > (temp(1), temp(4)) > > } > > > > > > This gives me a java.io.NotSerializableException: > au.com.bytecode.opencsv.CSVParser, which seems reasonable. > > > > From here I could see 3 solutions : > > 1/ Extending CSVParser with Serialisable properties, which adds a lot of > boilerplate code if you ask me > > 2/ Using Kryo Serialization (still need to define a serializer) > > 3/ Creating an object with an instance of the class I want to use, > typically : > > > > object CSVParserPlus { > > > > val csvParser = new CSVParser(';') > > > > def parse(line: String) = { > > csvParser.parseLine(line) > > } > > } > > > > > > val heyMap = heyRDD.map { line => > > val temp = CSVParserPlus.parse(line) > > (temp(1), temp(4)) > > } > > > > Third solution works and I don’t get how, so I was wondering how worked > the closure system inside Spark to be able to serialize an object with a > non-serializable instance. How does that work ? Does it hinder performance > ? Is it a good solution ? How do you manage this problem ? > > > > Any input would be greatly appreciated > > > > Best regards, > > Fanilo > > ------------------------------ > > Ce message et les pièces jointes sont confidentiels et réservés à l'usage > exclusif de ses destinataires. Il peut également être protégé par le secret > professionnel. Si vous recevez ce message par erreur, merci d'en avertir > immédiatement l'expéditeur et de le détruire. L'intégrité du message ne > pouvant être assurée sur Internet, la responsabilité de Worldline ne pourra > être recherchée quant au contenu de ce message. Bien que les meilleurs > efforts soient faits pour maintenir cette transmission exempte de tout > virus, l'expéditeur ne donne aucune garantie à cet égard et sa > responsabilité ne saurait être recherchée pour tout dommage résultant d'un > virus transmis. > > This e-mail and the documents attached are confidential and intended > solely for the addressee; it may also be privileged. If you receive this > e-mail in error, please notify the sender immediately and destroy it. As > its integrity cannot be secured on the Internet, the Worldline liability > cannot be triggered for the message content. Although the sender endeavours > to maintain a computer virus-free network, the sender does not warrant that > this transmission is virus-free and will not be liable for any damages > resulting from any virus transmitted. >