Running an action inside a loop across multiple RDDs + java.io.NotSerializableException

2014-10-16 Thread _soumya_
Hi, my programming model requires me to generate multiple RDDs for various datasets across a single run and then run an action on each of them. E.g.:

    MyFunc myFunc = ... // it implements VoidFunction
    // set some extra variables - all serializable
    ...
    for (JavaRDD<String> rdd : rddList) { ...

Re: Running an action inside a loop across multiple RDDs + java.io.NotSerializableException

2014-10-16 Thread _soumya_
Excuse me - the line inside the loop should read: rdd.foreach(myFunc) - not sc.

Re: Running an action inside a loop across multiple RDDs + java.io.NotSerializableException

2014-10-16 Thread Cheng Lian
You can first union them into a single RDD and then call foreach. In Scala: rddList.reduce(_.union(_)).foreach(myFunc). For the serialization issue, I don’t have any clue unless more of the code can be shared.

Re: Running an action inside a loop across multiple RDDs + java.io.NotSerializableException

2014-10-16 Thread _soumya_
Sorry - I'll furnish some details below. However, union is not an option for the business logic I have. The function generates a specific file based on a variable that is passed in through a setter on the function. This variable changes with each RDD. I annotated the log line where the first run
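
Since the code behind the exception is not shown in this thread, one way to narrow it down is to serialize the function object locally, before handing it to foreach. Spark has to serialize the function to ship it to the executors, and by default it uses plain Java serialization for that, so a local round through ObjectOutputStream tends to surface the same java.io.NotSerializableException. The sketch below is only an illustration under assumptions: MyFunc here is a hypothetical stand-in (not the poster's class), the outputName and helper fields are made up, and no Spark classes are involved.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationCheck {

    // Hypothetical stand-in for the poster's function class: it carries state
    // that is set through setters before each action is run.
    static class MyFunc implements Serializable {
        private static final long serialVersionUID = 1L;

        private String outputName; // hypothetical per-RDD variable (String is serializable)
        private Object helper;     // hypothetical extra field that may not be serializable

        void setOutputName(String outputName) { this.outputName = outputName; }
        void setHelper(Object helper) { this.helper = helper; }
    }

    // Tries to serialize obj with plain Java serialization.
    // Returns null on success, otherwise a description of the failure.
    // For NotSerializableException the message is normally the name of the
    // class that could not be serialized.
    static String firstSerializationProblem(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        MyFunc myFunc = new MyFunc();

        // Only serializable state is set: the check reports no problem.
        myFunc.setOutputName("dataset-1");
        System.out.println("problem: " + firstSerializationProblem(myFunc));

        // A field holding a non-serializable object (java.lang.Object does not
        // implement Serializable) makes the whole function fail to serialize.
        myFunc.setHelper(new Object());
        System.out.println("problem: " + firstSerializationProblem(myFunc));
    }
}
```

Running the same check inside the loop, right after each setter call, would show whether the value set for a particular RDD is what drags a non-serializable object into the function. If such a field is only needed on the driver, marking it transient is one common way out; whether that applies here depends on code not shown in the thread.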