Hi, my programming model requires me to generate multiple RDDs for various
datasets within a single run and then run an action on each of them, e.g.:
MyFunc myFunc = ... //It implements VoidFunction
//set some extra variables - all serializable
...
for (JavaRDD<String> rdd : rddList) {
...
Excuse me - the line inside the loop should read: rdd.foreach(myFunc) - not
sc.
You can first union them into a single RDD and then call foreach. In
Scala:
rddList.reduce(_.union(_)).foreach(myFunc)
For the serialization issue, I don’t have any clue unless more code can
be shared.
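In the meantime, one way to narrow down a java.io.NotSerializableException without a cluster is to push the function object through plain Java serialization, which as far as I know is what Spark uses by default when it ships the function given to foreach to the executors. The sketch below is only an illustration: SerializationCheck, GoodFunc, BadFunc, FixedFunc and FileHelper are hypothetical names standing in for the poster's MyFunc and whatever fields it holds, and nothing here uses Spark itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical harness: tries Java serialization on an object and reports
// which class (if any) the JVM refused to serialize.
class SerializationCheck {

    // Returns null when obj serializes cleanly; otherwise returns the message
    // of the NotSerializableException, which names the offending class.
    static String firstNonSerializableClass(Object obj) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        GoodFunc good = new GoodFunc();
        good.setOutputName("first");
        System.out.println("GoodFunc:  " + firstNonSerializableClass(good));
        System.out.println("BadFunc:   " + firstNonSerializableClass(new BadFunc()));
        System.out.println("FixedFunc: " + firstNonSerializableClass(new FixedFunc()));
    }
}

// Stand-in for a helper type that does not implement Serializable.
class FileHelper {
    int handle = 0;
}

// Only serializable state: a per-RDD variable assigned through a setter.
class GoodFunc implements Serializable {
    private String outputName;

    void setOutputName(String outputName) {
        this.outputName = outputName;
    }
}

// Holds a non-serializable field, so serializing it fails.
class BadFunc implements Serializable {
    private final FileHelper helper = new FileHelper();
}

// Same field marked transient: serialization skips it, so it would have to
// be recreated on the executor side before use.
class FixedFunc implements Serializable {
    private transient FileHelper helper = new FileHelper();
}
```

If the real function fails this check, the reported class name usually points at the field to make Serializable, mark transient, or construct inside the call method instead of holding as a member.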
On 10/16/14 11:39 PM, soumya wrote:
> Hi, my programming model requires me to
Sorry - I'll furnish some details below. However, union is not an option for
the business logic I have. The function generates a specific file based
on a variable that is set through a setter on the function. This variable
changes with each RDD. I annotated the log line where the first run