Hi Lee, it's actually not related to threading at all - you would still have the same problem even if you were using a single thread. See this section (https://spark.apache.org/docs/latest/programming-guide.html#passing-functions-to-spark) of the Spark docs.
On June 5, 2015, at 5:12 PM, Lee McFadden <splee...@gmail.com> wrote:

> On Fri, Jun 5, 2015 at 2:05 PM Will Briggs <wrbri...@gmail.com> wrote:
>
>> Your lambda expressions on the RDDs in the SecondRollup class are closing over the context, and Spark has special logic to ensure that all variables in a closure used on an RDD are Serializable - I hate linking to Quora, but there's a good explanation here: http://www.quora.com/What-does-Closure-cleaner-func-mean-in-Spark
>
> Ah, I see! So if I broke the lambda expressions out into methods on an object, that would prevent this issue. Essentially, "don't use lambda expressions when using threads".
>
> Thanks again, I appreciate the help.
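To make the capture problem above concrete without a running Spark cluster, here is a minimal Python/pickle analogy (PySpark serializes closures in a similar spirit, though it uses its own serializer, and Scala Spark uses Java serialization plus the ClosureCleaner). The `Context`, `Job`, and `transform` names are hypothetical stand-ins invented for this sketch, not Spark APIs: a function defined inside a class implicitly captures the enclosing instance, so serializing it drags the unserializable context along, while a top-level function (in Scala terms, a method on a standalone object) carries no extra state.

```python
import pickle

class Context:
    """Hypothetical stand-in for an unserializable resource, like a SparkContext."""
    def __reduce__(self):
        raise TypeError("Context is not serializable")

class Job:
    """Mimics a class like SecondRollup that holds a reference to the context."""
    def __init__(self):
        self.ctx = Context()

    def transform(self, x):
        # A bound method (or a lambda defined in a method here) implicitly
        # captures self, and with it self.ctx.
        return x * 2

job = Job()

try:
    pickle.dumps(job.transform)   # pickling the method pickles job, and hence job.ctx
    captured_ok = True
except TypeError:
    captured_ok = False           # fails: the context came along for the ride

# Moving the function to the top level (in Scala: a method on a standalone
# object) leaves nothing extra in the closure, so it serializes cleanly:
def transform(x):
    return x * 2

serialized = pickle.dumps(transform)  # succeeds: pickled by reference
```

The fix Lee describes - breaking the lambdas out into methods on an object - works for exactly this reason: what gets shipped to the executors is then a reference to a standalone function rather than a closure over the enclosing instance.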