To answer your second question first: you can use the SparkContext master string "local-cluster[2, 1, 512]" (instead of "local[2]"), which creates a local test cluster with 2 workers, each with 1 core and 512 MB of memory. This should allow you to accurately test things like serialization.
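For example, a minimal sketch of such a test (the app name and assertion are made up; the import depends on your Spark version -- "spark.SparkContext" in 0.7.x, "org.apache.spark.SparkContext" from 0.8 on):

    import spark.SparkContext

    val sc = new SparkContext("local-cluster[2, 1, 512]", "serialization-test")
    try {
      // With separate worker JVMs, closures must actually be serialized
      // and shipped, so a non-serializable closure fails here instead of
      // passing silently as it can under local[2].
      val doubled = sc.parallelize(1 to 100).map(_ * 2).collect()
      assert(doubled.length == 100)
    } finally {
      sc.stop()
    }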
I don't believe that adding a function-local variable would cause the function to be unserializable, though. The only concern when shipping around functions is when they refer to variables *outside* the function's scope, in which case Spark will automatically ship those variables to all workers (unless you override this behavior with a broadcast or accumulator variable <http://spark.incubator.apache.org/docs/0.7.3/scala-programming-guide.html#shared-variables>).

On Mon, Oct 21, 2013 at 10:30 AM, Shay Seng <s...@1618labs.com> wrote:

> I'm trying to write a unit test to ensure that some functions I rely on
> will always serialize and run correctly on a cluster.
> In one of these functions I've deliberately added a "val x:Int = 1" which
> should prevent this method from being serializable right?
>
> In the test I've done:
> sc = new SparkContext("local[2]","test")
> ...
> val pdata = sc.parallelize(data)
> val c = pdata.map().collect()
>
> The unit tests still complete with no errors; I'm guessing because spark
> knows that local[2] doesn't require serialization? Is there someway I can
> force spark to run like it would do on a real cluster?
>
> tks
> shay
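To make the distinction concrete, a rough sketch (variable names are made up):

    val rdd = sc.parallelize(1 to 5)

    // A function-local val lives entirely inside the closure; it is
    // created fresh on each worker, so it poses no serialization problem.
    val a = rdd.map { x =>
      val local: Int = 1
      x + local
    }

    // A variable from the enclosing scope is captured by the closure,
    // so Spark serializes it and ships it to the workers; it (and
    // anything it drags along) must be serializable.
    val outer = 10
    val b = rdd.map(x => x + outer)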