To answer your second question first, you can use the SparkContext master
URL "local-cluster[2, 1, 512]" (instead of "local[2]"), which would create a
local test cluster with 2 workers, each with 1 core and 512 MB of memory.
This should allow you to accurately test things like serialization.
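For example, a test along these lines (a rough sketch; the data and
assertion are just placeholders) will actually ship closures to separate
worker JVMs, so a non-serializable closure fails the test instead of
passing silently:

    val sc = new SparkContext("local-cluster[2, 1, 512]", "serialization-test")
    try {
      val rdd = sc.parallelize(1 to 100)
      // This closure is serialized and sent to the worker JVMs, so any
      // non-serializable capture surfaces as a failure right here.
      val doubled = rdd.map(_ * 2).collect()
      assert(doubled.sameElements((1 to 100).map(_ * 2)))
    } finally {
      sc.stop()
    }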

I don't believe that adding a function-local variable would cause the
function to be unserializable, though. The only concern when shipping
around functions is when they refer to variables *outside* the function's
scope, in which case Spark will automatically ship those variables to all
workers (unless you override this behavior with a broadcast or accumulator
variable; see
http://spark.incubator.apache.org/docs/0.7.3/scala-programming-guide.html#shared-variables).
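To illustrate the difference (given the sc and rdd from the sketch above;
the names are just for illustration): a plain variable outside the closure
is serialized and shipped with the function, whereas a broadcast variable
is sent to each worker once and read through .value:

    val factor = 3
    // 'factor' is defined outside the closure, so Spark serializes it
    // along with the function and ships it to the workers.
    val scaled = rdd.map(_ * factor)

    // A broadcast variable is distributed to each worker once instead.
    val lookup = sc.broadcast(Map(1 -> "one", 2 -> "two"))
    val named = rdd.map(i => lookup.value.getOrElse(i, "?"))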


On Mon, Oct 21, 2013 at 10:30 AM, Shay Seng <s...@1618labs.com> wrote:

> I'm trying to write a unit test to ensure that some functions I rely on
> will always serialize and run correctly on a cluster.
> In one of these functions I've deliberately added a "val x:Int = 1" which
> should prevent this method from being serializable right?
>
> In the test I've done:
>    sc = new SparkContext("local[2]","test")
>    ...
>    val pdata = sc.parallelize(data)
>    val c = pdata.map().collect()
>
> The unit tests still complete with no errors; I'm guessing because Spark
> knows that local[2] doesn't require serialization? Is there some way I can
> force Spark to run like it would on a real cluster?
>
>
> tks
> shay
>
