Hi,
I'm just getting started with Spark and I wrote a simple job in Scala.
Here is a sketch:
val TAB = "\t"
val support = 2

val sc = new SparkContext(...)
val raw = sc.textFile(...)
val filtered = raw.map(line => {
  val lineSplit = line.split(TAB)   // TAB is null here during the run and an exception is thrown
  ...
}).filter(p => p._2 >= support)     // support here is 0 during the run
...
When I run the sbt-assembly jar with "java -cp ..." on a standalone cluster, I
find that the two values, TAB and support, are set to their default values when
they are referenced inside the RDD transformation: TAB is null and support is 0,
no longer "\t" and 2 as they are initialized above.
If the same jar is run locally (MASTER is local or local[k] instead of
spark://...) on the same input, it runs perfectly. The code also runs fine
in the spark-shell on the cluster.
To get the jar to run correctly on the cluster, I have to hard-code the string
literal and the number directly in the RDD transformation.
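For reference, a minimal self-contained version of that hard-coded workaround
looks like the following. The master URL, input path, job name and the
(line, field count) pairing in the map are placeholders I made up so the
example stands on its own; the only point is that the literals "\t" and 2
appear directly in the transformation instead of the TAB and support vals:

import spark.SparkContext

object TabCountJob {
  def main(args: Array[String]) {
    // Placeholder master URL and input path, just to make the example complete.
    val sc = new SparkContext("spark://master:7077", "TabCountJob")
    val raw = sc.textFile("hdfs://namenode:9000/path/to/input")

    // The literals are written directly inside the closure instead of
    // referring to the TAB and support vals defined outside of it.
    val filtered = raw
      .map(line => (line, line.split("\t").length)) // "\t" instead of TAB
      .filter(p => p._2 >= 2)                       // 2 instead of support

    println("lines kept: " + filtered.count())
  }
}

Written this way, the job runs on the standalone cluster as well.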
This really looks like a weird bug to me; maybe it has something to do with how
the sbt-assembly jar is built? Any suggestions?
Thanks.
I'm using Spark 0.7.3 and Scala 2.9.3.
--
JU Han
Software Engineer Intern @ KXEN Inc.
UTC - Université de Technologie de Compiègne
GI06 - Fouille de Données et Décisionnel
+33 0619608888