Okay, I upped SPARK_MEM to 2048m for the workers; I still left the master at 512m. I'm seeing similar issues. Here is what I see in the stdout logs:
792.187: [Full GC 792.187: [CMS: 963391K->963391K(963392K), 2.5459950 secs] 1039003K->1025065K(1040064K), [CMS Perm : 46331K->46331K(77368K)], 2.5460730 secs] [Times: user=2.54 sys=0.00, real=2.55 secs]
794.735: [GC [1 CMS-initial-mark: 963391K(963392K)] 1027917K(1040064K), 0.0553670 secs] [Times: user=0.06 sys=0.00, real=0.05 secs]
794.791: [CMS-concurrent-mark-start]
794.799: [Full GC 794.799: [CMS796.060: [CMS-concurrent-mark: 1.269/1.269 secs] [Times: user=1.27 sys=0.00, real=1.27 secs] (concurrent mode failure): 963391K->963373K(963392K), 3.8255090 secs] 1040063K->1025223K(1040064K), [CMS Perm : 46331K->46065K(77368K)], 3.8255710 secs] [Times: user=3.80 sys=0.00, real=3.83 secs]
798.642: [Full GC 798.642: [CMS: 963373K->963391K(963392K), 2.5939450 secs] 1039469K->1028596K(1040064K), [CMS Perm : 46066K->46066K(77368K)], 2.5940410 secs] [Times: user=2.59 sys=0.00, real=2.60 secs]
801.236: [GC [1 CMS-initial-mark: 963391K(963392K)] 1030753K(1040064K), 0.0603550 secs] [Times: user=0.06 sys=0.00, real=0.06 secs]
801.297: [CMS-concurrent-mark-start]
801.310: [Full GC 801.310: [CMS802.543: [CMS-concurrent-mark: 1.244/1.246 secs] [Times: user=1.26 sys=0.00, real=1.24 secs] (concurrent mode failure): 963391K->963391K(963392K), 3.8241010 secs] 1039208K->1030163K(1040064K), [CMS Perm : 46066K->46066K(77368K)], 3.8241680 secs] [Times: user=3.80 sys=0.00, real=3.83 secs]
805.143: [Full GC 805.143: [CMS: 963391K->963391K(963392K), 2.6232410 secs] 1038565K->1033504K(1040064K), [CMS Perm : 46066K->46066K(77368K)], 2.6233060 secs] [Times: user=2.61 sys=0.00, real=2.63 secs]
807.767: [GC [1 CMS-initial-mark: 963391K(963392K)] 1035552K(1040064K), 0.0616310 secs] [Times: user=0.06 sys=0.00, real=0.06 secs]
807.829: [CMS-concurrent-mark-start]
807.833: [Full GC 807.833: [CMS809.079: [CMS-concurrent-mark: 1.250/1.250 secs] [Times: user=1.25 sys=0.00, real=1.25 secs] (concurrent mode failure): 963391K->963392K(963392K), 3.8759250 secs] 1040063K->1035642K(1040064K), [CMS Perm : 46066K->46066K(77368K)], 3.8759870 secs] [Times: user=3.86 sys=0.00, real=3.88 secs]

Any pointers would be helpful.

On Tue, Dec 10, 2013 at 6:08 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> Spark probably needs more than 1GB of heap space to function
> correctly. What happens if you give the workers more memory?
>
> - Patrick
>
> On Tue, Dec 10, 2013 at 2:42 PM, learner1014 all <learner1...@gmail.com> wrote:
> >
> > Data is in hdfs, running 2 workers with 1 GB memory.
> > datafile1 is ~9KB and datafile2 is ~216MB. Cant get it to run at all...
> > Tried various different settings for the number of tasks, all the way from 2 to 1024.
> > Anyone else seen similar issues.
> >
> > import org.apache.spark.SparkContext
> > import org.apache.spark.SparkContext._
> > import org.apache.spark.storage.StorageLevel
> >
> > object SimpleApp {
> >
> >   def main (args: Array[String]) {
> >
> >     System.setProperty("spark.local.dir", "/spark-0.8.0-incubating-bin-cdh4/tmp");
> >     System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
> >     System.setProperty("spark.akka.timeout", "30") // in seconds
> >
> >     System.setProperty("spark.executor.memory", "1024m")
> >     System.setProperty("spark.akka.frameSize", "2000") // in MB
> >     System.setProperty("spark.akka.threads", "8")
> >
> >     val dataFile1 = "hdfs://dev01:8020/user/sa/datafile1"
> >     val dataFile2 = "hdfs://dev01:8020/user/sa/datafile2"
> >     val sc = new SparkContext("spark://dev01:7077", "Simple App",
> >       "/spark-0.8.0-incubating-bin-cdh4",
> >       List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
> >
> >     val data10 = sc.textFile(dataFile1, 1024)
> >     val data11 = data10.map(x => x.split("|"))
> >     val data12 = data11.map( x => (x(1).toInt -> x) )
> >
> >     val data20 = sc.textFile(dataFile2, 1024)
> >     val data21 = data20.map(x => x.split("|"))
> >     val data22 = data21.map(x => (x(1).toInt -> x))
> >
> >     val data3 = data12.join(data22, 1024)
> >     //val data4 = data3.distinct(4)
> >     //val numAs = data10.count()
> >     //val numBs = data20.count()
> >     //val numCs = data3.count()
> >     //val numDs = data4.count()
> >     println("Total records after join is %s".format( data3.count()))
> >
> > Thanks,
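One thing I notice while staring at the log: the old generation is pinned at 963391K of 963392K in every entry, so CMS never frees anything and falls into back-to-back full GCs (the "concurrent mode failure" lines) — the heap really is exhausted, not just fragmented. My app also sets spark.executor.memory to 1024m; if that property takes precedence over SPARK_MEM for the executor JVMs (I'm not certain it does in 0.8.0), the workers would still be capped at ~1 GB despite the 2048m setting. A sketch of what I'd try next — the 2048m value is just my guess at aligning the two settings:

```scala
// Guess: align the app-level executor memory with SPARK_MEM so the
// executors actually get the 2 GB I intended (2048m is an assumption).
System.setProperty("spark.executor.memory", "2048m")

// Sanity-check the property took effect before constructing the SparkContext.
assert(System.getProperty("spark.executor.memory") == "2048m")
```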
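Separately, re-reading my own code quoted above, I think split("|") is a bug regardless of the memory problem: String.split takes a regular expression, and "|" is an alternation of two empty patterns, so it splits between every character instead of on the pipe delimiter, which would make x(1).toInt read the wrong field. The sample string below is just an illustration, not my real data:

```scala
// split takes a regex: "|" matches the empty string and splits every
// character, while "\\|" matches a literal pipe character.
val row = "3|17|foo"           // illustrative row, not real data
val wrong = row.split("|")     // per-character pieces, not the fields
val right = row.split("\\|")   // Array("3", "17", "foo")

assert(right.sameElements(Array("3", "17", "foo")))
assert(!wrong.sameElements(Array("3", "17", "foo")))
assert(right(1).toInt == 17)   // the key extraction now sees the real field
```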