Data is in HDFS, running 2 workers with 1 GB of memory each.
datafile1 is ~9KB and datafile2 is ~216MB. Can't get it to run at all...
Tried various settings for the number of tasks, all the way from
2 to 1024.
Has anyone else seen similar issues?
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // implicit conversions that provide join on pair RDDs
On Fri, Dec 6, 2013 at 1:06 AM, learner1014 all learner1...@gmail.com wrote:
Hi,
Trying to do a join operation on an RDD; the input is pipe-delimited data
split across two files.
One file is 24MB and the other is 285MB.
Setup being used is a single-node (server) setup with SPARK_MEM set to 512m.
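For reference, a minimal sketch of the kind of join described above. This is an assumption-laden illustration, not the poster's actual job: the HDFS paths, the record layout (first pipe-delimited field as the join key), and the partition count of 64 are all made up for the example.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // implicit conversions that provide join on pair RDDs

object JoinSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "JoinSketch")

    // Hypothetical paths and layout: treat the first pipe-delimited field
    // as the join key and the second as the value.
    val left = sc.textFile("hdfs:///data/datafile1")
      .map(_.split('|'))
      .map(fields => (fields(0), fields(1)))
    val right = sc.textFile("hdfs:///data/datafile2")
      .map(_.split('|'))
      .map(fields => (fields(0), fields(1)))

    // A higher partition count spreads the shuffle across smaller tasks,
    // which can help when worker memory is tight (512m-1g as in this thread).
    val joined = left.join(right, 64)
    joined.saveAsTextFile("hdfs:///data/joined")

    sc.stop()
  }
}
```

If one of the two inputs is very small (like the ~9KB file mentioned above), an alternative worth trying is collecting it to the driver and broadcasting it as a map, which avoids the shuffle of a full join entirely.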