The 4.1 GB table has only 3 regions. This means there would be at least 2 nodes that don't host any of its regions. Can you split this table into 12 (or more) regions?
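For reference, a split can be triggered from the HBase shell; this is a minimal sketch where the table name 'big_table' is hypothetical. Each `split` pass halves the existing regions, so repeating it takes 3 regions toward 12 or more:

```shell
# HBase shell sketch (table name is a placeholder):
# split every region of 'big_table' at its midpoint, then
# let the balancer spread the new regions across region servers.
hbase shell <<'EOF'
split 'big_table'
balancer
EOF
```

Running the `split` pass twice roughly quadruples the region count, after which all 5 nodes can carry part of the table's read load.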
BTW what's the value for spark.yarn.executor.memoryOverhead?

Cheers

On Sat, Mar 14, 2015 at 10:52 AM, francexo83 <francex...@gmail.com> wrote:
> Hi all,
>
> I have the following cluster configuration:
>
> - 5 nodes in a cloud environment.
> - Hadoop 2.5.0.
> - HBase 0.98.6.
> - Spark 1.2.0.
> - 8 cores and 16 GB of RAM on each host.
> - 1 NFS disk with 300 IOPS mounted on hosts 1 and 2.
> - 1 NFS disk with 300 IOPS mounted on hosts 3, 4, and 5.
>
> I tried to run a Spark job in cluster mode that computes the left outer
> join between two HBase tables.
> The first table stores about 4.1 GB of data spread across 3 regions
> with Snappy compression.
> The second one stores about 1.2 GB of data spread across 22 regions with
> Snappy compression.
>
> I sometimes get executor lost in the shuffle phase during the last
> stage (saveAsHadoopDataset).
>
> Below is my Spark conf:
>
> num-cpu-cores = 20
> memory-per-node = 10G
> spark.scheduler.mode = FAIR
> spark.scheduler.pool = production
> spark.shuffle.spill = true
> spark.rdd.compress = true
> spark.core.connection.auth.wait.timeout = 2000
> spark.sql.shuffle.partitions = 100
> spark.default.parallelism = 50
> spark.speculation = false
> spark.shuffle.memoryFraction = 0.1
> spark.cores.max = 30
> spark.driver.memory = 10g
>
> Are the resources too low to handle this kind of operation?
>
> If yes, could you share with me the right configuration to perform this
> kind of task?
>
> Thank you in advance.
>
> F.
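If the overhead was left at its default, raising it explicitly is worth trying; executor-lost errors during shuffle are often YARN killing containers that exceed their memory limit. A hedged sketch of how it could be passed (the jar name is a placeholder, and 1024 MB is an illustrative value, not a tuned recommendation):

```shell
# Sketch: set the off-heap overhead explicitly for 10 GB executors.
# In Spark 1.2 the default is roughly max(384 MB, 7% of executor memory),
# i.e. about 716 MB here; 1024 MB leaves extra headroom for shuffle buffers.
spark-submit \
  --master yarn-cluster \
  --executor-memory 10g \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  your-job.jar
```

Note that executor memory plus overhead must still fit inside YARN's per-container maximum, so the executor heap may need to shrink to make room.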