Re: Having lots of FetchFailedException in join

2015-03-03 Thread Jianshi Huang
Sorry that I forgot the subject. And in the driver, I got many FetchFailedException. The error messages are
15/03/03 10:34:32 WARN TaskSetManager: Lost task 31.0 in stage 2.2 (TID 7943, ): FetchFailed(BlockManagerId(86, , 43070), shuffleId=0, mapId=24, reduceId=1220, message=
org.apache.s

Re: Having lots of FetchFailedException in join

2015-03-03 Thread Aaron Davidson
"Failed to connect" implies that the executor at that host died; please check its logs as well.
On Tue, Mar 3, 2015 at 11:03 AM, Jianshi Huang wrote:
> Sorry that I forgot the subject.
>
> And in the driver, I got many FetchFailedException. The error messages are
>
> 15/03/03 10:34:32 WARN TaskS

Re: Having lots of FetchFailedException in join

2015-03-03 Thread Jianshi Huang
The failed executor has the following error messages. Any hints?
15/03/03 10:22:41 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 5711039715419258699
java.io.FileNotFoundException: /hadoop01/scratch/local/usercache/jianshuang/appcache/application_1421268539738_2

Re: Having lots of FetchFailedException in join

2015-03-03 Thread Aaron Davidson
Drat! That doesn't help. Could you scan from the top to see if there were any fatal errors preceding these? Sometimes an OOM will cause this type of issue further down.
On Tue, Mar 3, 2015 at 8:16 PM, Jianshi Huang wrote:
> The failed executor has the following error messages. Any hints?
>
> 15/0

Re: Having lots of FetchFailedException in join

2015-03-03 Thread Jianshi Huang
Hmm... ok, previous errors are still block fetch errors.
15/03/03 10:22:40 ERROR RetryingBlockFetcher: Exception while beginning fetch of 11 outstanding blocks
java.io.IOException: Failed to connect to host-/:55597
at org.apache.spark.network.client.TransportClientFactory.createCli

Re: Having lots of FetchFailedException in join

2015-03-04 Thread Jianshi Huang
I changed spark.shuffle.blockTransferService to nio and now I'm getting OOM errors. I'm doing a big join operation.
15/03/04 19:04:07 ERROR Executor: Exception in task 107.0 in stage 2.0 (TID 6207)
java.lang.OutOfMemoryError: Java heap space
at org.apache.spark.util.collection.CompactBuff
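For readers who want to reproduce the switch described above, here is a minimal sketch of passing the setting on the command line. It assumes the job is launched with spark-submit and its `--conf key=value` option; the helper name `shuffle_transfer_conf` and the script name `my_job.py` are hypothetical, and the only values accepted are the two mentioned in this thread (`netty` and `nio`).

```python
def shuffle_transfer_conf(service):
    """Build spark-submit arguments that select the shuffle block transfer service.

    Hypothetical helper: only the property name and the two values
    ("netty", "nio") come from the thread; everything else is illustrative.
    """
    allowed = {"netty", "nio"}
    if service not in allowed:
        raise ValueError(f"unsupported transfer service: {service!r}")
    return ["--conf", f"spark.shuffle.blockTransferService={service}"]


# Assemble a full command line; "my_job.py" is a placeholder script name.
command = ["spark-submit"] + shuffle_transfer_conf("nio") + ["my_job.py"]
print(" ".join(command))
```

Keeping the value behind a small validation step makes it harder to launch a long join with a mistyped setting, which Spark would otherwise only reject (or ignore) at runtime.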

Re: Having lots of FetchFailedException in join

2015-03-04 Thread Jianshi Huang
One really interesting thing is that when I'm using the netty-based spark.shuffle.blockTransferService, there are no OOM error messages (java.lang.OutOfMemoryError: Java heap space). Any idea why it's not here? I'm using Spark 1.2.1.
Jianshi
On Thu, Mar 5, 2015 at 1:56 PM, Jianshi Huang wrote:
> I ch

RE: Having lots of FetchFailedException in join

2015-03-04 Thread Shao, Saisai
: Having lots of FetchFailedException in join
One really interesting thing is that when I'm using the netty-based spark.shuffle.blockTransferService, there are no OOM error messages (java.lang.OutOfMemoryError: Java heap space). Any idea why it's not here? I'm using Spark 1.2.1. Ji

Re: Having lots of FetchFailedException in join

2015-03-04 Thread Jianshi Huang
Huang [mailto:jianshi.hu...@gmail.com]
> *Sent:* Thursday, March 5, 2015 2:32 PM
> *To:* Aaron Davidson
> *Cc:* user
> *Subject:* Re: Having lots of FetchFailedException in join
>
> One really interesting is that when I'm using the
> netty-based spark.shuffle.blockTr

RE: Having lots of FetchFailedException in join

2015-03-04 Thread Shao, Saisai
com<mailto:jianshi.hu...@gmail.com>]
Sent: Thursday, March 5, 2015 2:32 PM
To: Aaron Davidson
Cc: user
Subject: Re: Having lots of FetchFailedException in join
One really interesting is that when I'm using the netty-based spark.shuffle.blockTransferService, there's no OOM error messag

Re: Having lots of FetchFailedException in join

2015-03-04 Thread Jianshi Huang
ks
>
> Jerry
>
> *From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
> *Sent:* Thursday, March 5, 2015 3:28 PM
> *To:* Shao, Saisai
> *Cc:* user
> *Subject:* Re: Having lots of FetchFailedException in join
>
> Hi Saisai,
>

RE: Having lots of FetchFailedException in join

2015-03-04 Thread Shao, Saisai
Yes, if one key has too many values, there is still a chance of hitting the OOM.
Thanks
Jerry
From: Jianshi Huang [mailto:jianshi.hu...@gmail.com]
Sent: Thursday, March 5, 2015 3:49 PM
To: Shao, Saisai
Cc: Cheng, Hao; user
Subject: Re: Having lots of FetchFailedException in join
I see. I'm
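The point about one key having too many values can be checked before running the join. The following is a minimal sketch in plain Python, not Spark code and not anything posted in the thread: it counts values per key in a sample of (key, value) pairs and flags keys whose count is far above the median. The function name `skewed_keys` and the threshold `factor=10` are assumptions chosen for illustration.

```python
from collections import Counter
from statistics import median


def skewed_keys(pairs, factor=10.0):
    """Return {key: count} for keys whose value count exceeds factor * median count.

    `pairs` is any iterable of (key, value) tuples, e.g. a sample of the
    join input. The factor of 10 is an arbitrary illustrative threshold.
    """
    counts = Counter(key for key, _ in pairs)
    if not counts:
        return {}
    typical = median(counts.values())
    return {key: n for key, n in counts.items() if n > factor * typical}


# Tiny synthetic sample: key "a" carries almost all of the values.
sample = [("a", i) for i in range(1000)] + [("b", 1), ("c", 2)]
print(skewed_keys(sample))
```

The median is used rather than the mean because a single very heavy key inflates the mean and can hide itself. Keys flagged this way are the ones whose values all land in one reduce task during a join or group-by, which is where heap pressure concentrates.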

Re: Having lots of FetchFailedException in join

2015-03-04 Thread Jianshi Huang
n how SparkSQL uses this operators.
>>
>> CC @hao if he has some thoughts on it.
>>
>> Thanks
>>
>> Jerry
>>
>> *From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
>> *Sent:* Thursday, March 5

Re: Having lots of FetchFailedException in join

2015-03-05 Thread Jianshi Huang
ng [mailto:jianshi.hu...@gmail.com]
> *Sent:* Thursday, March 5, 2015 3:55 PM
> *To:* Shao, Saisai
> *Cc:* Cheng, Hao; user
> *Subject:* Re: Having lots of FetchFailedException in join
>
> There's some skew.
>
> 64
>
> 6164
>
> 0
>

Re: Having lots of FetchFailedException in join

2015-03-05 Thread Aaron Davidson
> *To:* Shao, Saisai
> *Cc:* Cheng, Hao; user
> *Subject:* Re: Having lots of FetchFailedException in join
>
> Thanks. I was about to submit a ticket for this :)
>
> Also there's a ticket for sort-merge based groupbykey
> https://issues.apache.org/jira/bro