"Failed to connect" implies that the executor at that host died, please
check its logs as well.
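
If the executor was killed (e.g. by YARN for exceeding memory limits), its
shuffle output is gone, and every fetch from it will fail like this until
the map stage is recomputed. One stopgap is to make the fetch side more
tolerant of a slow executor restart. A sketch for spark-defaults.conf,
assuming Spark 1.2+ where these transport-layer settings exist (the values
below are illustrative, not recommendations):

```properties
# Retry failed shuffle block fetches a few more times before reporting
# FetchFailed (Spark 1.2 defaults: 3 retries, 5 seconds apart)
spark.shuffle.io.maxRetries    6
spark.shuffle.io.retryWait     10
```

Note this only helps if the executor comes back; if it died for good, the
map stage will be resubmitted regardless, so the executor logs are the
place to look for the root cause.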

On Tue, Mar 3, 2015 at 11:03 AM, Jianshi Huang <jianshi.hu...@gmail.com>
wrote:

> Sorry that I forgot the subject.
>
> And in the driver, I got many FetchFailedExceptions. The error messages are
>
> 15/03/03 10:34:32 WARN TaskSetManager: Lost task 31.0 in stage 2.2 (TID
> 7943, xxxx): FetchFailed(BlockManagerId(86, xxxx, 43070), shuffleId=0,
> mapId=24, reduceId=1220, message=
> org.apache.spark.shuffle.FetchFailedException: Failed to connect to
> xxxx/xxxx:43070
>         at
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
>         at
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
>         at
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
>
>
> Jianshi
>
> On Wed, Mar 4, 2015 at 2:55 AM, Jianshi Huang <jianshi.hu...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I got this error message:
>>
>> 15/03/03 10:22:41 ERROR OneForOneBlockFetcher: Failed while starting
>> block fetches
>> java.lang.RuntimeException: java.io.FileNotFoundException:
>> /hadoop01/scratch/local/usercache/jianshuang/appcache/application_1421268539738_202330/spark-local-20150303100549-fc3b/02/shuffle_0_1458_0.index
>> (No such file or directory)
>>         at java.io.FileInputStream.open(Native Method)
>>         at java.io.FileInputStream.<init>(FileInputStream.java:146)
>>         at
>> org.apache.spark.shuffle.IndexShuffleBlockManager.getBlockData(IndexShuffleBlockManager.scala:109)
>>         at
>> org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:305)
>>         at
>> org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
>>         at
>> org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
>>         at
>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>         at
>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>         at
>> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>         at
>> scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>>         at
>> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>         at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>>         at
>> org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:57)
>>
>>
>> And then for the same index file and executor, I got the following errors
>> multiple times
>>
>> 15/03/03 10:22:41 ERROR ShuffleBlockFetcherIterator: Failed to get
>> block(s) from host-xxxx:39534
>> java.lang.RuntimeException: java.io.FileNotFoundException:
>> /hadoop01/scratch/local/usercache/jianshuang/appcache/application_1421268539738_202330/spark-local-20150303100549-fc3b/02/shuffle_0_1458_0.index
>> (No such file or directory)
>>
>> 15/03/03 10:22:41 ERROR RetryingBlockFetcher: Failed to fetch block
>> shuffle_0_13_1228, and will not retry (0 retries)
>> java.lang.RuntimeException: java.io.FileNotFoundException:
>> /hadoop01/scratch/local/usercache/jianshuang/appcache/application_1421268539738_202330/spark-local-20150303100549-fc3b/02/shuffle_0_1458_0.index
>> (No such file or directory)
>>
>> ...
>> Caused by: java.net.ConnectException: Connection refused: host-xxxx....
>>
>>
>> What's the problem?
>>
>> BTW, I'm using a Spark 1.2.1-SNAPSHOT that I built around Dec. 20. Are
>> there any bug fixes related to shuffle block fetching or index files after
>> that?
>>
>>
>> Thanks,
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>>
>
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>