Logs from the workers?

On Wed, Jan 6, 2016 at 1:57 PM, Jeff Jones <jjo...@adaptivebiotech.com> wrote:
> I upgraded our Spark standalone cluster from 1.4.1 to 1.6.0 yesterday. We
> are now seeing regular timeouts between two of the workers when making
> connections. These workers and the same driver code worked fine running on
> 1.4.1 and finished in under a second. Any thoughts on what might have
> changed?
>
> 16/01/06 19:17:58 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks (after 3 retries)
> java.io.IOException: Connecting to /10.248.0.218:52104 timed out (120000 ms)
>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:214)
>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
>     at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:90)
>     at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
>     at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
>     at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> 16/01/06 19:17:58 WARN BlockManager: Failed to fetch remote block rdd_74_3 from BlockManagerId(1, 10.248.0.218, 52104) (failed attempt 1)
> java.io.IOException: Connecting to /10.248.0.218:52104 timed out (120000 ms)
>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:214)
>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
>     at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:90)
>     at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
>     at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
>     at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
>
> Thanks,
> Jeff