Hello,
I recently faced a strange problem. I was running a job on my laptop
in client deploy mode with master local[*]. Partway through, I lost
the connection to my router, and when the connection came back, the
laptop was assigned a different internal IP address. The job then
failed with the following exception:
20/08/17 12:16:28 ERROR shuffle.RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /192.168.2.109:60405
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:114)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:121)
    at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:124)
    at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:120)
    at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:758)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$3.$anonfun$run$1(TaskResultGetter.scala:88)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:63)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /192.168.2.109:60405
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
This looks like it has something to do with the change of my local
internal IP address, which is currently 192.168.2.104. In fact, the
following lines showed up when I started the job:
20/08/17 12:39:31 WARN util.Utils: Your hostname, DESKTOP-8N43UKC resolves to a loopback address: 127.0.1.1; using 192.168.2.109 instead (on interface eth1)
20/08/17 12:39:31 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Question: is there a way to avoid the above exception even if the IP
address changes mid-job? Can I use the loopback address 127.0.0.1
here for SPARK_LOCAL_IP?
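For context, this is a minimal sketch of what I was considering trying, based on the hint in the WARN line (assuming SPARK_LOCAL_IP is honored in local mode; my_job.py is just a placeholder for my actual application):

```shell
# Pin Spark's bind address to the loopback interface before launching the
# job, so a DHCP-driven change of the LAN IP would hopefully not matter
# for a purely local[*] run:
export SPARK_LOCAL_IP=127.0.0.1

# Presumably the same could be done per job via configuration properties
# (spark.driver.bindAddress and spark.driver.host are documented Spark
# configs; sketch only, not tested):
# spark-submit --conf spark.driver.bindAddress=127.0.0.1 \
#              --conf spark.driver.host=127.0.0.1 my_job.py

# Confirm the environment variable is set for the launch:
echo "$SPARK_LOCAL_IP"
```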
Thanks.
-Samik
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org