If you can reproduce, then i think you can open up a jira for this. Thanks Best Regards
On Fri, Oct 23, 2015 at 1:37 PM, Eugen Cepoi <cepoi.eu...@gmail.com> wrote: > When fixing the port to the same values as in the stack trace it works > too. The network config of the slaves seems correct. > > Thanks, > Eugen > > 2015-10-23 8:30 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>: > >> Mostly a network issue, you need to check your network configuration from >> the aws console and make sure the ports are accessible within the cluster. >> >> Thanks >> Best Regards >> >> On Thu, Oct 22, 2015 at 8:53 PM, Eugen Cepoi <cepoi.eu...@gmail.com> >> wrote: >> >>> Huh indeed this worked, thanks. Do you know why this happens, is that >>> some known issue? >>> >>> Thanks, >>> Eugen >>> >>> 2015-10-22 19:08 GMT+07:00 Akhil Das <ak...@sigmoidanalytics.com>: >>> >>>> Can you try fixing spark.blockManager.port to specific port and see if >>>> the issue exists? >>>> >>>> Thanks >>>> Best Regards >>>> >>>> On Mon, Oct 19, 2015 at 6:21 PM, Eugen Cepoi <cepoi.eu...@gmail.com> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> I am running spark streaming 1.4.1 on EMR (AMI 3.9) over YARN. >>>>> The job is reading data from Kinesis and the batch size is of 30s (I >>>>> used the same value for the kinesis checkpointing). >>>>> In the executor logs I can see every 5 seconds a sequence of >>>>> stacktraces indicating that the block replication failed. I am using the >>>>> default storage level MEMORY_AND_DISK_SER_2. >>>>> WAL is not enabled nor checkpointing (the checkpoint dir is configured >>>>> for the spark context but not for the streaming context). >>>>> >>>>> Here is an example of those logs for ip-10-63-160-18. They occur in >>>>> every executor while trying to replicate to any other executor. >>>>> >>>>> >>>>> 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to >>>>> [ip-10-63-160-18.ec2.internal/10.63.160.18:50929] >>>>> 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection >>>>> to ip-10-63-160-18.ec2.internal/10.63.160.18:50929 >>>>> java.net.ConnectException: Connection refused >>>>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) >>>>> at >>>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) >>>>> at >>>>> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344) >>>>> at >>>>> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>> at java.lang.Thread.run(Thread.java:745) >>>>> 15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending >>>>> message. >>>>> java.net.ConnectException: Connection refused >>>>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) >>>>> at >>>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) >>>>> at >>>>> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344) >>>>> at >>>>> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>> at java.lang.Thread.run(Thread.java:745) >>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Notifying >>>>> ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929) >>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Handling connection error >>>>> on connection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929) >>>>> 15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate >>>>> input-1-1445242310000 to BlockManagerId(3, ip-10-159-151-22.ec2.internal, >>>>> 50929), failure #0 >>>>> java.net.ConnectException: Connection refused >>>>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) >>>>> at >>>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) >>>>> at >>>>> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344) >>>>> at >>>>> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>> at java.lang.Thread.run(Thread.java:745) >>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Removing SendingConnection >>>>> to ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929) >>>>> 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to >>>>> [ip-10-63-160-18.ec2.internal/10.63.160.18:39506] >>>>> 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection >>>>> to ip-10-63-160-18.ec2.internal/10.63.160.18:39506 >>>>> java.net.ConnectException: Connection refused >>>>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) >>>>> at >>>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) >>>>> at >>>>> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344) >>>>> at >>>>> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>> at java.lang.Thread.run(Thread.java:745) >>>>> 15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending >>>>> message. >>>>> java.net.ConnectException: Connection refused >>>>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) >>>>> at >>>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) >>>>> at >>>>> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344) >>>>> at >>>>> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>> at java.lang.Thread.run(Thread.java:745) >>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Notifying >>>>> ConnectionManagerId(ip-10-63-160-18.ec2.internal,39506) >>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Handling connection error >>>>> on connection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,39506) >>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Removing SendingConnection >>>>> to ConnectionManagerId(ip-10-63-160-18.ec2.internal,39506) >>>>> 15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate >>>>> input-1-1445242310000 to BlockManagerId(2, ip-10-141-12-91.ec2.internal, >>>>> 39506), failure #1 >>>>> java.net.ConnectException: Connection refused >>>>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) >>>>> at >>>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) >>>>> at >>>>> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344) >>>>> at >>>>> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>> at java.lang.Thread.run(Thread.java:745) >>>>> 15/10/19 03:11:55 WARN storage.BlockManager: Block input-1-1445242310000 >>>>> replicated to only 0 peer(s) instead of 1 peers >>>>> 15/10/19 03:11:55 INFO receiver.BlockGenerator: Pushed block >>>>> input-1-1445242310000 >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> Eugen >>>>> >>>> >>>> >>> >> >