Can you try fixing spark.blockManager.port to a specific port and see if the issue persists?
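
Something along these lines should do it. This is only a minimal sketch: the app name and the port number 40000 are placeholders I made up, not values from your setup, so pick a port that your EMR security groups allow between the nodes.

```scala
import org.apache.spark.SparkConf

// Hypothetical values: "kinesis-streaming-job" and 40000 are placeholders.
// spark.blockManager.port pins the port the block managers listen on,
// instead of letting each one pick a random port.
val conf = new SparkConf()
  .setAppName("kinesis-streaming-job")
  .set("spark.blockManager.port", "40000")
```

The same property can be passed at submit time with `--conf spark.blockManager.port=40000`. If several executors land on the same host they cannot all bind the same port; as far as I know Spark then tries the following ports, bounded by spark.port.maxRetries, so keep a small range open rather than a single port.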

Thanks
Best Regards

On Mon, Oct 19, 2015 at 6:21 PM, Eugen Cepoi <cepoi.eu...@gmail.com> wrote:

> Hi,
>
> I am running spark streaming 1.4.1 on EMR (AMI 3.9) over YARN.
> The job is reading data from Kinesis and the batch size is of 30s (I used
> the same value for the kinesis checkpointing).
> In the executor logs I can see every 5 seconds a sequence of stacktraces
> indicating that the block replication failed. I am using the default
> storage level MEMORY_AND_DISK_SER_2.
> WAL is not enabled nor checkpointing (the checkpoint dir is configured for
> the spark context but not for the streaming context).
>
> Here is an example of those logs for ip-10-63-160-18. They occur in every
> executor while trying to replicate to any other executor.
>
>
> 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to [ip-10-63-160-18.ec2.internal/10.63.160.18:50929]
> 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection to ip-10-63-160-18.ec2.internal/10.63.160.18:50929
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>     at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending message.
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>     at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 15/10/19 03:11:55 INFO nio.ConnectionManager: Notifying ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
> 15/10/19 03:11:55 INFO nio.ConnectionManager: Handling connection error on connection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
> 15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate input-1-1445242310000 to BlockManagerId(3, ip-10-159-151-22.ec2.internal, 50929), failure #0
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>     at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 15/10/19 03:11:55 INFO nio.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
> 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to [ip-10-63-160-18.ec2.internal/10.63.160.18:39506]
> 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection to ip-10-63-160-18.ec2.internal/10.63.160.18:39506
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>     at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending message.
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>     at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 15/10/19 03:11:55 INFO nio.ConnectionManager: Notifying ConnectionManagerId(ip-10-63-160-18.ec2.internal,39506)
> 15/10/19 03:11:55 INFO nio.ConnectionManager: Handling connection error on connection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,39506)
> 15/10/19 03:11:55 INFO nio.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,39506)
> 15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate input-1-1445242310000 to BlockManagerId(2, ip-10-141-12-91.ec2.internal, 39506), failure #1
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>     at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 15/10/19 03:11:55 WARN storage.BlockManager: Block input-1-1445242310000 replicated to only 0 peer(s) instead of 1 peers
> 15/10/19 03:11:55 INFO receiver.BlockGenerator: Pushed block input-1-1445242310000
>
>
> Thanks,
> Eugen