We are using the Spark Cassandra Connector to write to Cassandra from our aggregation jobs, with a batch size of 1000. We are seeing the following exception:

All host(s) tried for query failed (tried: 00.00.00.00:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)))

We have a cluster of 6 nodes in one data center, and the Spark Cassandra Connector always connects to only one node. We monitored the live threads on the Cassandra cluster: whenever the thread counter (tc) is close to peaktc, we see the above exception. In the Cassandra thread dump we saw a lot of parked threads:

"SharedPool-Worker-131" #1800 daemon prio=5 os_prio=0 tid=0x00007f2d35539ab0 nid=0xe6d waiting on condition [0x00007f2d23e86000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:85)
        at java.lang.Thread.run(Thread.java:745)

"SharedPool-Worker-129" #1791 daemon prio=5 os_prio=0 tid=0x00007f2d346d24f0 nid=0xe6c waiting on condition [0x00007f2d23ec7000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:85)
        at java.lang.Thread.run(Thread.java:745)

"SharedPool-Worker-132" #1799 daemon prio=5 os_prio=0 tid=0x00007f2d343b0230 nid=0xe6b waiting on condition [0x00007f2c82fef000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:85)
        at java.lang.Thread.run(Thread.java:745)

"SharedPool-Worker-133" #1798 daemon prio=5 os_prio=0 tid=0x00007f2d3567de90 nid=0xe6a waiting on condition [0x00007f2c8c661000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:85)
        at java.lang.Thread.run(Thread.java:745)

I looked at the SEPWorker.run method and observed that the code below is being executed:

    // if stop was signalled, go to sleep (don't try self-assign; being put to sleep is rare, so let's obey it
    // whenever we receive it - though we don't apply this constraint to producers, who may reschedule us before
    // we go to sleep)
    if (stop())
        while (isStopped())
            LockSupport.park();

Harika Vangapelli
Engineer - IT
Cisco Systems, Inc.
hvang...@cisco.com
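
P.S. To check what the WAITING (parking) state in the dump means, the park loop quoted above can be reproduced outside Cassandra. The following is only a minimal sketch: the class name ParkDemo, the method observeParkedState and the stopped flag are hypothetical stand-ins for illustration and are not part of Cassandra's SEPWorker. It starts a worker that parks while a flag is set, then samples the worker's state from another thread. The sampled state should be WAITING, the same state the SharedPool-Worker threads show, which suggests those threads may simply be idle workers waiting for tasks rather than threads blocked on a lock.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

public class ParkDemo {

    // Starts a worker that parks while a (hypothetical) "stopped" flag is set,
    // samples the worker's thread state, then releases the worker.
    static Thread.State observeParkedState() throws InterruptedException {
        AtomicBoolean stopped = new AtomicBoolean(true);

        Thread worker = new Thread(() -> {
            // Same shape as the loop quoted from SEPWorker.run: park() may
            // return spuriously, so the flag is re-checked on every wake-up.
            while (stopped.get()) {
                LockSupport.park();
            }
        }, "demo-worker");
        worker.setDaemon(true);
        worker.start();

        // Poll for up to 5 seconds until the worker has actually parked.
        Thread.State state = worker.getState();
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (state != Thread.State.WAITING && System.nanoTime() < deadline) {
            Thread.sleep(10);
            state = worker.getState();
        }

        // Clear the flag first, then unpark, so the worker leaves its loop.
        stopped.set(false);
        LockSupport.unpark(worker);
        worker.join(5000);
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Worker state while parked: " + observeParkedState());
    }
}
```

A thread in this state uses no CPU and holds no monitor, so a large number of them would not by itself explain the connection-acquisition timeout on the driver side.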