Re: No node was available to execute query error

Bowen Song Fri, 12 Mar 2021 08:02:26 -0800

The fact that sleep-then-retry works is just another indicator that this is likely a GC-pause-related issue. I'd recommend checking your Cassandra servers' GC logs first.
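
For reference, the sleep-then-retry workaround from the quoted message could be bounded along these lines. This is only a sketch: RetryWithBackoff, its parameters and the doubling delay are hypothetical choices made for illustration, not part of the Cassandra driver, and the Callable stands in for whatever query call is being retried. It masks the symptom and does not address the underlying pauses.

```java
import java.util.concurrent.Callable;

final class RetryWithBackoff {
    /**
     * Runs the action, making at most maxAttempts attempts in total.
     * Sleeps between failed attempts, doubling the delay each time,
     * and rethrows the last failure once the attempts are used up.
     */
    static <T> T call(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                throw e; // never retry an interrupt
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                    delayMs *= 2;
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a query call: fails twice, then succeeds.
        int[] calls = {0};
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Capping the number of attempts means a persistent failure still surfaces as an exception instead of looping forever.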

Do you know the maximum partition size for the doc.fieldcounts table? (Try the "nodetool cfstats doc.fieldcounts" command.) I suspect this table has large partitions, which usually leads to GC issues.

As for your failed executeAsync() insert issue, do you know how many concurrent in-flight queries you have? The Cassandra driver has a limit on these, and new executeAsync() calls will fail when the limit is reached.
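
One way to keep the number of in-flight requests under control is to gate each asynchronous call behind a semaphore. The sketch below uses only the JDK: BoundedAsync and the limit of 8 are hypothetical, and the function passed to the constructor stands in for a call such as session.executeAsync(boundStatement), which in the 4.x Java driver returns a CompletionStage. The right limit depends on your driver configuration, so treat the number as a placeholder.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

final class BoundedAsync<I, O> {
    private final int maxInFlight;
    private final Semaphore permits;
    private final Function<I, CompletionStage<O>> asyncCall;

    BoundedAsync(int maxInFlight, Function<I, CompletionStage<O>> asyncCall) {
        this.maxInFlight = maxInFlight;
        this.permits = new Semaphore(maxInFlight);
        this.asyncCall = asyncCall;
    }

    /** Blocks the caller until a permit is free, then starts the async call. */
    CompletionStage<O> submit(I input) throws InterruptedException {
        permits.acquire();
        CompletionStage<O> stage;
        try {
            stage = asyncCall.apply(input);
        } catch (RuntimeException e) {
            permits.release(); // the call never started, so free the slot
            throw e;
        }
        // Free the slot when the request finishes, whether it succeeded or failed.
        return stage.whenComplete((result, error) -> permits.release());
    }

    /** Blocks until every request submitted so far has completed. */
    void awaitIdle() throws InterruptedException {
        permits.acquire(maxInFlight);
        permits.release(maxInFlight);
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for session.executeAsync(...): completes on another thread.
        BoundedAsync<Integer, Integer> writer =
                new BoundedAsync<>(8, row -> CompletableFuture.supplyAsync(() -> row * 2));
        for (int row = 0; row < 1000; row++) {
            writer.submit(row);
        }
        writer.awaitIdle();
        System.out.println("all writes finished");
    }
}
```

Because submit() blocks when the limit is reached, the loop that reads rows from the select slows down to match what the cluster can absorb, rather than queueing an unbounded number of inserts.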

I'm also a bit concerned about your "significantly" slower inserts. Inserts (excluding "INSERT IF NOT EXISTS") should be very fast in Cassandra. How slow are they? Are they always that slow, or usually fast with some much slower than others? What do the CPU usage and disk IO look like on the Cassandra server? Do you have the commitlog on the same disk as the data? Is it a spinning disk, SATA SSD or NVMe?


BTW, you really shouldn't use SimpleStrategy for production environments.


On 12/03/2021 15:18, Joe Obernberger wrote:

The queries that are failing are:
select fieldvalue, count from doc.ordered_fieldcounts where source=? and fieldname=? limit 10
Created with:
CREATE TABLE doc.ordered_fieldcounts (
    source text,
    fieldname text,
    count bigint,
    fieldvalue text,
    PRIMARY KEY ((source, fieldname), count, fieldvalue)
) WITH CLUSTERING ORDER BY (count DESC, fieldvalue ASC)

and:
select fieldvalue, count from doc.fieldcounts where source=? and fieldname=?
Created with:
CREATE TABLE doc.fieldcounts (
    source text,
    fieldname text,
    fieldvalue text,
    count bigint,
    PRIMARY KEY (source, fieldname, fieldvalue)
)
This really seems like a driver issue. I put retry logic around the calls and now those queries work. Basically if it throws an exception, I Thread.sleep(500) and then retry. This seems to be a continuing theme with Cassandra in general. Is this common practice?
After doing this retry logic, an insert statement started failing with an illegal state exception when I retried it (which makes sense). This insert was using session.executeAsync(boundStatement). I changed that to just execute (instead of async) and now I get no errors, no retries anywhere. The insert is *significantly* slower when running execute vs executeAsync. When using executeAsync:
com.datastax.oss.driver.api.core.NoNodeAvailableException: No node was available to execute the query
        at com.datastax.oss.driver.api.core.NoNodeAvailableException.copy(NoNodeAvailableException.java:40)
        at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
        at com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.maybeMoveToNextPage(MultiPageResultSet.java:99)
        at com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.computeNext(MultiPageResultSet.java:91)
        at com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.computeNext(MultiPageResultSet.java:79)
        at com.datastax.oss.driver.internal.core.util.CountingIterator.tryToComputeNext(CountingIterator.java:91)
        at com.datastax.oss.driver.internal.core.util.CountingIterator.hasNext(CountingIterator.java:86)
        at com.ngc.helios.fieldanalyzer.FTAProcess.handleOrderedFieldCounts(FTAProcess.java:684)
        at com.ngc.helios.fieldanalyzer.FTAProcess.storeResults(FTAProcess.java:214)
        at com.ngc.helios.fieldanalyzer.FTAProcess.startProcess(FTAProcess.java:190)
        at com.ngc.helios.fieldanalyzer.Main.main(Main.java:20)
The interesting part here is that the line that is now failing (line 684 in FTAProcess) is:
if (itRs.hasNext())
where itRs is an Iterator<Row> over a select query from another table. I'm iterating over a result set from a select and inserting those results via executeAsync.
-Joe

On 3/12/2021 9:07 AM, Bowen Song wrote:
Millions of rows in a single query? That sounds like a bad idea to me. Your "NoNodeAvailableException" could be caused by stop-the-world GC pauses, and the GC pauses are likely caused by the query itself.
On 12/03/2021 13:39, Joe Obernberger wrote:
Thank you Paul and Erick.  The keyspace is defined like this:
CREATE KEYSPACE doc WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true;
Would that cause this?
The program that is having the problem selects data, calculates stuff, and inserts. It works with smaller selects, but when the number of rows is in the millions, I start to get this error. Since it works with smaller sets, I don't believe it to be a network error. All the nodes are definitely up as other processes are working OK, it's just this one program that fails.
The full stack trace:
Error: com.datastax.oss.driver.api.core.NoNodeAvailableException: No node was available to execute the query
com.datastax.oss.driver.api.core.NoNodeAvailableException: No node was available to execute the query
        at com.datastax.oss.driver.api.core.NoNodeAvailableException.copy(NoNodeAvailableException.java:40)
        at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
        at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
        at com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:54)
        at com.abc.xxxx.fieldanalyzer.FTAProcess.udpateCassandraFTAMetrics(FTAProcess.java:275)
        at com.abc.xxxx.fieldanalyzer.FTAProcess.storeResults(FTAProcess.java:216)
        at com.abc.xxxx.fieldanalyzer.FTAProcess.startProcess(FTAProcess.java:199)
        at com.abc.xxxx.fieldanalyzer.Main.main(Main.java:20)

FTAProcess line 275 is:
ResultSet rs = session.execute(getFieldCounts.bind().setString(0, rb.getSource()).setString(1, rb.getFieldName()));
-Joe

On 3/12/2021 8:30 AM, Paul Chandler wrote:
Hi Joe
This could also be caused by the replication factor of the keyspace. If you have NetworkTopologyStrategy and it doesn’t list a replication factor for the datacenter datacenter1, then you will get this error message too.
Paul
On 12 Mar 2021, at 13:07, Erick Ramirez <erick.rami...@datastax.com> wrote:
Does it get returned by the driver every single time? The NoNodeAvailableException gets thrown when (1) all nodes are down, or (2) all the contact points are invalid from the driver's perspective.
Is it possible there's no route/connectivity from your app server(s) to the 172.16.x.x network? If you post the full error message + full stacktrace, it might provide clues. Cheers!
