Re: No node was available to execute query error

Joe Obernberger Mon, 15 Mar 2021 07:37:55 -0700

Thank you Bowen - I'm redesigning the tables now. When you give Cassandra two parts to the primary key, like


create table xyz (uuid text, source text, primary key (source, uuid));
How is the second part of the primary key used to determine partition size?


-Joe
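For the question above: in CQL the first element of the primary key is the partition key and the remaining elements are clustering columns. In `primary key (source, uuid)` every row with the same `source` therefore lands in one partition, and `uuid` only orders rows inside it, so partition size grows with the number of rows per `source`. Wrapping both in parentheses, `primary key ((source, uuid))`, makes the pair a composite partition key. This is also the likely reason doc.fieldcounts, declared with `PRIMARY KEY (source, fieldname, fieldvalue)`, reports only 3 partitions: presumably one per distinct source. The plain-Java model below groups sample rows both ways. It does not call Cassandra, and the `PartitionModel` and `Row` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Plain-Java model of how CQL rows are grouped into partitions (not Cassandra code).
class PartitionModel {

    // Hypothetical row shape for "create table xyz (uuid text, source text, ...)".
    static final class Row {
        final String source;
        final String uuid;

        Row(String source, String uuid) {
            this.source = source;
            this.uuid = uuid;
        }
    }

    // Each map entry is one partition: all rows that share the same partition key value.
    static Map<String, List<Row>> partition(List<Row> rows, Function<Row, String> partitionKey) {
        Map<String, List<Row>> partitions = new TreeMap<>();
        for (Row r : rows) {
            partitions.computeIfAbsent(partitionKey.apply(r), k -> new ArrayList<>()).add(r);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("feedA", "u1"), new Row("feedA", "u2"),
                new Row("feedA", "u3"), new Row("feedB", "u4"));

        // primary key (source, uuid): source alone is the partition key.
        Map<String, List<Row>> bySource = partition(rows, r -> r.source);
        // primary key ((source, uuid)): both columns together form the partition key.
        // (Cassandra hashes the serialized components; string concatenation is only a stand-in.)
        Map<String, List<Row>> byBoth = partition(rows, r -> r.source + ":" + r.uuid);

        System.out.println(bySource.size() + " partitions, largest has "
                + bySource.get("feedA").size() + " rows"); // 2 partitions, largest has 3 rows
        System.out.println(byBoth.size() + " partitions");   // 4 partitions
    }
}
```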

On 3/12/2021 5:27 PM, Bowen Song wrote:

The partition size min/avg/max of 8409008/15096925/25109160 bytes looks fine for the table fieldcounts, but the number of partitions is a bit worrying. Only 3 partitions? Are you expecting the partition size (instead of the number of partitions) to grow in the future? That can lead to a lot of headaches.
Forget about the fieldcounts table for now; the doc table looks really bad. It has a min/avg/max partition size of 24602/7052951452/63771372175 bytes, the partition sizes are severely unevenly distributed, and the partition of over 60GB is far too big.
You really need to redesign your table schemas and avoid creating large or uneven partitions.
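One common way to act on this advice is to add a synthetic bucket column to the partition key, e.g. `PRIMARY KEY ((source, bucket), uuid)`, so that a single busy `source` is spread over several bounded partitions. The trade-off is that a read for one `source` must query every bucket and merge the results. The sketch below only models the row distribution in plain Java; the `bucket` column, `BUCKETS = 16` and the use of `String.hashCode()` are hypothetical choices, not anything proposed in this thread.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of partition-key bucketing (hypothetical scheme, not Cassandra code).
class BucketModel {
    // Hypothetical bucket count; chosen so each (source, bucket) partition stays bounded.
    static final int BUCKETS = 16;

    // Deterministic bucket: the same uuid always maps to the same bucket.
    // String.hashCode() is used for brevity; a better-mixed hash would spread rows more evenly.
    static int bucketFor(String uuid) {
        return Math.floorMod(uuid.hashCode(), BUCKETS);
    }

    // Counts rows per (source, bucket) partition for rows that all share one source.
    static Map<Integer, Integer> rowsPerBucket(String source, int rowCount) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int i = 0; i < rowCount; i++) {
            String uuid = source + "-" + i; // stand-in for a real UUID
            counts.merge(bucketFor(uuid), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counts = rowsPerBucket("feedA", 100_000);
        int max = counts.values().stream().mapToInt(Integer::intValue).max().orElse(0);
        // Without bucketing all 100,000 rows would sit in one partition;
        // with it they are split across up to BUCKETS partitions.
        System.out.println(counts.size() + " partitions, largest holds " + max + " rows");
    }
}
```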
On 12/03/2021 18:52, Joe Obernberger wrote:
Thank you very much for helping me out on this! The table fieldcounts is currently pretty small: 6.4 million rows.
cfstats are:

Total number of tables: 81
----------------
Keyspace : doc
        Read Count: 3713134
        Read Latency: 0.2664131157130338 ms
        Write Count: 47513045
        Write Latency: 1.0725477948634947 ms
        Pending Flushes: 0
                Table: fieldcounts
                SSTable count: 3
                Space used (live): 16010248
                Space used (total): 16010248
                Space used by snapshots (total): 0
                Off heap memory used (total): 4947
                SSTable Compression Ratio: 0.3994304032360534
                Number of partitions (estimate): 3
                Memtable cell count: 0
                Memtable data size: 0
                Memtable off heap memory used: 0
                Memtable switch count: 0
                Local read count: 379
                Local read latency: NaN ms
                Local write count: 0
                Local write latency: NaN ms
                Pending flushes: 0
                Percent repaired: 100.0
                Bloom filter false positives: 0
                Bloom filter false ratio: 0.00000
                Bloom filter space used: 48
                Bloom filter off heap memory used: 24
                Index summary off heap memory used: 51
                Compression metadata off heap memory used: 4872
                Compacted partition minimum bytes: 8409008
                Compacted partition maximum bytes: 25109160
                Compacted partition mean bytes: 15096925
                Average live cells per slice (last five minutes): NaN
                Maximum live cells per slice (last five minutes): 0
                Average tombstones per slice (last five minutes): NaN
                Maximum tombstones per slice (last five minutes): 0
                Dropped Mutations: 0
Commitlog is on a separate spindle on the 7-node cluster. All disks are SATA (spinning rust, as they say!). This is an R&D platform, but I will switch to NetworkTopologyStrategy. I'm using Prometheus and Grafana to monitor Cassandra, and the CPU load is typically 100 to 200% on most of the nodes. Disk IO is typically pretty low.
Performance - in general Async is about 10x faster.
ExecuteAsync:
35mSec for 364 rows.
8120mSec for 205001 rows.
14788mSec for 345001 rows.
4117mSec for 86400 rows.

23,330 rows per second on average

Execute:
232mSec for 364 rows.
584869mSec for 1263283 rows
46290mSec for 86400 rows

2,160 rows per second on average
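Recomputing the throughput from the timings above is a quick sanity check. The plain-Java sketch below does that arithmetic (the `Throughput` class name is arbitrary). Taken over total rows and total elapsed time, the figures come to about 23,532 rows/s for executeAsync and 2,138 rows/s for execute, roughly 11x. The quoted 23,330 and 2,160 rows/s match the largest run in each set (345,001 and 1,263,283 rows) rather than a mean across runs.

```java
// Recomputes rows/second from the quoted timings (rows, milliseconds).
class Throughput {
    static long rowsPerSecond(long rows, long millis) {
        return Math.round(rows * 1000.0 / millis);
    }

    // Overall rate: total rows divided by total elapsed time.
    static long aggregate(long[] rows, long[] millis) {
        long totalRows = 0;
        long totalMillis = 0;
        for (int i = 0; i < rows.length; i++) {
            totalRows += rows[i];
            totalMillis += millis[i];
        }
        return rowsPerSecond(totalRows, totalMillis);
    }

    public static void main(String[] args) {
        long[] asyncRows = {364, 205_001, 345_001, 86_400};
        long[] asyncMs = {35, 8_120, 14_788, 4_117};
        long[] syncRows = {364, 1_263_283, 86_400};
        long[] syncMs = {232, 584_869, 46_290};

        for (int i = 0; i < asyncRows.length; i++) {
            System.out.println("executeAsync run: " + rowsPerSecond(asyncRows[i], asyncMs[i]) + " rows/s");
        }
        for (int i = 0; i < syncRows.length; i++) {
            System.out.println("execute run:      " + rowsPerSecond(syncRows[i], syncMs[i]) + " rows/s");
        }
        System.out.println("executeAsync overall: " + aggregate(asyncRows, asyncMs) + " rows/s"); // 23532
        System.out.println("execute overall:      " + aggregate(syncRows, syncMs) + " rows/s");   // 2138
    }
}
```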
Curious: our largest table (doc) has the following stats. Is it not partitioned well?
Total number of tables: 81
----------------
Keyspace : doc
        Read Count: 3713134
        Read Latency: 0.2664131157130338 ms
        Write Count: 47513045
        Write Latency: 1.0725477948634947 ms
        Pending Flushes: 0
                Table: doc
                SSTable count: 26
                Space used (live): 57124641753
                Space used (total): 57124641753
                Space used by snapshots (total): 113012646218
                Off heap memory used (total): 27331913
                SSTable Compression Ratio: 0.2531585373184219
                Number of partitions (estimate): 12
                Memtable cell count: 0
                Memtable data size: 0
                Memtable off heap memory used: 0
                Memtable switch count: 0
                Local read count: 27169
                Local read latency: NaN ms
                Local write count: 0
                Local write latency: NaN ms
                Pending flushes: 0
                Percent repaired: 0.0
                Bloom filter false positives: 0
                Bloom filter false ratio: 0.00000
                Bloom filter space used: 576
                Bloom filter off heap memory used: 368
                Index summary off heap memory used: 425
                Compression metadata off heap memory used: 27331120
                Compacted partition minimum bytes: 24602
                Compacted partition maximum bytes: 63771372175
                Compacted partition mean bytes: 7052951452
                Average live cells per slice (last five minutes): NaN
                Maximum live cells per slice (last five minutes): 0
                Average tombstones per slice (last five minutes): NaN
                Maximum tombstones per slice (last five minutes): 0
                Dropped Mutations: 0
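The two sets of "Compacted partition" figures can be compared directly. They come from a partition-size histogram, so treat them as approximate. The plain-Java sketch below (class name arbitrary) computes the max-to-mean ratio: about 1.66 for fieldcounts but about 9.04 for doc, whose largest partition is roughly 59.4 GiB (63.8 GB). For doc, that means the data is concentrated in very few partitions.

```java
// Skew check on the "Compacted partition" figures reported by nodetool cfstats (bytes).
class PartitionSkew {
    static double maxToMean(long maxBytes, long meanBytes) {
        return (double) maxBytes / meanBytes;
    }

    static double gib(long bytes) {
        return bytes / (double) (1L << 30);
    }

    public static void main(String[] args) {
        // doc.doc: min 24602, mean 7052951452, max 63771372175
        System.out.printf("doc:         max/mean = %.2f, max = %.1f GiB%n",
                maxToMean(63_771_372_175L, 7_052_951_452L), gib(63_771_372_175L));
        // doc.fieldcounts: min 8409008, mean 15096925, max 25109160
        System.out.printf("fieldcounts: max/mean = %.2f, max = %.3f GiB%n",
                maxToMean(25_109_160L, 15_096_925L), gib(25_109_160L));
    }
}
```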

Thanks again!

-Joe

On 3/12/2021 11:01 AM, Bowen Song wrote:
That sleep-then-retry works is just another indicator that it's likely a GC-pause-related issue. I'd recommend checking your Cassandra servers' GC logs first.
Do you know the maximum partition size for the doc.fieldcounts table? (Try the "nodetool cfstats doc.fieldcounts" command.) I suspect this table has large partitions, which usually lead to GC issues.
As for your failed executeAsync() insert issue, do you know how many concurrent in-flight queries you have? The Cassandra driver has limits on this, and new executeAsync() calls will fail when the limit is reached.
I'm also a bit concerned about your "significantly" slower inserts. Inserts (excluding "INSERT IF NOT EXISTS") should be very fast in Cassandra. How slow are they? Are they always that slow, or usually fast with some much slower than others? What do the CPU usage and disk IO look like on the Cassandra servers? Do you have the commitlog on the same disk as the data? Is it a spinning disk, SATA SSD or NVMe?
BTW, you really shouldn't use SimpleStrategy in production environments.
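On the in-flight limit: a common client-side pattern is to cap the number of outstanding executeAsync() calls with a semaphore, taking a permit before each call and releasing it when the returned future completes, successfully or not. The stdlib-only sketch below models that. `fakeExecuteAsync` is a hypothetical stand-in for `session.executeAsync(boundStatement)` (which in the 4.x driver returns a `CompletionStage<AsyncResultSet>`), and `MAX_IN_FLIGHT = 256` is an arbitrary assumption, not a driver default.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Caps the number of outstanding async requests with a Semaphore (stdlib-only model).
class BoundedAsync {
    static final int MAX_IN_FLIGHT = 256; // hypothetical cap, not a driver default

    // Hypothetical stand-in for session.executeAsync(boundStatement).
    static CompletionStage<Void> fakeExecuteAsync(int row, ExecutorService pool) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(1); // pretend network round trip
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, pool);
    }

    /** Issues {@code rows} async writes; returns {completed, peakInFlight}. */
    static int[] insertAll(int rows) {
        Semaphore permits = new Semaphore(MAX_IN_FLIGHT);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(32);
        try {
            for (int i = 0; i < rows; i++) {
                permits.acquireUninterruptibly(); // blocks while MAX_IN_FLIGHT requests are outstanding
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                fakeExecuteAsync(i, pool).whenComplete((ok, err) -> {
                    if (err == null) {
                        completed.incrementAndGet(); // a real app would log or retry failures here
                    }
                    inFlight.decrementAndGet();
                    permits.release(); // always release, on success and on failure
                });
            }
            permits.acquireUninterruptibly(MAX_IN_FLIGHT); // wait for the last requests to finish
        } finally {
            pool.shutdown();
        }
        return new int[] {completed.get(), peak.get()};
    }

    public static void main(String[] args) {
        int[] result = insertAll(10_000);
        System.out.println("completed=" + result[0] + ", peak in flight=" + result[1]);
    }
}
```

Blocking on `acquire` gives natural backpressure: a loop that reads rows and issues inserts cannot outrun the inserts, so the number of pending requests stays bounded instead of growing with the size of the result set.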
On 12/03/2021 15:18, Joe Obernberger wrote:
The queries that are failing are:
select fieldvalue, count from doc.ordered_fieldcounts where source=? and fieldname=? limit 10
Created with:
CREATE TABLE doc.ordered_fieldcounts (
    source text,
    fieldname text,
    count bigint,
    fieldvalue text,
    PRIMARY KEY ((source, fieldname), count, fieldvalue)
) WITH CLUSTERING ORDER BY (count DESC, fieldvalue ASC)

and:
select fieldvalue, count from doc.fieldcounts where source=? and fieldname=?
Created with:
CREATE TABLE doc.fieldcounts (
    source text,
    fieldname text,
    fieldvalue text,
    count bigint,
    PRIMARY KEY (source, fieldname, fieldvalue)
)
This really seems like a driver issue. I put retry logic around the calls and now those queries work. Basically, if it throws an exception, I Thread.sleep(500) and then retry. This seems to be a continuing theme with Cassandra in general. Is this common practice?
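If retries are kept as a stopgap while the GC and partition issues are addressed, they are usually bounded and applied only to idempotent statements, since retrying a non-idempotent statement (for example a counter update) can apply it twice. Below is a generic sketch with a capped attempt count and exponential backoff in place of a fixed Thread.sleep(500). `withRetry`, the attempt limit and the delays are hypothetical; the Java driver also has its own configurable retry policy.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical bounded-retry helper; not part of the DataStax driver.
class Retry {
    // Runs task up to maxAttempts times, doubling the pause after each failure.
    static <T> T withRetry(Callable<T> task, int maxAttempts, long initialDelayMs) throws Exception {
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last error to the caller
                }
                Thread.sleep(delayMs);
                delayMs *= 2; // exponential backoff instead of a fixed 500 ms
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated flaky query: fails twice, then succeeds.
        String result = withRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("No node was available to execute the query");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls.get() + " attempts"); // ok after 3 attempts
    }
}
```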
After adding this retry logic, an insert statement started failing with an illegal state exception when I retried it (which makes sense). This insert was using session.executeAsync(boundStatement). I changed that to just execute (instead of async), and now I get no errors and no retries anywhere. The insert is *significantly* slower when running execute vs executeAsync. When using executeAsync:
com.datastax.oss.driver.api.core.NoNodeAvailableException: No node was available to execute the query
        at com.datastax.oss.driver.api.core.NoNodeAvailableException.copy(NoNodeAvailableException.java:40)
        at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
        at com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.maybeMoveToNextPage(MultiPageResultSet.java:99)
        at com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.computeNext(MultiPageResultSet.java:91)
        at com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.computeNext(MultiPageResultSet.java:79)
        at com.datastax.oss.driver.internal.core.util.CountingIterator.tryToComputeNext(CountingIterator.java:91)
        at com.datastax.oss.driver.internal.core.util.CountingIterator.hasNext(CountingIterator.java:86)
        at com.ngc.helios.fieldanalyzer.FTAProcess.handleOrderedFieldCounts(FTAProcess.java:684)
        at com.ngc.helios.fieldanalyzer.FTAProcess.storeResults(FTAProcess.java:214)
        at com.ngc.helios.fieldanalyzer.FTAProcess.startProcess(FTAProcess.java:190)
        at com.ngc.helios.fieldanalyzer.Main.main(Main.java:20)
The interesting part here is that the line that is now failing (line 684 in FTAProcess) is:
if (itRs.hasNext())
where itRs is an Iterator<Row> over a select query from another table. I'm iterating over a result set from a select and inserting those results via executeAsync.
-Joe
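The stack trace above fits this: `hasNext()` goes through `MultiPageResultSet$RowIterator.maybeMoveToNextPage` and `CompletableFutures.getUninterruptibly`. In other words, iterating a synchronous `ResultSet` blocks on fetching the next page whenever the current one is used up, and that fetch is a request that can fail like any other query. The plain-Java model below shows `hasNext()` itself making the round trip. The `PagedIterator` class and its fetch function are hypothetical and are not driver code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Plain-Java model of a paged result set: rows are fetched lazily, one page at a time.
class PagedIterator<T> implements Iterator<T> {
    private final IntFunction<List<T>> fetchPage; // hypothetical "network" call: page number -> rows
    private List<T> page = new ArrayList<>();
    private int nextPage = 0;
    private int pos = 0;
    private boolean exhausted = false;
    int fetches = 0; // how many round trips were made

    PagedIterator(IntFunction<List<T>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    @Override
    public boolean hasNext() {
        // Current page used up: block on fetching the next one. Any failure in
        // fetchPage (a timeout, "no node available", ...) is thrown from hasNext().
        while (pos >= page.size() && !exhausted) {
            page = fetchPage.apply(nextPage++);
            fetches++;
            pos = 0;
            exhausted = page.isEmpty(); // the real driver uses a paging state instead of an empty page
        }
        return pos < page.size();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return page.get(pos++);
    }

    public static void main(String[] args) {
        // 10 rows served in pages of 3.
        PagedIterator<Integer> it = new PagedIterator<>(n -> {
            List<Integer> rows = new ArrayList<>();
            for (int r = n * 3; r < Math.min(n * 3 + 3, 10); r++) {
                rows.add(r);
            }
            return rows;
        });
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        System.out.println(count + " rows in " + it.fetches + " fetches"); // 10 rows in 5 fetches
    }
}
```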

On 3/12/2021 9:07 AM, Bowen Song wrote:
Millions of rows in a single query? That sounds like a bad idea to me. Your "NoNodeAvailableException" could be caused by stop-the-world GC pauses, and the GC pauses are likely caused by the query itself.
On 12/03/2021 13:39, Joe Obernberger wrote:
Thank you Paul and Erick.  The keyspace is defined like this:
CREATE KEYSPACE doc WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true;
Would that cause this?
The program that is having the problem selects data, calculates stuff, and inserts. It works with smaller selects, but when the number of rows is in the millions, I start to get this error. Since it works with smaller sets, I don't believe it to be a network error. All the nodes are definitely up, as other processes are working OK; it's just this one program that fails.
The full stack trace:
Error: com.datastax.oss.driver.api.core.NoNodeAvailableException: No node was available to execute the query
com.datastax.oss.driver.api.core.NoNodeAvailableException: No node was available to execute the query
        at com.datastax.oss.driver.api.core.NoNodeAvailableException.copy(NoNodeAvailableException.java:40)
        at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
        at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
        at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
        at com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:54)
        at com.abc.xxxx.fieldanalyzer.FTAProcess.udpateCassandraFTAMetrics(FTAProcess.java:275)
        at com.abc.xxxx.fieldanalyzer.FTAProcess.storeResults(FTAProcess.java:216)
        at com.abc.xxxx.fieldanalyzer.FTAProcess.startProcess(FTAProcess.java:199)
        at com.abc.xxxx.fieldanalyzer.Main.main(Main.java:20)

FTAProcess line 275 is:
ResultSet rs = session.execute(getFieldCounts.bind().setString(0,rb.getSource()).setString(1, rb.getFieldName()));
-Joe

On 3/12/2021 8:30 AM, Paul Chandler wrote:
Hi Joe
This could also be caused by the replication factor of the keyspace: if you have NetworkTopologyStrategy and it doesn’t list a replication factor for the datacenter datacenter1, then you will get this error message too.
Paul
On 12 Mar 2021, at 13:07, Erick Ramirez <erick.rami...@datastax.com> wrote:
Does it get returned by the driver every single time? The NoNodeAvailableException gets thrown when (1) all nodes are down, or (2) all the contact points are invalid from the driver's perspective.
Is it possible there's no route/connectivity from your app server(s) to the 172.16.x.x network? If you post the full error message + full stacktrace, it might provide clues. Cheers!