Hey Deepak,
"Are you suggesting to reduce the fetchSize (right now fetchSize is
5000) for this query?"
Definitely yes! If you go with 1000, each page is 5x smaller, which gives
the Cassandra node(s) executing your query a much better chance of
pulling the records of a page together in time, and so helps you avoid
the timeout issue.
Based on our measurements, smaller page sizes add very little to the
overall query time, but they help Cassandra a lot in eventually
fulfilling the full request, since she can also load-balance much better
as you iterate over your result set.
I would give it a try - the same tactic helped a lot on our side.
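To make the "5x" point concrete, here is a small back-of-the-envelope sketch in plain Java (no driver involved). The class name PageMath and the 12,000-row result size are made up for illustration; the only numbers taken from this thread are the fetch sizes 5000 and 1000.

```java
public class PageMath {
    /** Number of pages (driver round trips) needed to read totalRows at a given fetch size. */
    public static long pageCount(long totalRows, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        return (totalRows + fetchSize - 1) / fetchSize; // ceiling division
    }

    public static void main(String[] args) {
        long totalRows = 12_000; // hypothetical result size, not from the thread
        for (int fetchSize : new int[] {5000, 1000}) {
            System.out.printf("fetchSize=%d -> %d pages, at most %d rows per page%n",
                    fetchSize, pageCount(totalRows, fetchSize), fetchSize);
        }
    }
}
```

So you pay with more round trips, but each one asks the replicas for far fewer rows within the same read timeout. Assuming you are on the 3.x DataStax Java driver (your snippet looks like it), the page size is set per statement with Statement.setFetchSize(1000).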
I also recommend trying to optimize your data in parallel with the
above, if that is possible and there is room for improvement.
Everything I wrote earlier still matters a lot. You also need a data
cleanup strategy for your tables to keep the amount of data under
control. A TTL-based approach, for example, is the best one if you ask
me, especially if you have a huge data set.
cheers
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
On 27.10.2020 20:07, Deepak Sharma wrote:
Hi Attila,
We did have larger partitions, which are now below the 100MB threshold
after we ran nodetool repair. Most query runs now succeed, but a small
percentage of them are still failing.
Regarding your comment ```considered with your fetchSize together
(driver setting on the query level)```, can you elaborate more on it?
Are you suggesting to reduce the fetchSize (right now fetchSize is
5000) for this query?
Also, we are trying to use the prefetch feature, but it is not helping
either. Here is the code:
Iterator<Row> iter = resultSet.iterator();
while (iter.hasNext()) {
    if (resultSet.getAvailableWithoutFetching() <= fetchSize
            && !resultSet.isFullyFetched()) {
        resultSet.fetchMoreResults();
    }
    Row row = iter.next();
    .....
}
Thanks,
Deepak
On Sat, Sep 19, 2020 at 6:56 PM Deepak Sharma
<sharma.dee...@salesforce.com> wrote:
Thanks Attila and Aaron for the response. These are great
insights. I will check and get back to you in case I have any
questions.
Best,
Deepak
On Tue, Sep 15, 2020 at 4:33 AM Attila Wind
<attilaw@swf.technology> wrote:
Hi Deepak,
Aaron is right - for us to be able to help (better), you
need to share those details.
I think that 5-second timeout comes from the coordinator node -
see the "read_request_timeout_in_ms" setting in cassandra.yaml,
which controls it.
But that does not matter too much... The point is that none of
the replicas could complete your query within those 5 seconds.
And this is a clear indication that something is slow with your
query.
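For reference, a sketch of how that setting looks in cassandra.yaml. As far as I know, 5000 ms is the stock default in the 3.x/4.0 config files, which would match the 5 seconds you observe; the actual value on your cluster is an assumption on my side until you check the file on your nodes.

```yaml
# cassandra.yaml (server side, on each node)
# How long the coordinator waits for read operations to complete.
read_request_timeout_in_ms: 5000
```

This is a server-side limit, so it applies independently of the (around 30 seconds) timeout you configured on the client.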
Maybe 4) is a bit less important here, or rather I would make
it a bit more precise: considered with your fetchSize together
(driver setting on the query level)
From experience, one reason why a query that used to work
stops working is a growing amount of data.
Another possibility is a "wide cluster" problem.
Do you have monitoring on the Cassandra machines? What does
iowait show? (For us it is a clear indication when things like
this start happening.)
cheers
Attila Wind
On 14.09.2020 18:36, Aaron Ploetz wrote:
Deepak,
Can you reply with:
1) The query you are trying to run.
2) The table definition (PRIMARY KEY, specifically).
3) Maybe a little description of what the table is designed
to do.
4) How much data you're expecting returned (both # of rows
and data size).
Thanks,
Aaron
On Mon, Sep 14, 2020 at 10:58 AM Deepak Sharma
<sharma.dee...@salesforce.com.invalid> wrote:
Hi There,
We are running into a strange issue in our Cassandra
cluster where one specific query is failing with the
following error:
Cassandra timeout during read query at consistency QUORUM
(3 responses were required but only 0 replica responded)
We know for sure that this is not a typical read query
timeout: the error is thrown within 5 seconds, while the
query timeout we have set is around 30 seconds.
Can you tell us what is happening here and how we can
reproduce this in our local environment?
Thanks,
Deepak