Hey Deepak,

"Are you suggesting to reduce the fetchSize (right now fetchSize is 5000) for this query?"

Definitely, yes! If you go with 1000 instead, each page request asks the Cassandra node(s) executing your query for 5x fewer rows. That gives them a much better chance to assemble each page (the records) in time, and so helps you avoid the timeout issue. Based on our measurements, smaller page sizes do not add much to the overall query time, but they help Cassandra a lot to eventually fulfill the full request, since she can also balance the load much better while you are iterating over your result set.
I would give it a try; the same tactic helped a lot on our side.
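
To make the trade-off concrete, here is a simplified model in Java of what fetchSize controls: the maximum number of rows a single page request asks the coordinator and replicas to assemble. It is not driver code; `PageSizeSketch`, `pageRequests` and the 12,000-row result are hypothetical. In the DataStax Java driver 3.x, which the code further down this thread appears to use, the per-query setting is `Statement.setFetchSize(int)`, and the cluster-wide default (5000) comes from `QueryOptions.setFetchSize(int)`.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of paging, not the DataStax driver: for a query that
// returns totalRows rows, each page request asks the replicas for at most
// fetchSize rows. Returns the sizes of the successive page requests.
class PageSizeSketch {

    static List<Integer> pageRequests(int totalRows, int fetchSize) {
        if (totalRows < 0 || fetchSize <= 0) {
            throw new IllegalArgumentException("totalRows >= 0 and fetchSize > 0 required");
        }
        List<Integer> pages = new ArrayList<>();
        int remaining = totalRows;
        while (remaining > 0) {
            // Rows the coordinator must gather from the replicas for this page
            int page = Math.min(fetchSize, remaining);
            pages.add(page);
            remaining -= page;
        }
        return pages;
    }

    public static void main(String[] args) {
        int totalRows = 12_000; // hypothetical result size
        System.out.println("fetchSize 5000: " + pageRequests(totalRows, 5000)); // [5000, 5000, 2000]
        System.out.println("fetchSize 1000: " + pageRequests(totalRows, 1000)); // 12 pages of 1000
    }
}
```

With 1000 you get more round trips, but each one asks the replicas for only a fifth of the rows, which is the effect described above.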

In parallel with the above, I also recommend optimizing your data, if possible and if there is room for improvement. Everything I wrote earlier still counts a lot. You also need to take care of data cleanup strategies in your tables to keep the amount of data under control. A TTL-based approach, for example, is the best if you ask me, especially if you have a huge data set.
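
As a sketch of the TTL approach, assuming a table where every row may expire after a fixed retention period: CQL supports a table-level `default_time_to_live` (in seconds) as well as a per-write `USING TTL`. The helper below (`TtlSketch`, `defaultTtlCql` and the `my_ks.events` table are hypothetical) only builds the `ALTER TABLE` statement; it does not talk to Cassandra.

```java
import java.time.Duration;

// Hypothetical helper: turns a retention period into the CQL that sets a
// table-level default TTL. Keyspace and table names are placeholders.
class TtlSketch {

    // Cassandra rejects TTLs above 20 years (630,720,000 seconds).
    static final long MAX_TTL_SECONDS = 630_720_000L;

    static String defaultTtlCql(String keyspace, String table, Duration retention) {
        long seconds = retention.getSeconds();
        if (seconds <= 0 || seconds > MAX_TTL_SECONDS) {
            throw new IllegalArgumentException("TTL must be between 1 and " + MAX_TTL_SECONDS + " seconds");
        }
        // Rows written after this change expire 'seconds' after they are written,
        // unless a write sets its own TTL with USING TTL.
        return "ALTER TABLE " + keyspace + "." + table
                + " WITH default_time_to_live = " + seconds + ";";
    }

    public static void main(String[] args) {
        System.out.println(defaultTtlCql("my_ks", "events", Duration.ofDays(30)));
        // ALTER TABLE my_ks.events WITH default_time_to_live = 2592000;
    }
}
```

Keep in mind that the default only applies to data written after the change, and that expired data turns into tombstones, which compaction purges only after `gc_grace_seconds`.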

cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 27.10.2020 20:07, Deepak Sharma wrote:
Hi Attila,

We did have large partitions, which are now below the 100 MB threshold after we ran nodetool repair. Most query runs now succeed, but a small percentage of them still fail.

Regarding your comment "considered with your fetchSize together (driver setting on the query level)", can you elaborate on it? Are you suggesting to reduce the fetchSize (right now fetchSize is 5000) for this query?

We are also trying to use the prefetch feature, but it is not helping either. Here is the code:

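import java.util.Iterator;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;

// Iterate over all rows; whenever no more than fetchSize rows are still
// buffered locally, ask the driver to fetch the next page in the background.
Iterator<Row> iter = resultSet.iterator();
while (iter.hasNext()) {
  if (resultSet.getAvailableWithoutFetching() <= fetchSize && !resultSet.isFullyFetched()) {
    resultSet.fetchMoreResults(); // asynchronous: returns a future for the next page
  }
  Row row = iter.next();
  // .....
}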
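
To see why prefetching alone does not help here, below is a simplified, synchronous Java simulation of the loop above. `PrefetchSketch` and `extraPageRequests` are hypothetical; this is not the driver, where `fetchMoreResults()` is asynchronous and a requested page arrives later. The simulation records the row index at which each page after the first is requested. Because a page never holds more than fetchSize rows, a threshold equal to fetchSize is already met at the first row, so the next page is requested immediately; a small threshold such as 100 requests it near the end of the current page. In every case the number of page requests, and the rows each one asks the replicas for, stays the same: prefetching overlaps client-side processing with fetching, but it does not make any single page cheaper for Cassandra to serve.

```java
import java.util.ArrayList;
import java.util.List;

// Synchronous simulation of a prefetch loop. In the real driver fetches are
// asynchronous; here a requested page is assumed to arrive instantly.
class PrefetchSketch {

    /**
     * Returns the row index at which each page after the first is requested.
     * A negative threshold disables prefetching, so a page is fetched only
     * when the local buffer runs dry (as plain iteration would do).
     */
    static List<Integer> extraPageRequests(int totalRows, int fetchSize, int threshold) {
        List<Integer> requestedAt = new ArrayList<>();
        int fetched = Math.min(fetchSize, totalRows); // first page comes back with execute()
        int available = fetched;                      // rows buffered on the client
        for (int row = 0; row < totalRows; row++) {
            boolean fullyFetched = fetched == totalRows;
            boolean prefetch = available <= threshold && !fullyFetched;
            boolean bufferEmpty = available == 0;     // must fetch before the next row
            if (prefetch || bufferEmpty) {
                int page = Math.min(fetchSize, totalRows - fetched);
                fetched += page;
                available += page;
                requestedAt.add(row);
            }
            available--; // consume one row
        }
        return requestedAt;
    }

    public static void main(String[] args) {
        int totalRows = 12_000; // hypothetical result size
        System.out.println(extraPageRequests(totalRows, 5000, 5000)); // [0, 5000]
        System.out.println(extraPageRequests(totalRows, 5000, 100));  // [4900, 9900]
        System.out.println(extraPageRequests(totalRows, 5000, -1));   // [5000, 10000]
    }
}
```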

Thanks,
Deepak

On Sat, Sep 19, 2020 at 6:56 PM Deepak Sharma <sharma.dee...@salesforce.com> wrote:

    Thanks, Attila and Aaron, for the responses. These are great
    insights. I will check and get back to you if I have any
    questions.

    Best,
    Deepak

    On Tue, Sep 15, 2020 at 4:33 AM Attila Wind
    <attilaw@swf.technology> wrote:

        Hi Deepak,

        Aaron is right: to be able to help (better), we need you to
        share those details.

        I think that 5-second timeout comes from the coordinator
        node: see the "read_request_timeout_in_ms" setting in
        cassandra.yaml (default 5000 ms), which controls it.
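
        For reference, the "3 responses were required" part of the
        error follows from how QUORUM is sized: floor(sum of the
        replication factors of all data centers / 2) + 1. The small
        Java sketch below (`QuorumSketch` is a hypothetical name)
        shows that 3 required responses means a total replication
        factor of 4 or 5, while "only 0 replica responded" means none
        of those replicas answered before the coordinator's
        server-side timeout, which is separate from the client-side
        timeout set in the application.

        ```java
        // QUORUM = floor(sum of replication factors / 2) + 1.
        // With a single data center the sum is just that DC's replication factor.
        class QuorumSketch {

            static int quorum(int... replicationFactors) {
                int sum = 0;
                for (int rf : replicationFactors) {
                    sum += rf;
                }
                return sum / 2 + 1; // integer division floors non-negative sums
            }

            public static void main(String[] args) {
                for (int rf = 1; rf <= 6; rf++) {
                    System.out.println("RF " + rf + " -> QUORUM needs " + quorum(rf) + " responses");
                }
                System.out.println("Two DCs with RF 3 each -> " + quorum(3, 3)); // 4
            }
        }
        ```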

        But that does not matter too much. The point is that none of
        the replicas could complete your query within those 5
        seconds, and that is a clear indication that something is
        slow with your query.
        Maybe point 4) is a bit less important here, or rather I
        would make it a bit more precise: considered with your
        fetchSize together (driver setting on the query level)
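
        One rough way to weigh point 4) together with fetchSize: the
        data a single page asks the replicas to read and ship is
        about fetchSize x average row size. A minimal Java sketch,
        assuming a hypothetical 2 KiB average row (measure your own)
        and a hypothetical per-page byte budget; `PagePayloadSketch`
        and its methods are illustrative names only:

        ```java
        // Back-of-the-envelope estimates; the average row size and the byte budget
        // are inputs you would have to measure or choose, nothing here comes from the driver.
        class PagePayloadSketch {

            // Approximate bytes one page asks the replicas for.
            static long bytesPerPage(int fetchSize, long avgRowBytes) {
                return (long) fetchSize * avgRowBytes;
            }

            // Largest fetchSize whose estimated page stays within budgetBytes (at least 1).
            static int fetchSizeForBudget(long budgetBytes, long avgRowBytes) {
                if (avgRowBytes <= 0) {
                    throw new IllegalArgumentException("avgRowBytes must be positive");
                }
                long rows = budgetBytes / avgRowBytes;
                return (int) Math.max(1, Math.min(rows, Integer.MAX_VALUE));
            }

            public static void main(String[] args) {
                long avgRowBytes = 2_048; // hypothetical: 2 KiB per row
                System.out.println(bytesPerPage(5000, avgRowBytes));            // 10240000 (about 9.8 MiB)
                System.out.println(bytesPerPage(1000, avgRowBytes));            // 2048000 (about 2 MiB)
                System.out.println(fetchSizeForBudget(2_097_152, avgRowBytes)); // 1024 rows for a 2 MiB budget
            }
        }
        ```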

        In my experience, when a query that used to work stops
        working, one possible reason is a growing amount of data,
        and possibly a "wide partition" problem.
        Do you have monitoring on the Cassandra machines? What does
        iowait show? (For us, iowait is a clear indicator when things
        like this start happening.)
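
        iowait is usually read with tools such as `iostat` or
        `vmstat`; it is the share of CPU time spent idle while
        waiting for I/O. As a sketch of what the number means, here
        is how it can be computed from two samples of the aggregate
        `cpu` line in Linux `/proc/stat` (field order as documented
        in the `proc(5)` man page; `IowaitSketch` and the sample
        values are hypothetical):

        ```java
        // iowait percentage between two samples of the aggregate "cpu" line of
        // Linux /proc/stat. Fields after "cpu": user nice system idle iowait irq softirq steal
        // (guest and guest_nice are already counted in user/nice, so they are skipped).
        class IowaitSketch {

            static long[] parseCpuLine(String line) {
                String[] parts = line.trim().split("\\s+");
                if (parts.length < 9 || !parts[0].equals("cpu")) {
                    throw new IllegalArgumentException("expected the aggregate cpu line: " + line);
                }
                long[] fields = new long[8];
                for (int i = 0; i < 8; i++) {
                    fields[i] = Long.parseLong(parts[i + 1]);
                }
                return fields;
            }

            static double iowaitPercent(String before, String after) {
                long[] a = parseCpuLine(before);
                long[] b = parseCpuLine(after);
                long total = 0;
                for (int i = 0; i < 8; i++) {
                    total += b[i] - a[i];
                }
                long iowait = b[4] - a[4]; // index 4 = iowait
                return total == 0 ? 0.0 : 100.0 * iowait / total;
            }

            public static void main(String[] args) {
                // Two hypothetical samples taken a few seconds apart
                String before = "cpu  100 0 50 800 50 0 0 0 0 0";
                String after  = "cpu  140 0 60 900 100 0 0 0 0 0";
                System.out.println(iowaitPercent(before, after)); // 25.0
            }
        }
        ```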

        cheers

        Attila Wind


        On 14.09.2020 18:36, Aaron Ploetz wrote:
        Deepak,

        Can you reply with:

        1) The query you are trying to run.
        2) The table definition (PRIMARY KEY, specifically).
        3) Maybe a little description of what the table is designed
        to do.
        4) How much data you're expecting returned (both # of rows
        and data size).

        Thanks,

        Aaron


        On Mon, Sep 14, 2020 at 10:58 AM Deepak Sharma
        <sharma.dee...@salesforce.com.invalid> wrote:

            Hi There,

            We are running into a strange issue in our Cassandra
            cluster, where one specific query is failing with the
            following error:

            Cassandra timeout during read query at consistency QUORUM
            (3 responses were required but only 0 replica responded)

            We know for sure that this is not a typical query read
            timeout: this error is returned within 5 seconds, while
            the query timeout we have set is around 30 seconds.

            Can you help us understand what is happening here and how
            we can reproduce this in our local environment?

            Thanks,
            Deepak
