Re: Inconsistent count(*) and distinct results from Cassandra

Rumph, Frens Jan Thu, 12 Mar 2015 00:32:18 -0700

Hi Jens, Mikhail, Daemeon,

Thanks for your replies. Sorry for the late reply; mails from the user list
were being moved to the wrong inbox on my side.


I'm in a development environment and thus using replication factor = 1 and
consistency = ONE with three nodes. So the 'results from different nodes
between queries' hypothesis seems unlikely to me. I would expect a timeout
if some node weren't able to answer.

I tried tracing, but I couldn't really make sense of any of it.

For example, I performed two select distinct ... from ... queries. The traces
for both of them contained more than one line like 'Submitting range
requests on ... ranges ...' and 'Submitted ... concurrent range requests
covering ... ranges'. The numbers in these lines vary between runs, e.g.:

Submitting range requests on 593 ranges with a concurrency of 75 (1.35 rows
per range expected)
Submitting range requests on 769 ranges with a concurrency of 75 (1.35 rows
per range expected)


Also, when looking at the lines like 'Executing seq scan across ... sstables
for ...', I saw that in one case, which yielded far fewer partition keys,
only the tokens from -9223372036854770000 to -594461978511041000 were
included. In a case which yielded many more partition keys, the entire
token range did seem to be queried.
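
To narrow down which part of the ring goes missing, the full-table count
could be split into per-token-range counts and compared across runs. Below
is a minimal sketch of that idea in Python. It only builds the CQL strings
and does not talk to a cluster. It assumes the Murmur3Partitioner (token
range -2^63 to 2^63 - 1); the helper names token_subranges and count_queries
are made up for illustration.

```python
# Assumption: the cluster uses Murmur3Partitioner, whose tokens span
# the signed 64-bit range [-2**63, 2**63 - 1].
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def token_subranges(n):
    """Split the full token range into n contiguous, non-overlapping
    inclusive (low, high) sub-ranges. Hypothetical helper."""
    if n < 1:
        raise ValueError("n must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN + 1  # number of possible tokens
    bounds = [MIN_TOKEN + (total * i) // n for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(n)]


def count_queries(table, partition_key_columns, n):
    """Build one count(*) statement per token sub-range. Hypothetical helper."""
    cols = ", ".join(partition_key_columns)
    return [
        f"SELECT count(*) FROM {table} "
        f"WHERE token({cols}) >= {low} AND token({cols}) <= {high}"
        for low, high in token_subranges(n)
    ]


if __name__ == "__main__":
    # Table and partition key columns taken from the schema in this thread.
    for query in count_queries("tbl", ["id", "bucket"], 8):
        print(query)
```

Running each statement several times and summing the per-range counts
should show whether the variation is confined to specific token ranges or
spread over the whole ring.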

To reiterate my initial questions: is this behavior to be expected? Am I
doing something wrong? Is there a workaround?

Best regards,
Frens Jan

On 4 March 2015 at 22:59, daemeon reiydelle <daeme...@gmail.com> wrote:

> What is the replication? Could you be serving stale data from a node that
> was not properly replicated (hints timeout exceeded by a node being down?)
>
>
>
> On Wed, Mar 4, 2015 at 11:03 AM, Jens Rantil <jens.ran...@tink.se> wrote:
>
>> Frens,
>>
>> What consistency level are you querying with? It could be that you are
>> simply receiving results from different nodes each time.
>>
>> Jens
>>
>> –
>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>
>>
>> On Wed, Mar 4, 2015 at 7:08 PM, Mikhail Strebkov <streb...@gmail.com>
>> wrote:
>>
>>> We have observed the same issue in our production Cassandra cluster (5
>>> nodes in one DC). We use Cassandra 2.1.3 (I joined the list too late to
>>> realize we shouldn’t use 2.1.x yet) on Amazon machines (created from
>>> community AMI).
>>>
>>> In addition to count variations of 5 to 10%, we observe variations in the
>>> results of the query “select * from table1 where time > '$fromDate' and
>>> time < '$toDate' allow filtering”. We iterated through the results
>>> multiple times using the official Java driver. We used that query for a
>>> huge data migration and were unpleasantly surprised that it is unreliable.
>>> In our case “nodetool repair” didn’t fix the issue.
>>>
>>> So I echo Frens's questions.
>>>
>>> Thanks,
>>> Mikhail
>>>
>>>
>>>
>>>
>>> On Wed, Mar 4, 2015 at 3:55 AM, Rumph, Frens Jan <m...@frensjan.nl>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is it to be expected that select count(*) from ... and select distinct
>>>> partition-key-columns from ... yield inconsistent results between
>>>> executions even though the table at hand isn't being written to?
>>>>
>>>> I have a table in a keyspace with replication_factor = 1 which is
>>>> something like:
>>>>
>>>>  CREATE TABLE tbl (
>>>>     id frozen<id_type>,
>>>>     bucket bigint,
>>>>     offset int,
>>>>     value double,
>>>>     PRIMARY KEY ((id, bucket), offset)
>>>> );
>>>>
>>>> The frozen udt is:
>>>>
>>>>  CREATE TYPE id_type (
>>>>     tags map<text, text>
>>>> );
>>>>
>>>> When I do select count(*) from tbl several times, the actual count
>>>> varies by 5 to 10%. Also, when performing select distinct id, bucket from
>>>> tbl, the results aren't consistent over several query executions. The
>>>> table was not being written to at the time I performed the queries.
>>>>
>>>> Is this to be expected? Or is this a bug? Is there an alternative method
>>>> / workaround?
>>>>
>>>> I'm using cqlsh 5.0.1 with Cassandra 2.1.2 on 64-bit Fedora 21 with
>>>> Oracle Java 1.8.0_31.
>>>>
>>>> Thanks in advance,
>>>> Frens Jan
>>>>
>>>
>>>
>>
>
