First idea to rule out any issue with stale data: issue the same count query with consistency level QUORUM (CL=QUORUM, not RF) and check whether there are still inconsistencies.
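
In cqlsh that check could look roughly like the sketch below. It assumes the table is named tbl as in the schema quoted further down, and that cqlsh is already using the right keyspace. CONSISTENCY is a cqlsh command, not CQL, and it only affects the current session.

```
-- raise the consistency level for this cqlsh session
CONSISTENCY QUORUM;

-- run each query several times and compare the results between runs
SELECT count(*) FROM tbl;
SELECT DISTINCT id, bucket FROM tbl;
```

If the counts still differ between runs at QUORUM, stale replicas are probably not the explanation.
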
On Tue, Mar 10, 2015 at 9:13 AM, Rumph, Frens Jan <m...@frensjan.nl> wrote:

> Hi Jens, Mikhail, Daemeon,
>
> Thanks for your replies. Sorry for my reply being late ... mails from the
> user-list were moved to the wrong inbox on my side.
>
> I'm in a development environment and thus using replication factor = 1 and
> consistency = ONE with three nodes. So the 'results from different nodes
> between queries' hypothesis seems unlikely to me. I would expect a timeout
> if some node wouldn't be able to answer.
>
> I tried tracing, but I couldn't really make sense of any of it.
>
> For example I performed two select distinct ... from ... queries: Traces
> for both of them contained more than one line like 'Submitting range
> requests on ... ranges ...' and 'Submitted ... concurrent range requests
> covering ... ranges'. These lines occur with varying numbers, e.g.:
>
> Submitting range requests on 593 ranges with a concurrency of 75 (1.35
> rows per range expected)
> Submitting range requests on 769 ranges with a concurrency of 75 (1.35
> rows per range expected)
>
> Also when looking at the lines like 'Executing seq scan across ...
> sstables for ...' I saw that in one case, which yielded far fewer partition
> keys, only the tokens from -9223372036854770000 to -594461978511041000
> were included. In a case which yielded many more partition keys, the entire
> token range did seem to be queried.
>
> To reiterate my initial questions: is this behavior to be expected? Am I
> doing something wrong? Is there a workaround?
>
> Best regards,
> Frens Jan
>
> On 4 March 2015 at 22:59, daemeon reiydelle <daeme...@gmail.com> wrote:
>
>> What is the replication? Could you be serving stale data from a node that
>> was not properly replicated (hints timeout exceeded by a node being down?)
>>
>> On Wed, Mar 4, 2015 at 11:03 AM, Jens Rantil <jens.ran...@tink.se> wrote:
>>
>>> Frens,
>>>
>>> What consistency are you querying with? Could be you are simply
>>> receiving results from different nodes each time.
>>>
>>> Jens
>>>
>>> --
>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>
>>> On Wed, Mar 4, 2015 at 7:08 PM, Mikhail Strebkov <streb...@gmail.com>
>>> wrote:
>>>
>>>> We have observed the same issue in our production Cassandra cluster (5
>>>> nodes in one DC). We use Cassandra 2.1.3 (I joined the list too late to
>>>> realize we shouldn’t use 2.1.x yet) on Amazon machines (created from
>>>> community AMI).
>>>>
>>>> In addition to count variations of 5 to 10% we observe variations in
>>>> the results of the query “select * from table1 where time > '$fromDate'
>>>> and time < '$toDate' allow filtering”. We iterated through the results
>>>> multiple times using the official Java driver. We used that query for a
>>>> huge data migration and were unpleasantly surprised that it is
>>>> unreliable. In our case “nodetool repair” didn’t fix the issue.
>>>>
>>>> So I echo Frens' questions.
>>>>
>>>> Thanks,
>>>> Mikhail
>>>>
>>>> On Wed, Mar 4, 2015 at 3:55 AM, Rumph, Frens Jan <m...@frensjan.nl>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Is it to be expected that select count(*) from ... and select distinct
>>>>> partition-key-columns from ... yield inconsistent results between
>>>>> executions even though the table at hand isn't written to?
>>>>>
>>>>> I have a table in a keyspace with replication_factor = 1 which is
>>>>> something like:
>>>>>
>>>>> CREATE TABLE tbl (
>>>>>     id frozen<id_type>,
>>>>>     bucket bigint,
>>>>>     offset int,
>>>>>     value double,
>>>>>     PRIMARY KEY ((id, bucket), offset)
>>>>> )
>>>>>
>>>>> The frozen udt is:
>>>>>
>>>>> CREATE TYPE id_type (
>>>>>     tags map<text, text>
>>>>> );
>>>>>
>>>>> When I do select count(*) from tbl several times the actual count
>>>>> varies by 5 to 10%. Also when performing select distinct id, bucket from
>>>>> tbl the results aren't consistent over several query executions. The table
>>>>> is not being written to at the time I performed the queries.
>>>>>
>>>>> Is this to be expected? Or is this a bug? Is there an alternative
>>>>> method / workaround?
>>>>>
>>>>> I'm using cqlsh 5.0.1 with Cassandra 2.1.2 on 64bit fedora 21 with
>>>>> Oracle Java 1.8.0_31.
>>>>>
>>>>> Thanks in advance,
>>>>> Frens Jan