The conversation around the partitioner sidetracks a bit from your original question.
You originally asked:

>> Business case: Show me all events for a given customer in a given time frame
>>
>> In RDBMS it will be
>>
>> (Query1)
>> where customer_id = '289'
>> and event_time >= '2016-03-01 18:45:00+0000'
>> and event_time <= '2016-03-12 19:05:00+0000' ;
>>
>> But C* does not allow >= <= on PKY cols

Actually, C* does allow range queries on _some_ primary key columns, just not on the partition key portion of the primary key. The primary key you are looking for is probably: ((customer_id), event_time). Structuring the key like this uses customer_id as the partition key, on which you can use an equality clause as shown above (customer_id = '289'), followed by range clauses on event_time, which is now treated as a _clustering column_. Clustering columns are a concept you will probably want to look into further to wrap your head around this kind of query pattern.

On Thu, Mar 10, 2016 at 5:02 PM, Rakesh Kumar <dcrunch...@aim.com> wrote:

> thanks. that explains it.
>
> -----Original Message-----
> From: Jack Krupansky <jack.krupan...@gmail.com>
> To: user <user@cassandra.apache.org>
> Sent: Thu, Mar 10, 2016 5:28 pm
> Subject: Re: What is wrong in this token function
>
> From the doc: "When using the RandomPartitioner or Murmur3Partitioner,
> Cassandra rows are ordered by the hash of their value and hence the order
> of rows is not meaningful... The ByteOrdered partitioner arranges tokens
> the same way as key values, but the RandomPartitioner and
> Murmur3Partitioner distribute tokens in a completely unordered manner. The
> token function makes it possible to page through these unordered
> partitioner results."
>
> See:
> https://docs.datastax.com/en/cql/3.1/cql/cql_using/paging_c.html (for 2.1)
> https://docs.datastax.com/en/cql/3.3/cql/cql_using/usePaging.html (for 2.2 and 3.x)
>
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 5:14 PM, Rakesh Kumar <dcrunch...@aim.com> wrote:
>
>> I am using default Murmur3.
>> So are you saying in case of Murmur3 the following two queries
>>
>> select count(*)
>> where customer_id = '289'
>> and event_time >= '2016-03-01 18:45:00+0000'
>> and event_time <= '2016-03-12 19:05:00+0000' ;
>>
>> and
>>
>> select count(*)
>> where token(customer_id,event_time) >= token('289','2016-03-01 18:45:00+0000')
>> and token(customer_id,event_time) <= token('289','2016-03-12 19:05:00+0000') ;
>>
>> are not the same?
>>
>> And yes, I am aware of how to change the clustering key to get the first
>> query. This question is more of an academic exercise for me.
>>
>> -----Original Message-----
>> From: Jack Krupansky <jack.krupan...@gmail.com>
>> To: user <user@cassandra.apache.org>
>> Sent: Thu, Mar 10, 2016 4:55 pm
>> Subject: Re: What is wrong in this token function
>>
>> What partitioner are you using? The default partitioner is not "ordered",
>> so it will randomly order the hashes/tokens, so that tokens will not be
>> ordered even if your PKs are ordered. You probably want to use customer as
>> your partition key and event time as a clustering column - then you can use
>> RDBMS-like WHERE conditions to select a slice of the partition.
>>
>> -- Jack Krupansky
>>
>> On Thu, Mar 10, 2016 at 4:45 PM, Rakesh Kumar <dcrunch...@aim.com> wrote:
>>
>>> typo: the primary key was (customer_id + event_time)
>>>
>>> -----Original Message-----
>>> From: Rakesh Kumar <dcrunch...@aim.com>
>>> To: user <user@cassandra.apache.org>
>>> Sent: Thu, Mar 10, 2016 4:44 pm
>>> Subject: What is wrong in this token function
>>>
>>> C* 3.0.3
>>>
>>> I have a table table1 which has the primary key
>>> ((customer_id,event_id)).
>>>
>>> I loaded 1.03 million rows from a csv file.
>>> Business case: Show me all events for a given customer in a given time
>>> frame
>>>
>>> In RDBMS it will be
>>>
>>> (Query1)
>>> where customer_id = '289'
>>> and event_time >= '2016-03-01 18:45:00+0000'
>>> and event_time <= '2016-03-12 19:05:00+0000' ;
>>>
>>> But C* does not allow >= <= on PKY cols. It suggested the token function.
>>>
>>> So I did this:
>>>
>>> (Query2)
>>> where token(customer_id,event_time) >= token('289','2016-03-01 18:45:00+0000')
>>> and token(customer_id,event_time) <= token('289','2016-03-12 19:05:00+0000') ;
>>>
>>> I am seeing 75% more rows than there should be. It should be 99K rows,
>>> but it shows 163K.
>>>
>>> I checked the output against the csv file itself. To double-check, I
>>> loaded the csv into another table with a modified PKY so that the first
>>> query (Query1) could be executed. It also showed 99K rows.
>>>
>>> Am I using the token function incorrectly?
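For reference, the schema change suggested at the top of the thread can be sketched in CQL roughly as follows. The table and column names follow the thread; the event_id clustering column and the column types are illustrative assumptions, added only so that two events with the same timestamp do not overwrite each other:

```sql
-- Sketch: customer_id alone as the partition key, event_time as a
-- clustering column. (event_id and the types shown are assumptions,
-- not taken from the original table definition.)
CREATE TABLE events_by_customer (
    customer_id text,
    event_time  timestamp,
    event_id    timeuuid,
    payload     text,
    PRIMARY KEY ((customer_id), event_time, event_id)
);

-- Query1 now works directly: equality on the partition key,
-- a range on the clustering column, no token() needed.
SELECT count(*)
FROM events_by_customer
WHERE customer_id = '289'
  AND event_time >= '2016-03-01 18:45:00+0000'
  AND event_time <= '2016-03-12 19:05:00+0000';
```

With the original composite partition key ((customer_id, event_time)), Query2 cannot be equivalent to Query1 under Murmur3: token(customer_id, event_time) hashes both values together, so the tokens for consecutive event_times of the same customer are scattered across the whole token range. A token-range predicate therefore selects an essentially arbitrary band of partitions, including other customers' rows, which is consistent with the 163K-vs-99K discrepancy; token() is only useful for paging through all partitions, not for time slices.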