It does not matter whether this table has one row or n rows. Before fetching data from the table foo, C* must determine:

1) how many primary keys of table "foo" match the condition foo_name='dave' --> read from the 2nd index "foo_name" where partition key = "dave"
2) how many primary keys of table "foo" match the condition int_val>0 --> read from the 2nd index "int_val" where partition key > 0, so basically it is a range scan

Once it gets all the results from the 2nd indices, C* can query the primary table to return data.

I've read somewhere that when there are multiple conditions in the WHERE clause, C* should use the most restrictive condition to optimize performance. In our example, the equality condition on "foo_name" seems to be the most restrictive. My assumption is that C* does use statistics to determine the most restrictive condition, and since we have only 1 row here, the statistics are useless, so it ends up doing a range scan on int_val...

It would be nice if someone could confirm or refute this assumption. The last time I looked into the source code of the 2nd index was more than 6 months ago, so things may have changed since then.

On Wed, Aug 13, 2014 at 3:29 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> Agreed, but... in this case the table has ONE row, so what exactly
> could be causing this timeout? I mean, it can’t be the row count, right?
>
> -- Jack Krupansky
>
> *From:* DuyHai Doan <doanduy...@gmail.com>
> *Sent:* Wednesday, August 13, 2014 9:01 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: range query times out (on 1 node, just 1 row in table)
>
> Hello Ian
>
> Secondary index performs poorly with inequalities (<, ≤, >, ≥). Indeed
> inequalities forces the server to scan all the cluster to find the
> requested range, which is clearly not optimal. That's the reason why you
> need to add "ALLOW FILTERING" for the query to be accepted.
>
> "ALLOW FILTERING" means "beware of what you're doing, we C* developers do
> not give any guarantee about performance of such query".
>
> As Robert Coli used to say on this list, ALLOW FILTERING is synonym to
> PROBABLY TIMEOUT :D
>
>
> On Wed, Aug 13, 2014 at 2:56 PM, Ian Rose <ianr...@fullstory.com> wrote:
>
>> Confusingly, it appears to be the presence of an index on int_val that is
>> causing this timeout. If I drop that index (leaving only the index on
>> foo_name) the query works just fine.
>>
>>
>> On Tue, Aug 12, 2014 at 10:25 PM, Ian Rose <ianr...@fullstory.com> wrote:
>>
>>> Hi -
>>>
>>> I am currently running a single Cassandra node on my local dev machine.
>>> Here is my (test) schema (which is meaningless, I created it just to
>>> demonstrate the issue I am running into):
>>>
>>> CREATE TABLE foo (
>>>   foo_name ascii,
>>>   foo_shard bigint,
>>>   int_val bigint,
>>>   PRIMARY KEY ((foo_name, foo_shard))
>>> ) WITH read_repair_chance=0.1;
>>>
>>> CREATE INDEX ON foo (int_val);
>>> CREATE INDEX ON foo (foo_name);
>>>
>>> I have inserted just a single row into this table:
>>> insert into foo(foo_name, foo_shard, int_val) values('dave', 27, 100);
>>>
>>> This query works fine:
>>> select * from foo where foo_name='dave';
>>>
>>> But when I run this query, I get an RPC timeout:
>>> select * from foo where foo_name='dave' and int_val > 0 allow filtering;
>>>
>>> With tracing enabled, here is the trace output:
>>> http://pastebin.com/raw.php?i=6XMEVUcQ
>>>
>>> (In short, everything looks fine to my untrained eye until 10s elapsed,
>>> at which time the following event is logged: "Timed out; received 0 of 1
>>> responses for range 257 of 257")
>>>
>>> Can anyone help interpret this error?
>>>
>>> Many thanks!
>>> Ian
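
PS: to make the two index reads described at the top of this message more concrete, here is a rough Python sketch. It is only a toy model under my own assumptions, not Cassandra's actual read path: plain dicts stand in for the table and for the two 2nd indices, and the names index_foo_name, index_int_val, lookup_eq, lookup_gt and query are hypothetical. It shows why an equality condition is a single index read while an inequality has to walk the whole index, and what "use the most restrictive condition first" would look like.

```python
# Toy model only: plain dicts stand in for the table and its two indexes.
# This is NOT how Cassandra implements secondary indexes internally.

# Primary table, keyed by the composite partition key (foo_name, foo_shard).
table = {("dave", 27): {"int_val": 100}}

# Each secondary index maps an indexed value to the primary keys holding it.
index_foo_name = {"dave": {("dave", 27)}}
index_int_val = {100: {("dave", 27)}}


def lookup_eq(index, value):
    """Equality condition: a single read at index key == value."""
    return set(index.get(value, set()))


def lookup_gt(index, bound):
    """Inequality condition: every index entry is examined (a range scan)."""
    keys = set()
    for value, primary_keys in index.items():
        if value > bound:
            keys |= primary_keys
    return keys


def query(name, bound):
    """Start from the equality condition, then filter fetched rows on int_val."""
    candidates = lookup_eq(index_foo_name, name)
    return {k: table[k] for k in candidates if table[k]["int_val"] > bound}


print(query("dave", 0))
```

In this sketch query() never touches index_int_val at all, which would match Ian's observation that the query works once the index on int_val is dropped. Whether C* really picks the condition this way is exactly the assumption I would like someone to confirm.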