Frankly, no matter how inefficient or expensive the query is, surely it should still work when there is only one row and one node (which is localhost)!
I'm starting to wonder if range queries on secondary indexes aren't supported at all (although if that is the case, I would certainly prefer an error rather than a timeout!). I've been scouring the web trying to find a definitive answer on this, but all I have come up with is this (old, non-authoritative) blog post, which states "Cassandra’s native index is like a hashed index, which means you can only do equality query and not range query."
http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/

On Wed, Aug 13, 2014 at 10:27 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

> It does not matter whether this table has one row or n rows. Before fetching
> data from the table foo, C* must determine:
>
> 1) how many primary keys of table "foo" match the condition foo_name='dave'
> --> read from the 2nd index "foo_name" where partition key = "dave"
> 2) how many primary keys of table "foo" match the condition int_val>0 --> read
> from the 2nd index "int_val" where partition key > 0, so basically it is a
> range scan
>
> Once it gets all the results from the 2nd indices, C* can query the primary
> table to return data.
>
> I've read somewhere that when there are multiple conditions in the WHERE
> clause, C* should use the most restrictive condition to optimize
> performance. In our example, the equality condition on "foo_name" seems to be
> the most restrictive.
>
> My assumption is that C* does use statistics to determine the most
> restrictive condition, and since here we have only 1 row, the statistics are
> useless, so it ends up doing a range scan on int_val ....
>
> It would be nice if someone could confirm or refute the assumption. The last
> time I looked into the source code of the 2nd index was more than 6 months ago,
> so things may have changed since then.
>
> On Wed, Aug 13, 2014 at 3:29 PM, Jack Krupansky <j...@basetechnology.com>
> wrote:
>
>> Agreed, but... in this case the table has ONE row, so what exactly
>> could be causing this timeout?
>> I mean, it can’t be the row count, right?
>>
>> -- Jack Krupansky
>>
>> *From:* DuyHai Doan <doanduy...@gmail.com>
>> *Sent:* Wednesday, August 13, 2014 9:01 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: range query times out (on 1 node, just 1 row in table)
>>
>> Hello Ian
>>
>> Secondary indexes perform poorly with inequalities (<, ≤, >, ≥). Indeed,
>> inequalities force the server to scan the whole cluster to find the
>> requested range, which is clearly not optimal. That's the reason why you
>> need to add "ALLOW FILTERING" for the query to be accepted.
>>
>> "ALLOW FILTERING" means "beware of what you're doing; we C* developers do
>> not give any guarantee about the performance of such a query".
>>
>> As Robert Coli used to say on this list, ALLOW FILTERING is a synonym for
>> PROBABLY TIMEOUT :D
>>
>> On Wed, Aug 13, 2014 at 2:56 PM, Ian Rose <ianr...@fullstory.com> wrote:
>>
>>> Confusingly, it appears to be the presence of an index on int_val that
>>> is causing this timeout. If I drop that index (leaving only the index on
>>> foo_name), the query works just fine.
>>>
>>> On Tue, Aug 12, 2014 at 10:25 PM, Ian Rose <ianr...@fullstory.com>
>>> wrote:
>>>
>>>> Hi -
>>>>
>>>> I am currently running a single Cassandra node on my local dev
>>>> machine.
>>>> Here is my (test) schema (which is meaningless; I created it just
>>>> to demonstrate the issue I am running into):
>>>>
>>>> CREATE TABLE foo (
>>>>   foo_name ascii,
>>>>   foo_shard bigint,
>>>>   int_val bigint,
>>>>   PRIMARY KEY ((foo_name, foo_shard))
>>>> ) WITH read_repair_chance=0.1;
>>>>
>>>> CREATE INDEX ON foo (int_val);
>>>> CREATE INDEX ON foo (foo_name);
>>>>
>>>> I have inserted just a single row into this table:
>>>> insert into foo(foo_name, foo_shard, int_val) values('dave', 27, 100);
>>>>
>>>> This query works fine:
>>>> select * from foo where foo_name='dave';
>>>>
>>>> But when I run this query, I get an RPC timeout:
>>>> select * from foo where foo_name='dave' and int_val > 0 allow filtering;
>>>>
>>>> With tracing enabled, here is the trace output:
>>>> http://pastebin.com/raw.php?i=6XMEVUcQ
>>>>
>>>> (In short, everything looks fine to my untrained eye until 10s elapsed,
>>>> at which time the following event is logged: "Timed out; received 0 of 1
>>>> responses for range 257 of 257")
>>>>
>>>> Can anyone help interpret this error?
>>>>
>>>> Many thanks!
>>>> Ian
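The read path DuyHai describes (resolve the WHERE conditions against the secondary indexes, prefer the most restrictive one, then fetch rows from the primary table), combined with the blog post's claim that the native index behaves like a hashed, equality-only index, can be sketched as a toy in-memory model. This is a sketch under stated assumptions, not Cassandra's code: the names `build_hash_index` and `select`, the dict-based storage, and the use of exact bucket sizes in place of real statistics are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

# Toy model for illustration only; this is NOT how Cassandra is implemented.
Row = Dict[str, Any]
PrimaryKey = Tuple[Any, ...]
HashIndex = Dict[Any, List[PrimaryKey]]  # column value -> primary keys


def build_hash_index(rows: Dict[PrimaryKey, Row], column: str) -> HashIndex:
    """Map each distinct value of `column` to the primary keys holding it."""
    index: Dict[Any, List[PrimaryKey]] = defaultdict(list)
    for pk, row in rows.items():
        index[row[column]].append(pk)
    return dict(index)


def select(
    rows: Dict[PrimaryKey, Row],
    indexes: Dict[str, HashIndex],
    equalities: Dict[str, Any],
    filters: List[Tuple[str, Callable[[Any], bool]]],
) -> List[Row]:
    """Answer `WHERE col = value AND ... AND <filters>` using hash indexes.

    A hash index can only be probed with an exact value, so only equality
    restrictions on indexed columns are candidates for the index lookup;
    range predicates such as int_val > 0 are always applied as filters.
    """
    usable = [(col, val) for col, val in equalities.items() if col in indexes]
    if not usable:
        raise ValueError("no equality restriction on an indexed column")

    # "Most restrictive" lookup: the exact bucket size stands in for the
    # statistics a real database would have to estimate.
    driver_col, driver_val = min(
        usable, key=lambda cv: len(indexes[cv[0]].get(cv[1], []))
    )

    result: List[Row] = []
    for pk in indexes[driver_col].get(driver_val, []):
        row = rows[pk]  # fetch from the "primary table"
        # Check every restriction row by row (the filtering step).
        if all(row[c] == v for c, v in equalities.items()) and all(
            pred(row[c]) for c, pred in filters
        ):
            result.append(row)
    return result


if __name__ == "__main__":
    # The single row from the original question.
    rows = {("dave", 27): {"foo_name": "dave", "foo_shard": 27, "int_val": 100}}
    indexes = {col: build_hash_index(rows, col) for col in ("foo_name", "int_val")}
    # foo_name='dave' AND int_val > 0
    print(select(rows, indexes, {"foo_name": "dave"}, [("int_val", lambda v: v > 0)]))
    # -> [{'foo_name': 'dave', 'foo_shard': 27, 'int_val': 100}]
```

In this model the int_val index exists but is never consulted, because `int_val > 0` cannot be turned into a single hash-bucket lookup. It does not explain the timeout in the trace; it only illustrates the equality-only behaviour the blog post describes.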