Re: range query times out (on 1 node, just 1 row in table)

DuyHai Doan Wed, 13 Aug 2014 07:28:33 -0700

It does not matter whether this table has one row or n rows. Before fetching
data from the table foo, C* must determine:


1) how many primary keys of table "foo" match the condition foo_name='dave'
--> read from the secondary index on "foo_name" where partition key = 'dave'
2) how many primary keys of table "foo" match the condition int_val > 0 --> read
from the secondary index on "int_val" where partition key > 0, so basically it
is a range scan

Once it gets all the results from the secondary indexes, C* can query the
primary table to return data.
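
To illustrate the two steps above, here is a toy Python model. It is only a sketch: plain dicts stand in for the table and its secondary indexes, all function names are made up for illustration, and it is not how Cassandra actually implements its read path. It only shows why the equality predicate is a single lookup while the inequality predicate has to walk the whole index.

```python
# Toy model only: plain dicts stand in for the table and its secondary
# indexes. This is NOT Cassandra's actual read path.

# Primary table: (foo_name, foo_shard) -> row
table = {("dave", 27): {"foo_name": "dave", "foo_shard": 27, "int_val": 100}}

# Secondary indexes: indexed value -> set of primary keys
index_foo_name = {"dave": {("dave", 27)}}
index_int_val = {100: {("dave", 27)}}


def keys_for_equality(index, value):
    """Equality predicate: a single lookup in the index."""
    return set(index.get(value, set()))


def keys_for_greater_than(index, bound):
    """Inequality predicate: every index entry is examined (a range scan)."""
    matches = set()
    for value, keys in index.items():
        if value > bound:
            matches |= keys
    return matches


def query(name, bound):
    by_name = keys_for_equality(index_foo_name, name)
    by_val = keys_for_greater_than(index_int_val, bound)
    # Only after both index reads is the primary table queried.
    return [table[key] for key in sorted(by_name & by_val)]


rows = query("dave", 0)
```

In this model the cost of the int_val predicate depends on the size of the index key space being scanned, not on how many rows end up matching.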

I've read somewhere that when there are multiple conditions in the WHERE
clause, C* should use the most restrictive condition to optimize
performance. In our example, the equality condition on "foo_name" seems to be
the most restrictive.

My assumption is that C* uses statistics to determine the most
restrictive condition, and since we have only one row here, the statistics are
useless, so it ends up doing a range scan on int_val ...
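
A minimal sketch of that assumption, in Python. Everything here is hypothetical: the function name, the estimate numbers, and the tie-breaking rule are invented for illustration, and whether C* really picks the index this way is exactly what remains unconfirmed. The point is only that when the estimates are identical, a "pick the smallest" rule cannot prefer the equality predicate.

```python
def pick_driving_predicate(estimated_matches):
    """Return the predicate with the lowest estimated match count.

    Ties go to whichever predicate comes first in the mapping, since
    min() keeps the first minimal element it sees.
    """
    return min(estimated_matches, key=estimated_matches.get)


# Hypothetical estimates for a well-populated table: equality wins clearly.
with_stats = {"int_val > 0": 50000, "foo_name = 'dave'": 1}

# Hypothetical estimates for a one-row table: both predicates look the same,
# so the outcome depends only on ordering.
one_row = {"int_val > 0": 1, "foo_name = 'dave'": 1}

choice_with_stats = pick_driving_predicate(with_stats)
choice_one_row = pick_driving_predicate(one_row)
```

With the first mapping the equality predicate is chosen; with the second, the range predicate wins purely because it is listed first.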

It would be nice if someone could confirm or refute this assumption. The last
time I looked into the secondary index source code was more than 6 months ago,
so things may have changed since then.




On Wed, Aug 13, 2014 at 3:29 PM, Jack Krupansky <j...@basetechnology.com>
wrote:

>   Agreed, but... in this case the table has ONE row, so what exactly
> could be causing this timeout? I mean, it can’t be the row count, right?
>
> -- Jack Krupansky
>
>  *From:* DuyHai Doan <doanduy...@gmail.com>
> *Sent:* Wednesday, August 13, 2014 9:01 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: range query times out (on 1 node, just 1 row in table)
>
>  Hello Ian
>
> Secondary indexes perform poorly with inequalities (<, ≤, >, ≥). Indeed,
> inequalities force the server to scan the whole cluster to find the
> requested range, which is clearly not optimal. That's the reason why you
> need to add "ALLOW FILTERING" for the query to be accepted.
>
> "ALLOW FILTERING" means "beware of what you're doing, we C* developers do
> not give any guarantee about performance of such query".
>
> As Robert Coli used to say on this list, ALLOW FILTERING is a synonym for
> PROBABLY TIMEOUT :D
>
>
> On Wed, Aug 13, 2014 at 2:56 PM, Ian Rose <ianr...@fullstory.com> wrote:
>
>> Confusingly, it appears to be the presence of an index on int_val that is
>> causing this timeout.  If I drop that index (leaving only the index on
>> foo_name) the query works just fine.
>>
>>
>> On Tue, Aug 12, 2014 at 10:25 PM, Ian Rose <ianr...@fullstory.com> wrote:
>>
>>> Hi -
>>>
>>> I am currently running a single Cassandra node on my local dev machine.
>>> Here is my (test) schema (which is meaningless, I created it just to
>>> demonstrate the issue I am running into):
>>>
>>>  CREATE TABLE foo (
>>>   foo_name ascii,
>>>   foo_shard bigint,
>>>   int_val bigint,
>>>   PRIMARY KEY ((foo_name, foo_shard))
>>> ) WITH read_repair_chance=0.1;
>>>
>>> CREATE INDEX ON foo (int_val);
>>> CREATE INDEX ON foo (foo_name);
>>>
>>> I have inserted just a single row into this table:
>>> insert into foo(foo_name, foo_shard, int_val) values('dave', 27, 100);
>>>
>>> This query works fine:
>>> select * from foo where foo_name='dave';
>>>
>>> But when I run this query, I get an RPC timeout:
>>> select * from foo where foo_name='dave' and int_val > 0 allow filtering;
>>>
>>> With tracing enabled, here is the trace output:
>>> http://pastebin.com/raw.php?i=6XMEVUcQ
>>>
>>> (In short, everything looks fine to my untrained eye until 10s elapsed,
>>> at which time the following event is logged: "Timed out; received 0 of 1
>>> responses for range 257 of 257")
>>>
>>> Can anyone help interpret this error?
>>>
>>> Many thanks!
>>> Ian
>>>
>>>
>>
>>
>
>
