Re: Cassandra search performance

Jason Tang Sat, 12 May 2012 03:18:46 -0700

I try to search one column, this column store the time as the type Long,
1,000,000 data equally distributed in 24 hours, I only want to search
certain time rang, eg from 01:30 to 01:50 or 08:00 to 12:00, but something
stranger happened.


Search 00:00 to 23:59 limit 100
It took less then 1 second scan 100 record

Search 00:00 to 00:20 limit 100
It took more then one minute scan around 2,400 recods

So the result shows it seems cassandra scan one by one to match the
condition, and the data is not ordered in sequence.

One more thing, to have equal condition, I make a redundant column to have
equal condition, the value is same for all records.
The search condition like get record where equal='equal' and time > 00:00
and time < 00:20

Is it the expected behavior of secondary index or I didn't use it correct.

Because I used to have another test, I have one string column most of it is
string 'true' and I add 100 'false' among 1,000,000 'true' , it shows it
only scan 100 records.

So how can I exam what happened inside cassadra, and where I can find out
the detail of how secondary works?

在 2012年5月8日星期二，Maxim Potekhin 写道：

>  Thanks for the comments, much appreciated.
>
> Maxim
>
>
> On 5/7/2012 3:22 AM, David Jeske wrote:
>
> On Sun, Apr 29, 2012 at 4:32 PM, Maxim Potekhin 
> <potek...@bnl.gov<javascript:_e({}, 'cvml', 'potek...@bnl.gov');>
> > wrote:
>
>> Looking at your example,as I think you understand, you forgo indexes by
>> combining two conditions in one query, thinking along the lines of what is
>> often done in RDBMS. A scan is expected in this case, and there is no
>> magic to avoid it.
>>
>
>  This sounds like a mis-understanding of how RDBMSs work. If you combine
> two conditions in a single SQL query, the SQL execution optimizer looks at
> the cardinality of any indicies. If it can successfully predict that one of
> the conditions significantly reduces the set of rows that would be
> considered (such as a status match having 200 hits vs 1M rows in the
> table), then it selects this index for the first-iteration, and each index
> hit causes a record lookup which is then tested for the other conditions.
>  (This is one of several query-execution types RDBMS systems use)
>
>  I'm no Cassandra expert, so I don't know what it does WRT
> index-selection, but from the page written on secondary indicies, it seems
> like if you just query on status, and do the other filtering yourself it'll
> probably do what you want...
>
>  http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
>
>
>>  However, if this query is important, you can easily index on two
>> conditions,
>> using a composite type (look it up), or string concatenation for quick and
>> easy solution.
>>
>
>  This is not necessarily a good idea. Creating a composite index explodes
> the index size unnecessarily. If a condition can reduce a query to 200
> records, there is no need to have a composite index including another
> condition.
>
>
>

Re: Cassandra search performance

Reply via email to