> What are we doing wrong? Can it be that Cassandra is actually trying to read 
> all the CF data rather than just the keys! (actually, it doesn't need to go 
> to the users CF at all - all the data it needs is in the index CF)
>  
Data is not stored as a BTree; that's the RDBMS approach. A read hits the 
in-memory bloom filter, then perhaps the -index.db, and finally the -data.db. 
In this edge case it may be possible to serve your query from the -index.db 
alone, but there is no optimisation in place for that. 
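
As a rough illustration of that read order, here is a toy Python model of a 
single SSTable. It is only a sketch, not Cassandra code: the ToySSTable class, 
its fields and the 1024-bucket stand-in for the bloom filter are all 
hypothetical. It shows that a lookup which passes the filter and the index 
still reads the whole row from the data component, even if the caller only 
wants the key.

```python
class ToySSTable:
    """Hypothetical, simplified model of one SSTable (not Cassandra's code)."""

    def __init__(self, rows):
        # rows: dict mapping row key (str) -> serialized row (bytes)
        self.data = b""    # stands in for the -data.db component
        self.index = {}    # stands in for -index.db: key -> (offset, length)
        for key in sorted(rows):
            self.index[key] = (len(self.data), len(rows[key]))
            self.data += rows[key]
        # Stand-in for the in-memory bloom filter: like a real one it can
        # report false positives, but never false negatives.
        self.bloom = {hash(key) % 1024 for key in rows}
        self.data_bytes_read = 0

    def get(self, key):
        # 1. In-memory filter: skip this SSTable if the key is definitely absent.
        if hash(key) % 1024 not in self.bloom:
            return None
        # 2. Index lookup: find where the row lives in the data component.
        entry = self.index.get(key)
        if entry is None:
            return None  # the filter gave a false positive
        # 3. Data read: the whole row is read, whatever the caller needs.
        offset, length = entry
        self.data_bytes_read += length
        return self.data[offset:offset + length]


sstable = ToySSTable({"alice": b"x" * 10, "bob": b"y" * 20})
row = sstable.get("bob")
print(len(row), sstable.data_bytes_read)
```

In this model the only way to avoid step 3 would be a special case that 
answers key-only queries from the index, which is the optimisation that is 
not in place.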

>  
> Select user_name from users where status = 2; 
>  
> Always crashes.
>  
What is the error? 

> 2. understand if there is something in this use case which indicates that we 
> are not using Cassandra the way it is meant. 
Just like an RDBMS, reads are fastest when you use the primary key, a bit 
slower when you use a non-primary (secondary) index, and slowest when you do 
not use an index. 
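
For a sense of scale, a back-of-the-envelope estimate using the figures from 
the original post (approx 50000 rows, over 90% with status = 2, big_json of 
500K or more) is sketched below. It assumes that every matching row is read 
in full and that "500K" means 500,000 bytes; both are assumptions, and the 
function name is made up for the example.

```python
def estimate_bytes_read(total_rows, percent_matching, bytes_per_row):
    """Rough estimate, assuming each matching row is read in full."""
    matching_rows = total_rows * percent_matching // 100
    return matching_rows, matching_rows * bytes_per_row


# Figures from the original post; the 500,000-byte row size is an assumption.
rows, total_bytes = estimate_bytes_read(50_000, 90, 500_000)
print(rows, "rows,", total_bytes / 1e9, "GB")
```

Under those assumptions the status = 2 query touches about 45,000 rows and 
roughly 22.5 GB of row data, and since the post says "over 90%" and "500K or 
more" the real figure could be higher. How much of that is held in memory at 
once is not something this estimate can say.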

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 25/04/2013, at 8:32 PM, moshe.kr...@barclays.com wrote:

> IMHO: user_name is not a column, it is the row key. Therefore, according 
> to http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ , the row does 
> not contain a relevant column index, which causes the iterator to read each 
> column (including value) of each row.
>  
> I believe that instead of referring to user_name as if it were a column, you 
> need to refer to it via the reserved word “KEY”, e.g.:
>  
> Select KEY from users where status = 2; 
>  
> Always glad to share a theory with a friend….
>  
>  
> From: Tamar Rosen [mailto:ta...@correlor.com] 
> Sent: Thursday, April 25, 2013 11:04 AM
> To: user@cassandra.apache.org
> Subject: Secondary Index on table with a lot of data crashes Cassandra
>  
> Hi,
>  
> We have a case of a reproducible crash, probably due to out of memory, but I 
> don't understand why. 
>  
> The installation is currently single node. 
>  
> We have a column family with approx 50000 rows. 
>  
> In cql, the CF definition is:
>  
>  
> CREATE TABLE users (
>   user_name text PRIMARY KEY,
>   big_json text,
>   status int
> );
>  
> Each big_json can have 500K or more of data.
>  
> There is also a secondary index on the status column. 
> Status can have various values, over 90% of all rows have status = 2. 
>  
>  
> Calling:
>  
> Select user_name from users limit 80000;
>  
> Is pretty fast
>  
>  
>  
> Calling:
>  
> Select user_name from users where status = 1; 
> is slower, even though much less data is returned.
>  
> Calling:
>  
> Select user_name from users where status = 2; 
>  
> Always crashes.
>  
>  
> What are we doing wrong? Can it be that Cassandra is actually trying to read 
> all the CF data rather than just the keys! (actually, it doesn't need to go 
> to the users CF at all - all the data it needs is in the index CF)
>  
>  
> Also, in the code I am doing the same using Astyanax index query with 
> pagination, and the behavior is the same. 
> 
> 
> Please help me:
>  
> 1. solve the immediate issue
>  
> 2. understand if there is something in this use case which indicates that we 
> are not using Cassandra the way it is meant. 
>  
> 
> 
> Thanks,
>  
> 
> 
> Tamar Rosen
>  
> Correlor.com
>  
> 
> 
>  
