> What are we doing wrong? Can it be that Cassandra is actually trying to read
> all the CF data rather than just the keys! (actually, it doesn't need to go
> to the users CF at all - all the data it needs is in the index CF)

Data is not stored as a BTree; that's the RDBMS approach. We hit the in-memory
bloom filter, then perhaps the -index.db and finally the -data.db. While in
this edge case it may be possible to serve your query just from the -index.db,
there is no optimisation in place for that.
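The lookup order described above (bloom filter, then index file, then data file) can be sketched as a toy model. This is only an illustration under stated assumptions: `ToySSTable`, `read_column` and `query_by_status` are hypothetical names, a plain `set` stands in for the bloom filter, and none of this is Cassandra's actual code. It shows that a query asking only for the row key still touches the data once per matching row when no index-only shortcut exists.

```python
class ToySSTable:
    """Toy stand-in for one SSTable (hypothetical, not Cassandra's classes)."""

    def __init__(self, rows):
        # rows: row key -> {column name: value}; plays the role of -data.db
        self.rows = dict(rows)
        # A real bloom filter is probabilistic; an exact set is used here.
        self.bloom = set(self.rows)
        # row key -> position; plays the role of -index.db
        self.index = {key: pos for pos, key in enumerate(self.rows)}
        # Counts how often the "data file" had to be read.
        self.data_reads = 0

    def read_column(self, key, column):
        # Step 1: bloom filter says whether the key can be in this table.
        if key not in self.bloom:
            return None
        # Step 2: the index gives the position of the row.
        if self.index.get(key) is None:
            return None
        # Step 3: the row itself is read from the data.
        self.data_reads += 1
        return self.rows[key].get(column)


def query_by_status(index_cf, users, status):
    """Toy secondary-index query: find candidate keys, then read each row."""
    names = []
    for key in index_cf.get(status, []):
        # No index-only shortcut: every candidate row is read from the data.
        if users.read_column(key, "status") == status:
            names.append(key)
    return names


users = ToySSTable({
    "alice": {"status": 2, "big_json": "{...}"},
    "bob": {"status": 1, "big_json": "{...}"},
    "carol": {"status": 2, "big_json": "{...}"},
})
index_cf = {1: ["bob"], 2: ["alice", "carol"]}
matches = query_by_status(index_cf, users, 2)
```

In this toy run, `users.data_reads` ends up equal to the number of rows matching the status value, even though only the keys are returned.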
>
> Select user_name from users where status = 2;
>
> Always crashes.

What is the error?

> 2. understand if there is something in this use case which indicates that we
> are not using Cassandra the way it is meant.

Just like an RDBMS database, queries are fastest when you use the primary key,
a bit slower when you use a non-primary index, and slowest when you do not use
an index.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 25/04/2013, at 8:32 PM, moshe.kr...@barclays.com wrote:

> IMHO: user_name is not a column, it is the row key. Therefore, according
> to http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ , the row does
> not contain a relevant column index, which causes the iterator to read each
> column (including value) of each row.
>
> I believe that instead of referring to user_name as if it were a column, you
> need to refer to it via the reserved word “KEY”, e.g.:
>
> Select KEY from users where status = 2;
>
> Always glad to share a theory with a friend….
>
>
> From: Tamar Rosen [mailto:ta...@correlor.com]
> Sent: Thursday, April 25, 2013 11:04 AM
> To: user@cassandra.apache.org
> Subject: Secondary Index on table with a lot of data crashes Cassandra
>
> Hi,
>
> We have a case of a reproducible crash, probably due to out of memory, but I
> don't understand why.
>
> The installation is currently single node.
>
> We have a column family with approx 50000 rows.
>
> In cql, the CF definition is:
>
> CREATE TABLE users (
>   user_name text PRIMARY KEY,
>   big_json text,
>   status int
> );
>
> Each big_json can have 500K or more of data.
>
> There is also a secondary index on the status column.
> Status can have various values, over 90% of all rows have status = 2.
>
> Calling:
>
> Select user_name from users limit 80000;
>
> Is pretty fast
>
> Calling:
>
> Select user_name from users where status = 1;
>
> is slower, even though much less data is returned.
>
> Calling:
>
> Select user_name from users where status = 2;
>
> Always crashes.
>
> What are we doing wrong? Can it be that Cassandra is actually trying to read
> all the CF data rather than just the keys! (actually, it doesn't need to go
> to the users CF at all - all the data it needs is in the index CF)
>
> Also, in the code I am doing the same using Astyanax index query with
> pagination, and the behavior is the same.
>
> Please help me:
>
> 1. solve the immediate issue
>
> 2. understand if there is something in this use case which indicates that we
> are not using Cassandra the way it is meant.
>
> Thanks,
>
> Tamar Rosen
> Correlor.com
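For scale, here is a back-of-envelope estimate of how much big_json data sits behind the status = 2 rows, using only the figures quoted in the original message. It assumes "500K" means 500 KiB and treats "over 90%" and "500K or more" as lower bounds; the constant and function names are hypothetical. Whether all of this is ever held in memory at once is not established in the thread, so this is only the volume that would have to be read if each matching row were read in full.

```python
ROWS = 50_000                # "approx 50000 rows"
PERCENT_STATUS_2 = 90        # "over 90% of all rows have status = 2" (lower bound)
BIG_JSON_BYTES = 500 * 1024  # "500K or more", assumed to mean 500 KiB (lower bound)


def estimated_bytes(rows, percent, bytes_per_row):
    """Lower-bound bytes of big_json behind the rows matching the index value."""
    matching_rows = rows * percent // 100  # integer maths avoids float rounding
    return matching_rows * bytes_per_row


total = estimated_bytes(ROWS, PERCENT_STATUS_2, BIG_JSON_BYTES)
print(f"{total:,} bytes, about {total / 1024**3:.1f} GiB")
```

Under these assumptions the total is on the order of 20 GiB for status = 2, against at most about 2.4 GiB for all other status values combined.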