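The first node can answer the query on its own as long as you've requested fewer rows than that node holds for the indexed value. With a low-cardinality value, each node is likely to hold many matching rows locally, hence the "low cardinality" point in what you quoted.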
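To make that concrete, here is a toy sketch in Python. It is not Cassandra's actual read path. The names `owner`, `build_cluster` and `index_lookup` are made up for illustration, the md5-modulo placement is only a stand-in for a partitioner, and replication is ignored. It assumes 50 nodes, each indexing only its own rows, and a coordinator that asks nodes one at a time and stops once it has collected the requested number of rows.

```python
import hashlib

def owner(key, num_nodes):
    """Pick a node for a row key by hashing it (stand-in for a partitioner)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_nodes

def build_cluster(users, num_nodes):
    """Each node indexes only the rows it holds: state -> list of user names."""
    nodes = [dict() for _ in range(num_nodes)]
    for name, state in users:
        nodes[owner(name, num_nodes)].setdefault(state, []).append(name)
    return nodes

def index_lookup(nodes, state, limit):
    """Ask nodes one after another; stop as soon as `limit` rows are collected."""
    rows, contacted = [], 0
    for local_index in nodes:
        contacted += 1
        rows.extend(local_index.get(state, []))
        if len(rows) >= limit:
            break
    return rows[:limit], contacted

# 10,000 users keyed by name, only two distinct states (low cardinality).
users = [(f"user{i}", "TX" if i % 2 == 0 else "CA") for i in range(10000)]
nodes = build_cluster(users, 50)

rows_small, contacted_small = index_lookup(nodes, "TX", 10)     # small limit
rows_all, contacted_all = index_lookup(nodes, "TX", 10**6)      # "give me everything"
print(contacted_small, contacted_all)
```

With a small limit the loop stops at the first node, because that node already holds far more than 10 matching users. Asking for every match still touches all 50 nodes, which is the case the original question worried about.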
On Sat, Sep 3, 2011 at 5:00 AM, Kaj Magnus Lindberg <kajmagnu...@gmail.com> wrote:
> Hello Anyone
>
> I have a follow up question on a question from February 2011. In
> short, I wonder why one won't have to query all Cassandra nodes when
> doing a secondary index lookup -- although each node only indexes data
> that it holds locally.
>
> The question and answer was:
> ( http://www.mail-archive.com/user@cassandra.apache.org/msg10506.html )
> === Question ===
> As far as I understand automatic secondary indexes are generated for
> node local data.
> In this case query by secondary index involve all nodes storing part of
> column family to get results (?) so (if i am right) if data is spread across
> 50 nodes then 50 nodes are involved in single query?
> [...]
> === Answer ===
> In practice, local secondary indexes scale to {RF * the limit of a single
> machine} for -low cardinality- values (ex: users living in a certain state)
> since the first node is likely to be able to answer your question. This also
> means they are good for performing filtering for analytics.
> [...]
>
> === Now I wonder ===
> Why would the first node be likely to be able to answer the question?
> It stores only index entries for users on that particular machine,
> (says http://wiki.apache.org/cassandra/SecondaryIndexes:
> "Each node only indexes data that it holds locally" )
> but users might be stored by user name? And would thus be stored on
> many different machines? Even if they happen to live in the same
> state?
>
> Why won't the client need to query the indexes of [all servers that
> store info on users] to find all relevant users, when doing a user
> property lookup?
>
>
> Best regards, KajMagnus
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com