Re: Why no need to query all nodes on secondary index lookup?

Jonathan Ellis Mon, 05 Sep 2011 14:06:13 -0700

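The first node can answer the question as long as you've requested
fewer rows than the first node has matching rows for.  Hence the
"low cardinality" point in what you quoted: when the indexed value is
shared by many rows, every node holds plenty of matches locally.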
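
To make that concrete, here is a rough toy model in Python. It is only a
sketch under stated assumptions, not how Cassandra is actually implemented:
the Node class, build_cluster, indexed_query and the "i % num_nodes" row
placement are all hypothetical stand-ins, and replication, token ranges and
consistency levels are ignored. Each node indexes only its own rows, and the
coordinator walks the nodes in order, stopping once it has collected the
requested number of rows.

```python
from collections import defaultdict

STATES = ["TX", "CA", "NY", "WA", "FL"]  # low-cardinality column values


class Node:
    """Toy node: keeps an index over the rows it holds locally (assumption)."""

    def __init__(self):
        self.index = defaultdict(list)  # (column, value) -> list of row keys

    def put(self, key, columns):
        for column, value in columns.items():
            self.index[(column, value)].append(key)

    def lookup(self, column, value, limit):
        # Return at most `limit` locally stored keys matching the value.
        return self.index.get((column, value), [])[:limit]


def build_cluster(num_nodes, num_users):
    """Spread users over nodes; `i % num_nodes` stands in for key hashing."""
    nodes = [Node() for _ in range(num_nodes)]
    for i in range(num_users):
        columns = {
            "state": STATES[i % len(STATES)],   # shared by many users
            "email": f"user{i}@example.com",    # unique per user
        }
        nodes[i % num_nodes].put(f"user{i}", columns)
    return nodes


def indexed_query(nodes, column, value, limit):
    """Ask nodes one at a time and stop as soon as `limit` rows are found."""
    results = []
    contacted = 0
    for node in nodes:
        contacted += 1
        results.extend(node.lookup(column, value, limit - len(results)))
        if len(results) >= limit:
            break
    return results, contacted


if __name__ == "__main__":
    cluster = build_cluster(num_nodes=4, num_users=1000)
    for column, value, limit in [
        ("state", "TX", 10),                  # fewer rows than node 0 holds
        ("state", "TX", 120),                 # more rows than node 0 holds
        ("email", "user7@example.com", 1),    # high-cardinality value
    ]:
        rows, contacted = indexed_query(cluster, column, value, limit)
        print(column, value, "limit", limit, "->",
              len(rows), "rows from", contacted, "node(s)")
```

In this model each of the 4 nodes holds 50 users per state, so the limit-10
query is answered by the first node alone, the limit-120 query has to
continue to the third node, and the unique-email lookup walks nodes until it
reaches the one that happens to store that user.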


On Sat, Sep 3, 2011 at 5:00 AM, Kaj Magnus Lindberg
<kajmagnu...@gmail.com> wrote:
> Hello Anyone
>
> I have a follow up question on a question from February 2011. In
> short, I wonder why one won't have to query all Cassandra nodes when
> doing a secondary index lookup -- although each node only indexes data
> that it holds locally.
>
> The question and answer were:
>  ( http://www.mail-archive.com/user@cassandra.apache.org/msg10506.html  )
> === Question ===
> As far as I understand automatic secondary indexes are generated for
> node local data.
>   In this case a query by secondary index involves all nodes storing part of
> the column family to get results (?), so (if I am right) if data is spread
> across 50 nodes then 50 nodes are involved in a single query?
> [...]
> === Answer ===
> In practice, local secondary indexes scale to {RF * the limit of a single
> machine} for -low cardinality- values (ex: users living in a certain state)
> since the first node is likely to be able to answer your question. This also
> means they are good for performing filtering for analytics.
> [...]
>
> === Now I wonder ===
> Why would the first node be likely to be able to answer the question?
> It stores only index entries for users on that particular machine,
>     (says http://wiki.apache.org/cassandra/SecondaryIndexes:
>     "Each node only indexes data that it holds locally" )
> but users might be stored by user name? And would thus be stored on
> many different machines? Even if they happen to live in the same
> state?
>
> Why won't the client need to query the indexes of [all servers that
> store info on users] to find all relevant users, when doing a user
> property lookup?
>
>
> Best regards, KajMagnus
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
