Why are you keeping all your indexes in the same row? We do something similar 
(we maintain several indexes over the same data), and we just use an index 
column family with row keys like "dest192.168.0.1", which means the destination 
index entry for 192.168.0.1. You can use rows like User_Keys_By_Last_Name_adams 
and User_Keys_By_Last_Name_alden, and keep the matching main column family key 
as the column name. This will ensure that your index is evenly distributed 
throughout your cluster.
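
As a rough sketch of that layout (not our actual code), the snippet below models the index column family as a plain in-memory dict, so no Cassandra client is involved. The helper names and the "user-1"-style keys are made up for illustration, and the even distribution assumes a hash-based partitioner such as RandomPartitioner.

```python
from collections import defaultdict

# In-memory stand-in for the index column family:
# row key -> {column name -> column value}. Hypothetical model, not a client API.
index_cf = defaultdict(dict)


def index_row_key(prefix, value):
    """Build one row key per indexed value, e.g. 'dest' + '192.168.0.1'."""
    return prefix + value


def add_to_index(prefix, value, main_key):
    """Store the main column family key as the column name; the value is unused."""
    index_cf[index_row_key(prefix, value)][main_key] = ""


def lookup(prefix, value):
    """Return the main column family keys indexed under this value, sorted."""
    return sorted(index_cf.get(index_row_key(prefix, value), {}))


# Hypothetical main column family keys, for illustration only.
add_to_index("User_Keys_By_Last_Name_", "adams", "user-1")
add_to_index("User_Keys_By_Last_Name_", "adams", "user-2")
add_to_index("User_Keys_By_Last_Name_", "alden", "user-3")

print(lookup("User_Keys_By_Last_Name_", "adams"))
```

Because each last name gets its own row key, each index row is placed independently, instead of the whole index living in one row on one node.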

----- Original Message -----
From: "Ed Anuff" <e...@anuff.com>
To: user@cassandra.apache.org
Sent: Thursday, August 25, 2011 12:48:49 PM
Subject: Re: Customized Secondary Index Schema

How many unique last names do you anticipate having? How many characters in the 
last name do you anticipate keeping in your index? You can easily do the math 
to figure out how many you could fit on a node. I think you'll find that the 
ceiling might be quite a bit higher than you think. If you have over a couple 
of hundred million users it might not be the best approach. There are a lot of 
very simple ways to split it up over multiple rows. As is the case with most 
things regarding Cassandra, the off-the-cuff assumptions only get you so far 
before you have to do some math and do some tests.
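
A back-of-the-envelope version of that math might look like the sketch below. Every number in it is an assumption to be replaced with your own: 10 characters per last name, a 16-byte UUID as the user key, and roughly 15 bytes of per-column overhead. It ignores replication, compaction headroom, and row-level overhead.

```python
def single_row_index_bytes(num_entries, avg_name_chars,
                           key_bytes=16, column_overhead_bytes=15):
    """Rough size of an index kept in one row, one column per entry.

    All defaults are assumptions: a 16-byte UUID per user key and about
    15 bytes of per-column overhead.
    """
    return num_entries * (avg_name_chars + key_bytes + column_overhead_bytes)


# Assumed workload: 200 million users, last names averaging 10 characters.
estimate = single_row_index_bytes(200_000_000, 10)
print(f"{estimate / 1e9:.1f} GB")
```

Under those assumptions the index comes to a few gigabytes, which is the kind of figure to compare against the capacity of a single node before deciding whether to split the row.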

As I mentioned in my talk, for simple use cases like this, you should probably 
just start with the built-in secondary indexes, but I assume you have already 
explored those.

Ed


On Thu, Aug 25, 2011 at 9:27 AM, Alvin UW < alvi...@gmail.com > wrote:


Yes, this is what I am worrying about.


2011/8/24 Ryan King < r...@twitter.com >





On Tue, Aug 23, 2011 at 10:03 AM, Alvin UW < alvi...@gmail.com > wrote:
> Hello,
>
> As mentioned by Ed Anuff in his blog and slides, one way to build a customized
> secondary index is:
> We use one CF, with each row representing a secondary index and the secondary
> index name as the row key.
> For example,
>
> Indexes = {
> "User_Keys_By_Last_Name" : {
> "adams" : "e5d61f2b-…",
> "alden" : "e80a17ba-…",
> "anderson" : "e5d61f2b-…",
> "davis" : "e719962b-…",
> "doe" : "e78ece0f-…",
> "franks" : "e66afd40-…",
> … : …,
> }
> }
>
> But the whole secondary index is partitioned into a single node, because of
> the row key.
> All the queries against this secondary index will go to this node. Of
> course, there are some replica nodes.
>
> Do you think this is a scalability problem, or any better solution to solve
> it?

It's certainly a scalability problem in that this solution has a hard
ceiling (this index can't get larger than the capacity of any single
node). It will probably work on small datasets, but if your dataset is
small then why are you using Cassandra?

-ryan

