Re: Why Cassandra secondary indexes are so slow on just 350k rows?

Hiller, Dean Thu, 30 Aug 2012 13:30:03 -0700

It seems to me you may want to revisit the design a bit (though I am not 100% 
sure, as I may not understand the entire context). I could see having 
partitions and a few clients that poll each partition, so you can basically 
scale to infinity with no issues.  If you are doing all this polling from one 
machine, it just won't scale very well.


playOrm does this for you, but the basic pattern, which you can do yourself 
without playOrm, would be:

Row 1
Row 2
Row 3
Row 4

Index row for partition 1 - <val>.row1, <val>.row4
Index row for partition 2 - <val>.row2, <val>.row3
…
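
A minimal sketch of that layout, as a plain in-memory model rather than real 
Cassandra client code. The dictionaries, the `partition` and `val` field names, 
and both helper functions are assumptions made up for illustration. Each index 
row holds (value, row key) pairs kept in sorted order, the way columns are 
sorted within a row:

```python
# Hypothetical in-memory model of the rows and index rows above.
# This is not a Cassandra client; it only illustrates the data layout.
rows = {
    "row1": {"partition": 1, "val": "b"},
    "row2": {"partition": 2, "val": "a"},
    "row3": {"partition": 2, "val": "c"},
    "row4": {"partition": 1, "val": "a"},
}


def build_index_rows(rows):
    """Build one index row per partition: a sorted list of (val, row_key)."""
    index_rows = {}
    for row_key, row in rows.items():
        index_rows.setdefault(row["partition"], []).append((row["val"], row_key))
    for columns in index_rows.values():
        columns.sort()  # columns within a row are kept in sorted order
    return index_rows


def scan(index_rows, rows, partition, val):
    """Column scan one partition's index row, then look up the actual rows."""
    return [rows[key] for v, key in index_rows.get(partition, []) if v == val]


index_rows = build_index_rows(rows)
matches = scan(index_rows, rows, partition=1, val="a")
```

Here a query only touches the index row of the partition it asks about, so the 
cost depends on the size of that one index row and not on the whole CF.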

Now each server is responsible for polling / scanning its partitions' index 
rows above.  If you have 2 servers and 2 partitions, each one would column scan 
its own index row above and then look up the actual rows.  If it is unbalanced, 
like 5 servers and 28 partitions, you can of course use the hash code of the 
partition and the number of servers to figure out whether a server owns that 
partition or not for polling.
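
The ownership check could be sketched roughly as below. This is only an 
illustration: the function names, the `partition<N>` id format, and the choice 
of MD5 as the hash are assumptions, not something playOrm or Cassandra 
prescribes. A stable hash is used because Python's built-in `hash()` on strings 
changes between processes, and every server has to compute the same answer:

```python
import hashlib


def stable_hash(partition_id: str) -> int:
    """Hash a partition id to the same integer on every server."""
    digest = hashlib.md5(partition_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owns_partition(partition_id: str, server_index: int, num_servers: int) -> bool:
    """True if the server at server_index should poll this partition."""
    return stable_hash(partition_id) % num_servers == server_index


def partitions_for(server_index: int, num_servers: int, partition_ids):
    """List the partitions a given server is responsible for polling."""
    return [p for p in partition_ids
            if owns_partition(p, server_index, num_servers)]


# Hypothetical setup: 28 partitions spread over 5 servers.
partition_ids = [f"partition{i}" for i in range(28)]
mine = partitions_for(server_index=0, num_servers=5, partition_ids=partition_ids)
```

Every partition ends up with exactly one owner, but hash modulo does not 
guarantee a perfectly even split, and changing the number of servers reassigns 
most partitions.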

All of this is automatic in playOrm with S-JQL (Scalable-JQL – one minor change 
to SQL to make it scalable).

Later,
Dean



From: Edward Kibardin <infa...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, August 30, 2012 2:14 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Why Cassandra secondary indexes are so slow on just 350k rows?

It should not depend on the number of rows in the CF but on the number of rows 
per index value.
