I forgot that the random partitioner can be switched out.  We don't use the 
ordered partitioner, so I had forgotten about that one.  I guess this could 
only be an option for random-partitioner-type setups :(.  I think 80% of 
projects use the random partitioner though, right?
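
To make the range-query concern concrete, here is a rough sketch in Python (not Cassandra code; every name in it is made up for illustration, and MD5 is just a stand-in hash).  It assumes row keys are spread over bucket directories by hash.  Adjacent keys then land in different buckets, so a scan over a key range has to consult every bucket and merge the results:

```python
import hashlib
import heapq


def bucket_for_key(key: str, num_buckets: int) -> int:
    """Hypothetical helper: hash a row key to a bucket number in [0, num_buckets)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def build_buckets(keys, num_buckets):
    """Distribute keys over buckets by hash; each bucket is kept sorted, like an sstable."""
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[bucket_for_key(key, num_buckets)].append(key)
    for bucket in buckets:
        bucket.sort()
    return buckets


def range_scan(buckets, start, end):
    """Return keys in [start, end) in sorted order.

    Because the hash scatters neighbouring keys, every bucket has to be
    consulted, and the per-bucket results are combined with a k-way merge.
    """
    per_bucket = ([k for k in bucket if start <= k < end] for bucket in buckets)
    return list(heapq.merge(*per_bucket))
```

A single-key lookup only touches one bucket, but `range_scan` touches all of them, which is why the idea fits the random partitioner much better than an ordered one.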

In fact, we use PlayOrm queries, so the indices are ordered and we can still get 
stuff back in order even though we are on the random partitioner in Cassandra.

Later,
Dean

From: Andy Twigg <andy.tw...@gmail.com>
Reply-To: user@cassandra.apache.org
Date: Wednesday, May 29, 2013 10:51 AM
To: user@cassandra.apache.org
Subject: Re: random thoughts for MUCH faster key lookup in cassandra

How would you implement range queries?



On 29 May 2013 17:49, Hiller, Dean <dean.hil...@nrel.gov> wrote:
We recently ran into too much data in one CF because LCS can't really run in 
parallel on one CF within a single tier, which got me thinking: why doesn't the 
CF directory have 100 or 1000 subdirectories (0-999), with Cassandra hashing the 
key to pick a directory and then putting the row in one of the sstables in that 
directory?  This would lead to

 1.  Parallel compaction of LCS in a single CF !!!!  Yeah, faster compactions 
since there is less to sort in each directory (and it can be done in parallel 
too)
 2.  Help with fast key lookups, as the key hashes to one of the 1000 
directories very quickly and then just needs to be found in one of the sstables, 
which are sorted (there would be 1000x fewer sstables in each directory than in 
one big CF)
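
A minimal sketch of the directory-selection step described above, in Python.  This is not how Cassandra lays out its data directories today; the function names, the example path, and the choice of MD5 are all assumptions made for illustration only:

```python
import hashlib

# Hypothetical bucket count; the proposal above mentions 100 or 1000.
NUM_BUCKETS = 1000


def bucket_for_key(row_key: bytes, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a row key to a bucket number in [0, num_buckets).

    MD5 is an arbitrary choice here; any stable, well-distributed hash works.
    """
    digest = hashlib.md5(row_key).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def bucket_dir(cf_dir: str, row_key: bytes, num_buckets: int = NUM_BUCKETS) -> str:
    """Return the (hypothetical) subdirectory under the CF directory for this key."""
    return f"{cf_dir}/{bucket_for_key(row_key, num_buckets)}"
```

Since the same key always maps to the same subdirectory, each subdirectory could in principle be compacted independently, and a key lookup would only need to look at the sstables in one subdirectory.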

Am I on crack here? Or does that seem like it would be a pretty good direction 
to go?

Maybe this is only because our system has 98% of its data in one CF while 
other systems have 10% of their data in each CF though.  I still tend to think 
a lot of people will end up with 80% of their data in one CF and 20% in all the 
other CFs.  Isn't the Pareto principle a natural tendency, and if it is, maybe 
the above feature should be considered?

Later,
Dean



--
Dr Andy Twigg
Junior Research Fellow, St Johns College, Oxford
Room 351, Department of Computer Science
http://www.cs.ox.ac.uk/people/andy.twigg/
andy.tw...@cs.ox.ac.uk | +447799647538
