Proposal to lower the minimum limit for phi_convict_threshold

2012-01-21 Thread Maki Watanabe
Hello, The current trunk limits the value of phi_convict_threshold to the range 5 to 16 in DatabaseDescriptor.java. The phi value is calculated in FailureDetector.java as PHI_FACTOR x time_since_last_gossip / mean_heartbeat_interval, and PHI_FACTOR is a predefined value: PHI_FACTOR = 1 / Log(10) =~
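
The formula quoted above can be sketched in a few lines. This is only an illustration of the arithmetic in the message, not the FailureDetector.java code: the function names are hypothetical, Log is assumed to be the natural logarithm, and the 1000 ms mean heartbeat interval in the usage note is an assumed figure.

```python
import math

# PHI_FACTOR as written in the message: 1 / Log(10).
# Assumption: Log is the natural logarithm.
PHI_FACTOR = 1.0 / math.log(10.0)


def phi(time_since_last_gossip_ms, mean_heartbeat_interval_ms):
    """phi = PHI_FACTOR x time_since_last_gossip / mean_heartbeat_interval."""
    return PHI_FACTOR * time_since_last_gossip_ms / mean_heartbeat_interval_ms


def silence_needed_ms(threshold, mean_heartbeat_interval_ms):
    """Invert the formula: how long a node must stay silent before phi
    reaches the given phi_convict_threshold."""
    return threshold * mean_heartbeat_interval_ms / PHI_FACTOR


if __name__ == "__main__":
    # Hypothetical mean heartbeat interval of 1000 ms.
    for threshold in (5, 16):
        print(threshold, silence_needed_ms(threshold, 1000.0))
```

Because phi grows linearly with the silence time under this formula, a lower threshold simply shortens the silence needed to convict a node, which is the trade-off the proposal is about.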

Get all keys from the cluster

2012-01-21 Thread Marcel Steinbach
We're running an 8-node cluster with different CFs for different applications. One of the applications uses 1.5TB out of 1.8TB in total, but only because we started out without a deletion mechanism and implemented one later on. So there is probably a large amount of old data in there, that we don't

Re: Get all keys from the cluster

2012-01-21 Thread Eric Czech
Is there any way that you could do that lookup in reverse where you pull the records from your SQL database, figure out which keys aren't necessary, and then delete any unnecessary keys that may or may not exist in cassandra? If that's not a possibility, then what about creating the same

Re: Proposal to lower the minimum limit for phi_convict_threshold

2012-01-21 Thread Radim Kolar
Anyway, I can't find any reason to limit the minimum value of phi_convict_threshold to 5. maki In the real world you often want to have 9, because cassandra is too sensitive to an overloaded LAN, and nodes are flipping up/down often and creating chaos in the cluster if you have a larger number of nodes (let

Re: ideal cluster size

2012-01-21 Thread Eric Czech
I'd also add that one of the biggest complications to arise from having multiple clusters is that read biased client applications would need to be aware of all clusters and either aggregate result sets or involve logic to choose the right cluster based on a particular query. And from a more

Re: Get all keys from the cluster

2012-01-21 Thread Marcel Steinbach
Thanks for your suggestions, Eric! One of the applications uses 1.5TB out of 1.8TB I'm sorry, maybe that statement was slightly ambiguous. I meant to say that one application uses 1.5TB, while the others use 300GB, totalling 1.8TB of data. Our total disk capacity, however, is at about 7 TB,

Re: Get all keys from the cluster

2012-01-21 Thread Eric Czech
Great! I'm glad at least one of those ideas was helpful for you. That's a road we've travelled before and as one last suggestion that might help, you could alter all client writers to cassandra beforehand so that they write to BOTH keyspaces BEFORE beginning the SQL based transfer. This might

Re: Cassandra to Oracle?

2012-01-21 Thread Eric Czech
Hi Brian, We're trying to do the exact same thing and I find myself asking very similar questions. Our solution though has been to find what kind of queries we need to satisfy on a preemptive basis and leverage cassandra's built-in indexing features to build those result sets beforehand. The

Re: Data Model Question

2012-01-21 Thread R. Verlangen
A couple of days ago I came across Countandra ( http://countandra.org/ ). It seems that it might be a solution for you. Gr. Robin 2012/1/20 Tamar Fraenkel ta...@tok-media.com Hi! I am a newbie to Cassandra and seeking some advice regarding the data model I should use to best address

Re: Unbalanced cluster with RandomPartitioner

2012-01-21 Thread Marcel Steinbach
I thought about our issue again and was thinking, maybe describeOwnership should take into account whether a token is outside the partitioner's maximum token range? To recap our problem: we had tokens that were apart by 12.5% of the token range 2**127, however, we had an offset on each token,
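
As a rough illustration of the situation recapped above: tokens spaced 12.5% of the 2**127 range apart correspond to eight evenly spaced tokens, and a constant offset on each can push some of them past the top of the range. The sketch below is not Cassandra code. The function names, the node count of 8, the offset value, and the choice to treat 0 through 2**127 inclusive as the valid range are all assumptions made for the example.

```python
# Size of the token range mentioned in the message.
RING_SIZE = 2**127


def spaced_tokens(node_count, offset=0):
    """Evenly spaced tokens, each shifted by the same offset."""
    return [i * RING_SIZE // node_count + offset for i in range(node_count)]


def tokens_outside_range(tokens):
    """Tokens that fall outside the assumed valid range 0..RING_SIZE."""
    return [t for t in tokens if t < 0 or t > RING_SIZE]


if __name__ == "__main__":
    # Hypothetical offset of a quarter of the ring, chosen only to
    # show a token leaving the range.
    tokens = spaced_tokens(8, offset=RING_SIZE // 4)
    print(len(tokens_outside_range(tokens)))
```

The spacing between neighbouring tokens is unchanged by the offset, so a check that only compares neighbours would still see 12.5% gaps; the out-of-range tokens only show up when each token is compared against the range limits.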

Re: ideal cluster size

2012-01-21 Thread Thorsten von Eicken
Thanks for the responses! We'll definitely go for powerful servers to reduce the total count. Beyond a dozen servers there really doesn't seem to be much point in trying to increase count anymore for replication/redundancy. I'm assuming we will use level compaction, which means that we'll most

Re: ideal cluster size

2012-01-21 Thread Peter Schuller
Thanks for the responses! We'll definitely go for powerful servers to reduce the total count. Beyond a dozen servers there really doesn't seem to be much point in trying to increase count anymore for Just be aware that if big servers imply *lots* of data (especially in relation to memory

Re: Data Model Question

2012-01-21 Thread Jean-Nicolas Boulay Desjardins
But what about Rainbird? On Sat, Jan 21, 2012 at 10:52 AM, R. Verlangen ro...@us2.nl wrote: A couple of days ago I came across Countandra ( http://countandra.org/ ). It seems that it might be a solution for you. Gr. Robin 2012/1/20 Tamar Fraenkel ta...@tok-media.com Hi! I am a

Re: Data Model Question

2012-01-21 Thread Milind Parikh
I used rainbird as inspiration for Countandra (some of the publicly available data structures from the rainbird preso). That said, there are significant differences between the two architectures. Additionally, as Cassandra begins to provide triggers, some very interesting things will become possible in

Re: ideal cluster size

2012-01-21 Thread Thorsten von Eicken
Good point. One thing I'm wondering about cassandra is what happens when there is a massive failure. For example, if 1/3 of the nodes go down or become unreachable. This could happen in EC2 if an AZ has a failure, or in a datacenter if a whole rack or UPS goes dark. I'm not so concerned about the

Re: Data Model Question

2012-01-21 Thread Jean-Nicolas Boulay Desjardins
Milind Parikh, Rainbird is backed by Twitter... My worry is that you might not be around in the future... Also, do you have evidence that your system is better? Because Rainbird is used by Twitter. On Sat, Jan 21, 2012 at 6:55 PM, Milind Parikh milindpar...@gmail.com wrote: I used rainbird as

Re: Data Model Question

2012-01-21 Thread Tamar Fraenkel
Hi, It may be my lack of knowledge, but both have to do with counting, which is not what I need. What is wrong with the two models I suggested? Tamar Sent from my iPod On Jan 22, 2012, at 2:49 AM, Jean-Nicolas Boulay Desjardins jnbdzjn...@gmail.com wrote: Milind Parikh, Rainbird is backed by

Re: Data Model Question

2012-01-21 Thread Edward Capriolo
On Sat, Jan 21, 2012 at 7:49 PM, Jean-Nicolas Boulay Desjardins jnbdzjn...@gmail.com wrote: Milind Parikh, Rainbird is backed by Twitter... My worry is that you might not be around in the future... Also, do you have evidence that your system is better? Because Rainbird is used by Twitter. On