Cluster Cloning

2012-02-01 Thread Hefeng Yuan
Hi, we need to clone the data between 2 clusters. These 2 clusters have different numbers of nodes. Source: 6 nodes, RF5; destination: 3 nodes, RF3. The Cassandra version is 0.8.1. Is there any suggestion on how to do this? -- Thanks, Hefeng
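Because the two clusters have different node counts, the destination's 3 nodes need their own evenly spaced initial tokens before data can be streamed in (e.g. via snapshot plus the sstableloader tool that shipped with 0.8.x). A minimal sketch of that token math for the RandomPartitioner, whose token space is 0 to 2^127 (the helper name is mine, not a Cassandra API):

```python
# Evenly spaced initial_token values for a RandomPartitioner cluster.
# Node i of N gets token i * 2**127 / N.
def initial_tokens(num_nodes):
    return [i * (2 ** 127) // num_nodes for i in range(num_nodes)]

for i, t in enumerate(initial_tokens(3)):
    print("node %d: initial_token = %d" % (i, t))
```

Each computed value would go into the `initial_token` setting of the corresponding destination node before loading.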

Re: Calculate number of nodes required based on data

2011-09-07 Thread Hefeng Yuan
suggestion on how to calculate how many more nodes to add? Or, generally how to plan for number of nodes required, from a performance perspective? Thanks, Hefeng On Sep 7, 2011, at 9:56 AM, Adi wrote: On Tue, Sep 6, 2011 at 3:53 PM, Hefeng Yuan hfy...@rhapsody.com wrote: Hi, Is there any

Re: Calculate number of nodes required based on data

2011-09-07 Thread Hefeng Yuan
: On Wed, Sep 7, 2011 at 1:09 PM, Hefeng Yuan hfy...@rhapsody.com wrote: Adi, The reason we're attempting to add more nodes is trying to solve the long/simultaneous compactions, i.e. the performance issue, not the storage issue yet. We have RF 5 and CL QUORUM for read and write, we have

Re: Calculate number of nodes required based on data

2011-09-07 Thread Hefeng Yuan
, at 11:31 AM, Adi wrote: On Wed, Sep 7, 2011 at 2:09 PM, Hefeng Yuan hfy...@rhapsody.com wrote: We didn't change MemtableThroughputInMB/min/maxCompactionThreshold, they're 499/4/32. As for why we're flushing at ~9m, I guess it has to do with this: http://thelastpickle.com/2011/05/04/How

Calculate number of nodes required based on data

2011-09-06 Thread Hefeng Yuan
Hi, is there any suggested way of calculating the number of nodes needed based on data? We currently have 6 nodes (each has 8G of memory) with RF5 (because we want to be able to survive the loss of 2 nodes). The flush of the memtable happens around every 30 min (while not doing compaction), with ~9m
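The "RF5 to survive the loss of 2 nodes" reasoning above is plain quorum arithmetic, not anything Cassandra-specific; a quick sketch:

```python
# With replication factor rf, a QUORUM read/write needs floor(rf/2) + 1 replicas,
# so up to rf - quorum replicas of any key can be down without losing availability.
def quorum(rf):
    return rf // 2 + 1

def tolerable_failures(rf):
    return rf - quorum(rf)

print(quorum(5), tolerable_failures(5))  # RF5: quorum of 3, survives 2 replicas down
```

The same formula shows why RF3 (quorum of 2) only tolerates a single replica failure per key.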

Re: Avoid Simultaneous Minor Compactions?

2011-08-22 Thread Hefeng Yuan
at compaction_throughput_mb_per_sec in cassandra.yaml On Mon, Aug 22, 2011 at 12:39 AM, Ryan King r...@twitter.com wrote: You should throttle your compactions to a sustainable level. -ryan On Sun, Aug 21, 2011 at 10:22 PM, Hefeng Yuan hfy...@rhapsody.com wrote: We just noticed that at one time, 4 nodes were
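The setting named in this reply lives in cassandra.yaml; in 0.8.x it caps total compaction I/O on each node so compactions run longer but steal less throughput from reads and writes. A fragment for illustration (the value 16 is an example, not a recommendation):

```yaml
# cassandra.yaml -- throttle compaction I/O across the node, in MB/s; 0 disables throttling
compaction_throughput_mb_per_sec: 16
```

Note that this spreads each node's compaction work over time; it does not by itself prevent several nodes from compacting simultaneously.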

CQL count(*) counts deleted records?

2011-08-22 Thread Hefeng Yuan
We're using 0.8.1 and found that CQL count(*) counts deleted columns; the CLI doesn't have this issue. Is it a bug?
cqlsh select * from CF where key = 'foo';
 u'foo'
cqlsh select count(*) from CF where key = 'foo';
 (3,)
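The behavior reported here is consistent with count(*) including tombstones (deletion markers, which linger until compaction after gc_grace_seconds) in its tally. An illustrative model of the discrepancy — the column names and tuple layout are hypothetical, not Cassandra internals:

```python
# Each column modeled as (name, value, is_tombstone). A delete leaves a
# tombstone in place of the column rather than removing it immediately.
row = [("c1", "v1", False), ("c2", None, True), ("c3", None, True)]

count_with_tombstones = len(row)                        # what the report suggests count(*) returned
live_count = sum(1 for _, _, dead in row if not dead)   # what the CLI reported

print(count_with_tombstones, live_count)  # 3 1
```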

Avoid Simultaneous Minor Compactions?

2011-08-21 Thread Hefeng Yuan
We just noticed that at one time, 4 nodes were doing minor compactions together, and each of them took 20-60 minutes. We're on 0.8.1, 6 nodes, RF5. These simultaneous compactions slowed down the whole cluster; we use the LOCAL_QUORUM consistency level, so dynamic_snitch is not helping us. Aside

Different cluster gossiping to each other

2011-08-19 Thread Hefeng Yuan
The symptom is that when we populate data into the non-prod cluster, after a while we start seeing this warning message from the prod cluster: WARN [GossipStage:1] 2011-08-19 19:47:35,730 GossipDigestSynVerbHandler.java (line 63) ClusterName mismatch from non-prod-node-ip

Nodetool repair takes 4+ hours for about 10G data

2011-08-18 Thread Hefeng Yuan
Hi, is it normal that repair takes 4+ hours for every node, with only about 10G of data? If this is not expected, is there any hint as to what could be causing this? The ring looks like below; we're using 0.8.1. Our repair is scheduled to run once per week for all nodes. Compaction related

One hot node slows down whole cluster

2011-08-17 Thread Hefeng Yuan
Hi, we're noticing that when one node gets hot (very high CPU usage) because of 'nodetool repair', the whole cluster's performance becomes really bad. We're using 0.8.0 with the random partitioner. We have 6 nodes with RF 5. Our repair is scheduled to run once a week, spread across the whole cluster. I

Re: One hot node slows down whole cluster

2011-08-17 Thread Hefeng Yuan
Sorry, correction, we're using 0.8.1. On Aug 17, 2011, at 11:24 AM, Hefeng Yuan wrote: Hi, We're noticing that when one node gets hot (very high cpu usage) because of 'nodetool repair', the whole cluster's performance becomes really bad. We're using 0.8.1 with random partition. We have

Re: One hot node slows down whole cluster

2011-08-17 Thread Hefeng Yuan
Just wondering, would it help if we shortened rpc_timeout_in_ms (currently 30,000), so that when one node gets hot and responds slowly, the others will just treat it as down and move forward? On Aug 17, 2011, at 11:35 AM, Hefeng Yuan wrote: Sorry, correction, we're using 0.8.1. On Aug

Re: One hot node slows down whole cluster

2011-08-17 Thread Hefeng Yuan
and if the Dynamic Snitch is doing the right thing. I would look at that error first, seems odd. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 18/08/2011, at 6:52 AM, Hefeng Yuan wrote: Just wondering, would

CQL COUNT Not Accurate?

2011-07-22 Thread Hefeng Yuan
Hi, I just noticed that count(*) in CQL seems to give the wrong answer: when I have only one row, count(*) returns two. Below are the commands I tried:
cqlsh SELECT COUNT(*) FROM UserProfile USING CONSISTENCY QUORUM WHERE KEY IN ('00D760DB1730482D81BC6845F875A97D');
 (2,)
cqlsh select

Secondary Index doesn't work with LOCAL_QUORUM

2011-07-11 Thread Hefeng Yuan
Hi, we're using Cassandra with 2 DCs:
- one OLTP Cassandra DC, 6 nodes, with RF3
- the other a Brisk DC, 3 nodes, with RF1
We noticed that when we do a write-then-read operation on the Cassandra DC, it fails with the following information (from cqlsh): Unable to complete request: one or more nodes