Re: nodetool repair caused high disk space usage

2011-08-21 Thread Philippe
Do you have any indication that the disk space usage is in fact consistent with the amount of data being streamed between the nodes? I think you went from 90 to ~450 GB with RF=3, right? That still sounds like a lot, assuming repairs are not running concurrently (and compactions are able to run after
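
A rough back-of-the-envelope check, assuming the worst case where repair finds differences everywhere and re-streams whole ranges: with RF=3 a node can receive up to a full extra copy of its data from each replica it is repaired against, so 90 GB of live data can temporarily become 90 GB existing plus 2-3 x 90 GB streamed in, plus overlap waiting to be compacted away. That puts ~450 GB within the right order of magnitude even though the logical data set never grew.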

Re: Cluster key distribution wrong after upgrading to 0.8.4

2011-08-21 Thread aaron morton
This looks like an artifact of the way ownership is calculated for the OPP. See https://github.com/apache/cassandra/blob/cassandra-0.8.4/src/java/org/apache/cassandra/dht/OrderPreservingPartitioner.java#L177; it was changed in this ticket: https://issues.apache.org/jira/browse/CASSANDRA-2800 The

Re: Cassandra Memory Trend - increased memory usage when node idles.

2011-08-21 Thread aaron morton
Using memory allocated to the JVM is not really a problem unless it's OOM'ing or running into performance issues due to excessive GC. One scenario I could imagine is a timeout triggered on a dirty memtable; this resulted in a flush, the flush resulted in a minor compaction, the minor

Re: Different cluster gossiping to each other

2011-08-21 Thread aaron morton
Did you clear the LocationInfo from the non-prod cluster? When you gave non-prod the prod seeds, non-prod would have discovered all the nodes in prod. Unless you have cleared the location info they will still have that knowledge. Does nodetool ring in non-prod list any prod machines? If

Re: Completely removing a node from the cluster

2011-08-21 Thread aaron morton
Unreachable nodes either did not respond to the message or were known to be down and were not sent a message. The way the node lists are obtained for the ring command and describe cluster is the same, so it's a bit odd. Can you connect to JMX and have a look at the o.a.c.db.StorageService
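
For reference, a minimal sketch of that JMX check, assuming the standard StorageService MBean attributes (LiveNodes / UnreachableNodes); host and port are placeholders, and the default JMX port is 8080 on 0.7 and 7199 on 0.8:

  import java.util.List;
  import javax.management.MBeanServerConnection;
  import javax.management.ObjectName;
  import javax.management.remote.JMXConnector;
  import javax.management.remote.JMXConnectorFactory;
  import javax.management.remote.JMXServiceURL;

  public class RingCheck {
      public static void main(String[] args) throws Exception {
          // Adjust host/port for your node (0.7 defaults to JMX on 8080, 0.8 on 7199).
          JMXServiceURL url = new JMXServiceURL(
              "service:jmx:rmi:///jndi/rmi://192.168.20.2:7199/jmxrmi");
          JMXConnector jmxc = JMXConnectorFactory.connect(url);
          try {
              MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
              ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
              // LiveNodes and UnreachableNodes are attributes of StorageServiceMBean.
              List<String> live = (List<String>) mbs.getAttribute(ss, "LiveNodes");
              List<String> unreachable = (List<String>) mbs.getAttribute(ss, "UnreachableNodes");
              System.out.println("Live: " + live);
              System.out.println("Unreachable: " + unreachable);
          } finally {
              jmxc.close();
          }
      }
  }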

how to know if nodetool cleanup is safe?

2011-08-21 Thread Yan Chunlu
Since nodetool cleanup could remove hinted handoffs, will it cause data loss?

Re: 0.7.4: Replication assertion error after removetoken, removetoken force and a restart

2011-08-21 Thread aaron morton
There is some confusion in the ring about nodes leaving. Check nodetool ring from every node and see if they agree. Check the logs to see if there is any information about which node is sending the wrong message. Without knowing much more you could try a rolling restart, but you may need a full

Questions about TTL and batch_mutate

2011-08-21 Thread Joris van der Wel
Hello, I have a ColumnFamily in which all columns are always set with a TTL; this would be one of the hottest column families (rows_cached is set to 1.0). I am wondering if TTL values also follow gc_grace? If they do, am I correct in thinking it would be best to set gc_grace really low in this
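
For reference, a minimal sketch of writing a column with a TTL through the raw Thrift batch_mutate call; the keyspace, column family, key and column names here are placeholders, not from the original mail:

  import java.nio.ByteBuffer;
  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;
  import org.apache.cassandra.thrift.Cassandra;
  import org.apache.cassandra.thrift.Column;
  import org.apache.cassandra.thrift.ColumnOrSuperColumn;
  import org.apache.cassandra.thrift.ConsistencyLevel;
  import org.apache.cassandra.thrift.Mutation;
  import org.apache.thrift.protocol.TBinaryProtocol;
  import org.apache.thrift.transport.TFramedTransport;
  import org.apache.thrift.transport.TSocket;

  public class TtlWrite {
      public static void main(String[] args) throws Exception {
          TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
          Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
          transport.open();
          client.set_keyspace("MyKeyspace");                    // placeholder keyspace

          Column col = new Column();
          col.setName(ByteBuffer.wrap("status".getBytes("UTF-8")));
          col.setValue(ByteBuffer.wrap("online".getBytes("UTF-8")));
          col.setTimestamp(System.currentTimeMillis() * 1000);  // microseconds
          col.setTtl(3600);                                     // column expires after one hour

          ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
          cosc.setColumn(col);
          Mutation m = new Mutation();
          m.setColumn_or_supercolumn(cosc);

          List<Mutation> mutations = new ArrayList<Mutation>();
          mutations.add(m);
          Map<String, List<Mutation>> byCf = new HashMap<String, List<Mutation>>();
          byCf.put("MyCF", mutations);
          Map<ByteBuffer, Map<String, List<Mutation>>> byKey =
              new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
          byKey.put(ByteBuffer.wrap("row1".getBytes("UTF-8")), byCf);

          client.batch_mutate(byKey, ConsistencyLevel.QUORUM);
          transport.close();
      }
  }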

Re: Questions about TTL and batch_mutate

2011-08-21 Thread aaron morton
I am wondering if TTL values also follow gc_grace? They are purged by the first compaction that processes them after the TTL has expired. The TTL expiry is used the same way as the expiry on a tombstone. Thinking out loud, is this possible… t0 - write col to all 3 replicas. t1 - overwrite col

Re: Questions about TTL and batch_mutate

2011-08-21 Thread Joris van der Wel
On Sun, Aug 21, 2011 at 2:21 PM, aaron morton *@thelastpickle.com wrote:  I am wondering if TTL values also follow gc_grace? They are purged by the first compaction that processes them after TTL has expired. The TTL expiry is used the same way as the expire on a Tombstone. Thinking out

Re: Cluster key distribution wrong after upgrading to 0.8.4

2011-08-21 Thread Thibaut Britz
Hi, I will wait until this is fixed before I upgrade, just to be sure. Shall I open a new ticket for this issue? Thanks, Thibaut On Sun, Aug 21, 2011 at 11:57 AM, aaron morton aa...@thelastpickle.com wrote: This looks like an artifact of the way ownership is calculated for the OPP. See

Commit log fills up in less than a minute

2011-08-21 Thread Anand Somani
Hi, 0.7.4, 3-node cluster, RF=3. Load has not changed much; on 2 of the 3 nodes the commit log filled up in less than a minute (did not give it a chance to recover). We have been running this cluster for about 2-3 months without any problem. At this point I do not see any unusual load (continue to

Re: Commit log fills up in less than a minute

2011-08-21 Thread Anand Somani
So no, it did not fill in a minute, but tons of header files were written in a minute (is that normal? I assume these are marker files which get written when memtables are flushed). The actual data files have been around for the last 24 hours. Somehow this all seems connected to reintroducing the node

Re: Commit log fills up in less than a minute

2011-08-21 Thread Peter Schuller
When does the actual commit log data file get deleted? The flush interval on all my memtables is 60 minutes. They *should* be getting deleted when they no longer contain any data that has not been flushed to disk. Are flushes definitely still happening? Is it possible flushing has started failing
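
One way to check that flushes still go through is to force one over JMX and see whether old CommitLog segments are then removed; a rough sketch (keyspace and column family names are placeholders, and the JMX port may be 8080 on 0.7 rather than 7199):

  import javax.management.MBeanServerConnection;
  import javax.management.ObjectName;
  import javax.management.remote.JMXConnector;
  import javax.management.remote.JMXConnectorFactory;
  import javax.management.remote.JMXServiceURL;

  public class ForceFlush {
      public static void main(String[] args) throws Exception {
          // Same effect as "nodetool flush MyKeyspace MyCF".
          JMXServiceURL url = new JMXServiceURL(
              "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
          JMXConnector jmxc = JMXConnectorFactory.connect(url);
          try {
              MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
              ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
              mbs.invoke(ss, "forceTableFlush",
                  new Object[] { "MyKeyspace", new String[] { "MyCF" } },
                  new String[] { "java.lang.String", "[Ljava.lang.String;" });
              // If CommitLog-*.log segments still pile up after all CFs have been
              // flushed, segment recycling rather than flushing is the suspect.
          } finally {
              jmxc.close();
          }
      }
  }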

Re: Commit log fills up in less than a minute

2011-08-21 Thread Anand Somani
We have a lot of space on /data, and it looks like it was flushing data fine, judging from the file timestamps. We did have a bit of a goof-up with IPs when bringing up a down node (and the commit files have been around since then). Wonder if that is what triggered it and we have a bunch of hinted handoffs being

RE: Completely removing a node from the cluster

2011-08-21 Thread Bryce Godfrey
Both .2 and .3 report the same from the MBean: Unreachable is an empty collection, and LiveNodes still lists all 3 nodes: 192.168.20.2 192.168.20.3 192.168.20.1. The removetoken was done a few days ago, and I believe the remove was done from .2. Here is what the ring output looks like, not sure why

Re: Cluster key distribution wrong after upgrading to 0.8.4

2011-08-21 Thread aaron morton
I'm not sure what the fix is. When using an order preserving partitioner it's up to you to ensure the ring is correctly balanced. Say you have the following setup: node 1 has token a, node 2 has token h, node 3 has token p. If keys are always 1 character we can say each node owns roughly 33% of the ring. Because we
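
To make that concrete, a small sketch (my own illustration, not the partitioner's actual code) that counts how many single-letter keys each token would own, using the usual rule that a node owns the range (previous token, its own token] with wrap-around:

  import java.util.LinkedHashMap;
  import java.util.Map;
  import java.util.TreeSet;

  public class OppOwnership {
      public static void main(String[] args) {
          // Tokens from the example above: three nodes at "a", "h" and "p".
          TreeSet<String> tokens = new TreeSet<String>();
          tokens.add("a"); tokens.add("h"); tokens.add("p");

          // Assume every key is a single lowercase letter.
          Map<String, Integer> owned = new LinkedHashMap<String, Integer>();
          for (String t : tokens) owned.put(t, 0);
          for (char c = 'a'; c <= 'z'; c++) {
              String key = String.valueOf(c);
              String owner = tokens.ceiling(key);         // first token >= key
              if (owner == null) owner = tokens.first();  // wrap around the ring
              owned.put(owner, owned.get(owner) + 1);
          }
          for (Map.Entry<String, Integer> e : owned.entrySet()) {
              System.out.printf("token %s owns %d/26 keys (%.0f%%)%n",
                  e.getKey(), e.getValue(), 100.0 * e.getValue() / 26);
          }
          // Prints 11/26, 7/26 and 8/26; how close to a third each node gets
          // depends entirely on what you assume about the keys.
      }
  }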

Re: Commit log fills up in less than a minute

2011-08-21 Thread aaron morton
Yup, you can check what HH is doing via JMX. There is a bug in 0.7 that can result in log files not being deleted: https://issues.apache.org/jira/browse/CASSANDRA-2829 Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On

Re: Completely removing a node from the cluster

2011-08-21 Thread aaron morton
I see the mistake I made about ring: it gets the endpoint list from the same place but uses the tokens to drive the whole process. I'm guessing here, I don't have time to check all the code. But there is a 3-day timeout in the gossip system. Not sure if it applies in this case. Anyone know?

would it possible for this kind of data loss?

2011-08-21 Thread Yan Chunlu
I was aware that deleted items might come back alive without proper node repair. How about modified items? For example, 'A'={1,2,3}, then 'A'={4,5}. Is it possible for 'A' to change back to {1,2,3}? I have encountered this mysterious problem after going through a messy procedure with cassandra nodes,

Cassandra Cluster Admin - phpMyAdmin for Cassandra

2011-08-21 Thread SebWajam
Hi, I'm working on this project for a few months now and I think it's mature enough to post it here: https://github.com/sebgiroux/Cassandra-Cluster-Admin Cassandra Cluster Admin on GitHub Basically, it's a GUI for Cassandra. If you're like me and used MySQL for a while (and still using it!),

The schema has not settled in 10 seconds; further migrations are ill-advised until it does.?

2011-08-21 Thread Yan Chunlu
I have encountered this problem while updating the key cache and row cache. I once updated them to 0 (disabled) while node2 was not available; when it came back they eventually had the same schema version. [default@prjspace] describe cluster; Cluster Information: Snitch:

Re: Cassandra Cluster Admin - phpMyAdmin for Cassandra

2011-08-21 Thread Yan Chunlu
Just tried it and it works like a charm! Thanks a lot for the great work! On Mon, Aug 22, 2011 at 9:47 AM, SebWajam sebast...@wajam.com wrote: Hi, I'm working on this project for a few months now and I think it's mature enough to post it here: Cassandra Cluster Admin on

Re: The schema has not settled in 10 seconds; further migrations are ill-advised until it does.?

2011-08-21 Thread Yan Chunlu
thanks for the migration tip, but the schema is in agreement. [default@prjspace] describe cluster; Cluster Information: Snitch: org.apache.cassandra.locator.SimpleSnitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: 79d072cc-cc62-11e0-a753-5525ca993302:

Re: would it possible for this kind of data loss?

2011-08-21 Thread Stephane Legay
Ok, will look into it, thx for the heads up. Sent from a mobile device, please forgive typos. On Aug 21, 2011 6:45 PM, Yan Chunlu springri...@gmail.com wrote: I was aware of the deleted items might be come back alive without proper node repair. how about modified items, for example

RE: Completely removing a node from the cluster

2011-08-21 Thread Bryce Godfrey
It's been at least 4 days now. -Original Message- From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Sunday, August 21, 2011 3:16 PM To: user@cassandra.apache.org Subject: Re: Completely removing a node from the cluster I see the mistake I made about ring, gets the endpoint list

Re: get mycf['rowkey']['column_name'] return 'Value was not found' in cassandra-cli

2011-08-21 Thread Jonathan Ellis
My guess: you're using an old version of the cli that isn't dealing with BytesType column names correctly. On Mon, Aug 22, 2011 at 12:08 AM, Yan Chunlu springri...@gmail.com wrote: connect to cassandra-cli and issue list mycf; I got RowKey: comments_62559 = (column=76616c7565,
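
For what it's worth, the hex strings in that output are just the BytesType rendering of the column names; 76616c7565 decodes to the ASCII string "value". A quick sketch of the decoding:

  public class HexName {
      public static void main(String[] args) {
          String hex = "76616c7565";                // column name as printed by the cli
          byte[] bytes = new byte[hex.length() / 2];
          for (int i = 0; i < bytes.length; i++) {
              bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
          }
          System.out.println(new String(bytes));    // prints "value"
      }
  }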

Avoid Simultaneous Minor Compactions?

2011-08-21 Thread Hefeng Yuan
We just noticed that at one time, 4 nodes were doing minor compactions together, each of them taking 20-60 minutes. We're on 0.8.1, 6 nodes, RF=5. These simultaneous compactions slowed down the whole cluster; we use LOCAL_QUORUM consistency level, therefore dynamic_snitch is not helping us. Aside

Re: Avoid Simultaneous Minor Compactions?

2011-08-21 Thread Ryan King
You should throttle your compactions to a sustainable level. -ryan On Sun, Aug 21, 2011 at 10:22 PM, Hefeng Yuan hfy...@rhapsody.com wrote: We just noticed that at one time, 4 nodes were doing minor compaction together, each of them took 20~60 minutes. We're on 0.8.1, 6 nodes, RF5. This
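
On 0.8 the usual knob for this is compaction_throughput_mb_per_sec in cassandra.yaml; a sketch of a more conservative setting (the value below is illustrative, the shipped default is 16):

  # cassandra.yaml (0.8.x): throttle total compaction I/O on the node.
  # Lower it if simultaneous minor compactions saturate the disks; 0 disables throttling.
  compaction_throughput_mb_per_sec: 8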