replacing a dead node

2010-10-11 Thread Chen Xinli
Hi, We have a Cassandra cluster of 6 nodes with RF=3, read-repair enabled, hinted handoff disabled, WRITE with QUORUM, READ with ONE. We want to rely on read-repair totally for node failure, as returning inconsistent results temporarily is OK for us. If a node is temporarily dead and returned
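
A rough sketch of the replica-overlap arithmetic behind the setup described above (RF=3, WRITE with QUORUM, READ with ONE). The function names are hypothetical and only illustrate the usual rule of thumb that a read is guaranteed to touch at least one up-to-date replica when W + R > RF; this is not Cassandra code.

```python
def quorum(rf: int) -> int:
    """Number of replicas in a quorum for a given replication factor."""
    return rf // 2 + 1


def overlap_guaranteed(rf: int, w: int, r: int) -> bool:
    """True when every read set must intersect every write set (W + R > RF)."""
    return w + r > rf


RF = 3
W = quorum(RF)  # QUORUM writes: 2 of 3 replicas must acknowledge
R = 1           # ONE reads: a single replica answers

# 2 + 1 is not greater than 3, so a read at ONE may hit the replica
# that missed the write until read-repair catches it up.
print(overlap_guaranteed(RF, W, R))
# Reading at QUORUM instead (R = 2) would give the overlap.
print(overlap_guaranteed(RF, W, quorum(RF)))
```

Under these assumptions a read at ONE can return stale data for a while, which matches the poster's statement that temporarily inconsistent results are acceptable.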

Cassandra newbie question

2010-10-11 Thread Arijit Mukherjee
Hi All, I've just started reading about Cassandra and writing simple tests using Cassandra 0.6.5 to see if we can use it for our product. I have a data store with a set of columns, like C1, C2, C3, and C4, but the columns aren't mandatory. For example, there can be a list of (k,v) pairs with only

Re: Cassandra newbie question

2010-10-11 Thread Arijit Mukherjee
Just a follow on question to this - would PIG be a good fit for such questions? Arijit On 11 October 2010 14:31, Arijit Mukherjee ariji...@gmail.com wrote: Hi All I've just started reading about Cassandra and writing simple tests using Cassandra 0.6.5 to see if we can use it for our product.

Re: Retaining commit logs

2010-10-11 Thread Oleg Anastasyev
Matthew Dennis mdennis at riptano.com writes: Yes, please file it to Jira.  It seems like it would be pretty useful for various things and fairly easy to change the code to move it to another directory whenever C* thinks it should be deleted... Here it is for 0.6.4 version. Should work on a

Re: replacing a dead node

2010-10-11 Thread Gary Dusbabek
On Mon, Oct 11, 2010 at 03:41, Chen Xinli chen.d...@gmail.com wrote: Hi, We have a cassandra cluster of 6 nodes with RF=3, read-repair enabled, hinted handoff disabled, WRITE with QUORUM, READ with ONE. we want to rely on read-repair totally for node failure, as returning inconsistent result

Re: Cassandra newbie question

2010-10-11 Thread Gary Dusbabek
On Mon, Oct 11, 2010 at 04:01, Arijit Mukherjee ariji...@gmail.com wrote: Hi All I've just started reading about Cassandra and writing simple tests using Cassandra 0.6.5 to see if we can use it for our product. I have a data store with a set of columns, like C1, C2, C3, and C4, but the

Wide rows or tons of rows?

2010-10-11 Thread Héctor Izquierdo Seliva
Hi everyone. I'm sure this question or similar has come up before, but I can't find a clear answer. I have to store an unknown number of items in Cassandra, which can vary from a few hundred to a few million per customer. I read that in Cassandra wide rows are better than a lot of rows, but

Re: Problem Starting Cassandra

2010-10-11 Thread Eric Evans
On Fri, 2010-10-08 at 16:34 -0500, Michael Shuler wrote: This looks like you haven't set up the system to use the Sun JRE, yet. Debian/Ubuntu uses GCJ by default. OpenJDK works fine as well (package openjdk-6-jre). -- Eric Evans eev...@rackspace.com

Re: Nodes getting slowed down after a few days of smooth operation

2010-10-11 Thread Brandon Williams
On Mon, Oct 11, 2010 at 9:13 AM, Ran Tavory ran...@gmail.com wrote: In my production cluster I've been seeing the following pattern. When a node goes up it operates smoothly for a few days, but then the node starts to show excessive CPU usage, I see GC activity (and it may

Re: Wide rows or tons of rows?

2010-10-11 Thread Edward Capriolo
2010/10/11 Héctor Izquierdo Seliva izquie...@strands.com: Hi everyone. I'm sure this question or similar has come up before, but I can't find a clear answer. I have to store an unknown number of items in Cassandra, which can vary from a few hundred to a few million per customer. I read

Re: Multi Data Center Strategy

2010-10-11 Thread Edward Capriolo
On Mon, Oct 11, 2010 at 9:53 AM, Henry Luo h...@choicestream.com wrote: We have an application that does a lot of updates to the rows. We use replication factor of 3 and are moving to multiple data centers. We would like to accomplish the following setup: Data are replicated to other data

Re: Wide rows or tons of rows?

2010-10-11 Thread Héctor Izquierdo Seliva
On Mon, 11-10-2010 at 11:08 -0400, Edward Capriolo wrote: Inlined: 2010/10/11 Héctor Izquierdo Seliva izquie...@strands.com: Hi everyone. I'm sure this question or similar has come up before, but I can't find a clear answer. I have to store an unknown number of items in Cassandra,

Re: Nodes getting slowed down after a few days of smooth operation

2010-10-11 Thread Robert Coli
On 10/11/10 7:13 AM, Ran Tavory wrote: After a node gets restarted it compacts the sstable files on disk. I'm not sure whether compactions always take place after restart, maybe it's just minor compactions, I'm a little confused here, but my story would work best if (major) compactions were

Re: Nodes getting slowed down after a few days of smooth operation

2010-10-11 Thread Peter Schuller
I have wondered before whether there is any technical reason why the commit log replay should end with a flush, and from what I can tell, there isn't one other than the general goal of not having a large commit log. My personal feeling is that the last thing you want your production node doing

Re: Nodes getting slowed down after a few days of smooth operation

2010-10-11 Thread Peter Schuller
170141183460469231731687303715884105727 192.168.252.88 Up 10.07 GB Firstly, I second the point raised about the row cache size (very frequent concurrent GCs are definitely an indicator that the JVM heap size is too small, and the row cache seems like a likely contender - especially given

Re: Wide rows or tons of rows?

2010-10-11 Thread Jeremy Davis
Thanks for this reply. I'm wondering about the same issue... Should I bucket things into Wide rows (say 10M rows), or narrow (say 10K or 100K).. Of course it depends on my access patterns right... Does anyone know if a partial row cache is a feasible feature to implement? My use case is something

Re: Wide rows or tons of rows?

2010-10-11 Thread Aaron Morton
No idea about a partial row cache, but I would start with fat rows in your use case. If you find that performance is really a problem then you could add a second "recent / oldest" CF that you maintain with the most recent entries and use the row cache there. OR add more nodes. Aaron On 12 Oct,

Re: Nodes getting slowed down after a few days of smooth operation

2010-10-11 Thread Ran Tavory
Thanks Peter, Robert and Brandon. So it seems that the only suspect by now is my excessive caching ;) I'll get a better look at the GC activity next time shit starts to happen, but in the meantime, as for the cache size (Cassandra's internal cache), its row cache capacity is set to 10,000,000. I

Re: Nodes getting slowed down after a few days of smooth operation

2010-10-11 Thread Peter Schuller
My motivation was that since I don't have too much data (10G each node) then why don't I cache the hell out of it, so I started with a cache size of 100% and a much larger heap size (started with 12G out of the 16G ram). Over time I've learned that too much heap for the JVM is like a kid in a

Understanding Range queries with Random Partition

2010-10-11 Thread Rana Aich
Hi, I've used range queries with the Order Preserving Partitioner and got satisfactory results. For instance, I can find the first 1 million keys that start with key '2008010100' and end with '2008010200'. Now I'm trying to do the same with Random Partitioning. But here I find that for Range
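
A minimal sketch of why key ranges behave differently under the two partitioners. It assumes, as a simplification, that the random partitioner places a key by the absolute value of its MD5 digest read as a signed big-endian integer; the helper name `md5_token` and the sample keys are hypothetical, and this is an illustration rather than Cassandra's actual code.

```python
import hashlib


def md5_token(key: str) -> int:
    """Approximate a hash-based token: |MD5(key)| as a big integer."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


# Hypothetical date-style keys like the ones in the question.
keys = ["2008010100", "2008010101", "2008010102", "2008010200"]

# Order-preserving placement: rows are stored in key order, so a
# start/end key pair selects a contiguous, meaningful slice.
by_key = sorted(keys)

# Hash-based placement: rows are stored in token order, which in
# general has nothing to do with the lexical order of the keys.
by_token = sorted(keys, key=md5_token)

for k in by_token:
    print(md5_token(k), k)
```

With hash-based placement, a range scan walks rows in token order, so a start/end key pair does not describe a contiguous set of keys the way it does with order-preserving placement.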

Re: Nodes getting slowed down after a few days of smooth operation

2010-10-11 Thread Ran Tavory
Peter, you're my JVM GC hero! Thank you! On Tue, Oct 12, 2010 at 12:38 AM, Peter Schuller peter.schul...@infidyne.com wrote: My motivation was that since I don't have too much data (10G each node) then why don't I cache the hell out of it, so I started with a cache size of 100% and a

Exception in the tool

2010-10-11 Thread Dmitri Smirnov
Is the below a normal thing? I am a newbie, just unpacked and started a single node. $ bin/nodetool -h localhost -p 8080 version ReleaseVersion: 0.7.0-beta2 $ bin/nodetool -h localhost -p 8080 tpstats Pool Name Active Pending Completed MIGRATION_STAGE 0

Re: Exception in the tool

2010-10-11 Thread Aaron Morton
Sounds like you are getting this problem... http://www.mail-archive.com/user@cassandra.apache.org/msg06295.html Should be fixed in the nightly build. You can still get the stats via JConsole. Aaron On 12 Oct, 2010, at 01:14 PM, Dmitri Smirnov dsmir...@netflix.com wrote: Is below a normal thing? I am a

Re: getSchemaVersion

2010-10-11 Thread Jonathan Ellis
On Mon, Oct 11, 2010 at 7:53 PM, B. Todd Burruss bburr...@real.com wrote: to determine if my programmatic schema changes have been distributed throughout the cluster, I am supposed to use getSchemaVersionMap, correct? my question is how do I properly use it? I have the schema version

Re: getSchemaVersion

2010-10-11 Thread Jonathan Ellis
On Mon, Oct 11, 2010 at 9:48 PM, B. Todd Burruss bburr...@real.com wrote: i was actually doing this to start with and was worried that i could have two clients modifying schemas at the same time. it seems this could cause multiple valid versions and a race condition. maybe it simply works out