Re: Cassandra freezes under load when using libc6 2.11.1-0ubuntu7.5

2011-01-17 Thread Erik Onnen
Unfortunately, the previous AMI we used to provision the 7.5 version is no longer available. More unfortunately, the two test nodes we spun up in each AZ did not get Nehalem architectures so the only things I can say for certain after running Mike's test 10x on each test node are: 1) I could not

Re: about the consistency level

2011-01-17 Thread aaron morton
The ConsistencyLevel is passed with each read and write command. How you set it will depend on the client you are using. Which one are you using? Aaron On 17/01/2011, at 8:50 PM, raoyixuan (Shandy) wrote: How to set the consistency level in Cassandra 0.7? I mean what command?

RE: about the consistency level

2011-01-17 Thread raoyixuan (Shandy)
Both Hector and cassandra-cli. Can you tell me respectively? Thanks a lot. From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Monday, January 17, 2011 4:17 PM To: user@cassandra.apache.org Subject: Re: about the consistency level The ConsistencyLevel is passed with each read and write

Re: about the consistency level

2011-01-17 Thread aaron morton
The cassandra-cli works at CL.ONE; currently it cannot be changed. I'm not sure if there is a reason for this, but if it's a feature you would like, add a request to JIRA https://issues.apache.org/jira/browse/CASSANDRA In Hector it's part of the m.p.h.api.Keyspace interface as

RE: about the consistency level

2011-01-17 Thread raoyixuan (Shandy)
Thanks a lot. From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Monday, January 17, 2011 5:01 PM To: user@cassandra.apache.org Subject: Re: about the consistency level The cassandra-cli works at CL.ONE; currently it cannot be changed. I'm not sure if there is a reason for this, but if

Between Clause

2011-01-17 Thread kh jo
What is the best way to model a query with between clause.. given that you have a large number of entries... thanks Jo

Re: Between Clause

2011-01-17 Thread aaron morton
Can you provide some more information ? Aaron On 17/01/2011, at 11:55 PM, kh jo wrote: What is the best way to model a query with between clause.. given that you have a large number of entries... thanks Jo

Re: Between Clause

2011-01-17 Thread Donal Zang
On 17/01/2011 11:55, kh jo wrote: What is the best way to model a query with between clause.. given that you have a large number of entries... thanks Jo In my experience, for the row-based 'between clause' with a random partitioner, you should design the column family carefully, so that you

Re: Between Clause

2011-01-17 Thread kh jo
another example:  generating visit statistics given that start and end date are dynamic --- On Mon, 1/17/11, kh jo jo80la...@yahoo.com wrote: From: kh jo jo80la...@yahoo.com Subject: Re: Between Clause To: user@cassandra.apache.org Date: Monday, January 17, 2011, 12:40 PM example: finding

Re: balancing load

2011-01-17 Thread Edward Capriolo
On Mon, Jan 17, 2011 at 2:44 AM, aaron morton aa...@thelastpickle.com wrote: The nodes will not automatically delete stale data, to do that you need to run nodetool cleanup. See step 3 in the Range Changes Bootstrap http://wiki.apache.org/cassandra/Operations#Range_changes If you are

Re: Cassandra-Maven-Plugin

2011-01-17 Thread Stephen Connolly
https://issues.apache.org/jira/browse/CASSANDRA-1997 On 16 January 2011 19:59, Stephen Connolly stephen.alan.conno...@gmail.com wrote: it will be an attachment to an as yet un raised jira. look out for it tomorrow/tuesday - Stephen --- Sent from my Android phone, so random spelling

quorum calculation seems to depend on previous selected nodes

2011-01-17 Thread Samuel Benz
Dear List, I found a strange behavior on our mini cluster during an update with consistency level QUORUM. We have a cluster with 4 nodes. ReplicationFactor is 2, ReplicaPlacement is the RackAwareStrategy and the EndpointSnitch is the PropertyFileEndpointSnitch (with two data centers and two racks

Re: balancing load

2011-01-17 Thread Edward Capriolo
On Mon, Jan 17, 2011 at 10:51 AM, Peter Schuller peter.schul...@infidyne.com wrote: Just to head off the next possible problem: if you run 'nodetool cleanup' on each node and some of your nodes still have more data than others, then it probably means you are writing the majority of data to a few

Re: quorum calculation seems to depend on previous selected nodes

2011-01-17 Thread Jonathan Ellis
On Mon, Jan 17, 2011 at 9:55 AM, Samuel Benz samuel.b...@switch.ch wrote: We have a cluster with 4 nodes. ReplicationFactor is 2, ReplicaPlacement is the RackAwareStrategy and the EndpointSnitch is the PropertyFileEndpointSnitch (with two data centers and two racks each). My understanding is,

Cassandra GC Settings

2011-01-17 Thread Dan Hendry
I am having some reliability problems in my Cassandra cluster which I am almost certain are due to GC. I was about to start delving into the guts of the problem by turning on GC logging, but I have never done any serious Java GC tuning before (time to learn, I guess). As a first step however, I was

Re: balancing load

2011-01-17 Thread Peter Schuller
@Peter Isn't clean up a special case of compaction? IE it works as a major compaction + removes data not belonging to the node? Yes, sorry. Brain lapse. Ignore me. -- / Peter Schuller

Re: Cassandra GC Settings

2011-01-17 Thread SriSatish Ambati
Dan, Please kindly attach your: 1) java -version 2) full commandline settings, heap sizes. 3) gc log from one of the nodes via: -XX:+PrintTenuringDistribution \ -XX:+PrintGCDetails \ -XX:+PrintGCTimeStamps \ -Xloggc:/var/log/cassandra/gc.log \ 4) number of cores on your system. How busy is the

Re: Cassandra GC Settings

2011-01-17 Thread Peter Schuller
very quickly from the young generation to the old generation. Furthermore, the CMSInitiatingOccupancyFraction of 75 (from a JVM default of 68) means start gc in the old generation later, presumably to allow Cassandra to use more of the old generation heap without needlessly trying to free up

Re: balancing load

2011-01-17 Thread Karl Hiramoto
On 01/17/11 15:54, Edward Capriolo wrote: Just to head off the next possible problem: if you run 'nodetool cleanup' on each node and some of your nodes still have more data than others, then it probably means you are writing the majority of data to a few keys. (You probably do not want to do

Re: balancing load

2011-01-17 Thread Edward Capriolo
On Mon, Jan 17, 2011 at 1:20 PM, Karl Hiramoto k...@hiramoto.org wrote: On 01/17/11 15:54, Edward Capriolo wrote: Just to head off the next possible problem: if you run 'nodetool cleanup' on each node and some of your nodes still have more data than others, then it probably means you are writing

Re: quorum calculation seems to depend on previous selected nodes

2011-01-17 Thread Jonathan Ellis
Adding CL.TWO would be easy enough. :) On Mon, Jan 17, 2011 at 12:12 PM, Peter Schuller peter.schul...@infidyne.com wrote: I think you should just tell everybody that if you want to use QUORUM you need RF >= 3 for it to be meaningful. No one would use QUORUM with RF < 3 except in error. Well,

Re: quorum calculation seems to depend on previous selected nodes

2011-01-17 Thread Peter Schuller
Adding CL.TWO would be easy enough. :) True, but the obvious generalization is to be able to select an arbitrary replica count and that seemed like a bigger change to the API. But if CL.TWO would be considered clean enough... I may submit a jira/patch. -- / Peter Schuller

Super CF or two CFs?

2011-01-17 Thread Steven Mac
How can I best map an object containing two maps, one of which is updated very frequently and the other only occasionally? a) As one super CF, with each map in a separate supercolumn and the map entries being the subcolumns? b) As two CFs, one for each map? I'd like to discuss the why behind

Re: about the consistency level

2011-01-17 Thread Aaron Morton
Have you added a Jira for this? Or does anyone else want or not want this feature ? I'll try to add it as practice. Aaron On 17/01/2011, at 10:15 PM, raoyixuan (Shandy) raoyix...@huawei.com wrote: Thanks a lot. From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Monday, January 17,

Re: quorum calculation seems to depend on previous selected nodes

2011-01-17 Thread Jonathan Ellis
On Mon, Jan 17, 2011 at 2:10 PM, Samuel Benz samuel.b...@switch.ch wrote: Case1: If 'TEST' was previously stored on Node1, Node2, Node3 - The update will succeed. Case2: If 'TEST' was previously stored on Node2, Node3, Node4 - The update will not work. If you have RF=2 then it will be stored
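
For reference, the quorum arithmetic behind this thread can be sketched as follows. This is a minimal illustration, not Cassandra's own code: the function names are made up here, and it assumes the usual definition of quorum as a majority of replicas, floor(RF / 2) + 1.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for QUORUM (a majority of RF)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Number of replicas that can be down while QUORUM can still be met."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    # Print RF, quorum size, and tolerated failures for a few common values.
    for rf in (1, 2, 3, 5):
        print(rf, quorum(rf), tolerated_replica_failures(rf))
```

Under this definition RF=2 gives a quorum of 2, so both replicas of a key must be reachable. Whether an update succeeds then depends on which nodes hold the key, and RF=3 is the smallest value at which QUORUM tolerates a down replica.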

Re: Cassandra GC Settings

2011-01-17 Thread Dan Hendry
Thanks for all the info, I think I have been able to sort out my issue. The new settings I am using are: -Xmn512M (Very important I think) -XX:SurvivorRatio=5 (Not very important I think) -XX:MaxTenuringThreshold=5 -XX:ParallelGCThreads=8 -XX:CMSInitiatingOccupancyFraction=75 Since applying
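
As a rough aid for reading these settings, the sketch below works out how a fixed young generation is split between eden and the two survivor spaces. It assumes the HotSpot meaning of -XX:SurvivorRatio (the size of eden relative to one survivor space); the helper name is hypothetical and the JVM rounds the real sizes, so treat the numbers as approximate.

```python
def young_gen_layout(young_mb: float, survivor_ratio: int) -> dict:
    """Split a fixed young generation into eden and two survivor spaces.

    Assumes young = eden + 2 * survivor and eden = survivor_ratio * survivor.
    """
    if survivor_ratio < 1:
        raise ValueError("survivor ratio must be at least 1")
    survivor_mb = young_mb / (survivor_ratio + 2)
    return {"eden_mb": survivor_mb * survivor_ratio, "survivor_mb": survivor_mb}


if __name__ == "__main__":
    # Values taken from the settings above: -Xmn512M and -XX:SurvivorRatio=5.
    layout = young_gen_layout(512, 5)
    print(round(layout["eden_mb"], 1), round(layout["survivor_mb"], 1))
```

With -Xmn512M and a ratio of 5, each survivor space is one seventh of the young generation and eden takes the remaining five sevenths.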

Re: Do you have a site in production environment with Cassandra? What client do you use?

2011-01-17 Thread Colin Vipurs
Java + Pelops On Sat, Jan 15, 2011 at 10:58 PM, Dave Viner davevi...@gmail.com wrote: Perl using the thrift interface directly. On Sat, Jan 15, 2011 at 6:10 AM, Daniel Lundin d...@eintr.org wrote: python + pycassa scala + Hector On Fri, Jan 14, 2011 at 6:24 PM, Ertio Lew ertio...@gmail.com

Re: Cassandra GC Settings

2011-01-17 Thread SriSatish Ambati
Thanks, Dan: Yes, -Xmn512M/1G sizes the Young Generation explicitly and takes adaptive resizing out of the picture. (If at all possible, send your GC log over so we can analyze the promotion failure a little more finely.) The low load implies that you are able to use the parallel

Re: Cassandra GC Settings

2011-01-17 Thread Peter Schuller
Now, a full stop of the application was what I was seeing extensively before (100-200 times over the course of a major compaction as reported by gossipers on other nodes). I have also just noticed that the previous instability (ie application stops) correlated with the compaction of a few

RE: Super CF or two CFs?

2011-01-17 Thread Steven Mac
Sure, consider stock data, where the stock symbol is the row key. The stock data consists of a rather stable part and a very volatile part, both of which would be a super column. The stable super column would contain subcolumns such as company name, address, and some annual or quarterly data.

Re: Super CF or two CFs?

2011-01-17 Thread Stephen Connolly
On 17 January 2011 22:36, Steven Mac ugs...@hotmail.com wrote: Sure, consider stock data, where the stock symbol is the row key. The stock data consists of a rather stable part and a very volatile part, both of which would be a super column. The stable super column would contain subcolumns

RE: Super CF or two CFs?

2011-01-17 Thread Steven Mac
I guess I was maybe trying to simplify the question too much. In reality I do not have one volatile part, but multiple ones (say all trading data of day). Each would be a supercolumn identified by the time slot, with the individual fields as subcolumns. Of course, I could prefix the time slot

Re: Super CF or two CFs?

2011-01-17 Thread Brandon Williams
On Mon, Jan 17, 2011 at 5:12 PM, Steven Mac ugs...@hotmail.com wrote: I guess I was maybe trying to simplify the question too much. In reality I do not have one volatile part, but multiple ones (say all trading data of day). Each would be a supercolumn identified by the time slot, with the

please help with multiget

2011-01-17 Thread Shu Zhang
Here's the method declaration for quick reference: map<string,list<ColumnOrSuperColumn>> multiget_slice(string keyspace, list<string> keys, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level) It looks like you must have the same SlicePredicate for every key in
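
One possible client-side workaround, sketched under assumptions: group the keys by the predicate they need and issue one multiget per group. The `multiget` argument is a hypothetical callable standing in for whatever wrapper your client provides around multiget_slice, and predicates are represented as hashable tuples of column names purely for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def multiget_by_predicate(
    requests: Iterable[Tuple[str, Hashable]],
    multiget: Callable[[List[str], Hashable], Dict[str, list]],
) -> Dict[str, list]:
    """Issue one multiget call per distinct predicate and merge the results.

    `requests` is an iterable of (row_key, predicate) pairs. `multiget` is a
    hypothetical stand-in for a client call that takes a list of keys and a
    single predicate and returns a mapping of key to columns.
    """
    keys_by_predicate = defaultdict(list)
    for key, predicate in requests:
        keys_by_predicate[predicate].append(key)

    results: Dict[str, list] = {}
    for predicate, keys in keys_by_predicate.items():
        results.update(multiget(keys, predicate))
    return results
```

This trades a single round trip for one round trip per distinct predicate, so it helps most when many keys share a few predicates.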

Re: please help with multiget

2011-01-17 Thread Aaron Morton
If you can provide some more information on a specific use case we may be able to help with the modelling. The general approach is to denormalise the data to the point where each request/activity/feature in your application results in a call to get data from one or more rows in one CF. It's not

Re: What is be the best possible client option available to a PHP developer for implementing an application ready for production environments ?

2011-01-17 Thread Brandon Williams
On Mon, Jan 17, 2011 at 7:22 PM, Ertio Lew ertio...@gmail.com wrote: "What would be the best client option to go with in order to use" Pycassa. https://github.com/thobbs/pycassa "Cassandra through an application to be implemented in PHP." Oh. Then https://github.com/thobbs/phpcassa

Re: What is be the best possible client option available to a PHP developer for implementing an application ready for production environments ?

2011-01-17 Thread Rajkumar Gupta
Hey Brandon, 1) Is it developed to the level needed to support all the necessary features to take full advantage of Cassandra? 2) Is it used in production by anyone? 3) What are its limitations? Thanks. On Tue, Jan 18, 2011 at 7:11 AM, Brandon Williams dri...@gmail.com wrote: On

Re: quorum calculation seems to depend on previous selected nodes

2011-01-17 Thread Samuel Benz
On 01/17/2011 09:28 PM, Jonathan Ellis wrote: On Mon, Jan 17, 2011 at 2:10 PM, Samuel Benz samuel.b...@switch.ch wrote: Case1: If 'TEST' was previously stored on Node1, Node2, Node3 - The update will succeed. Case2: If 'TEST' was previously stored on Node2, Node3, Node4 - The update will not

Re: Tombstone lifespan after multiple deletions

2011-01-17 Thread Ryan King
On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn da...@lookin2.com wrote: If I delete a row, and later on delete it again, before GCGraceSeconds has elapsed, does the tombstone live longer? Each delete is a new tombstone, which should answer your question. -ryan In other words, if I have the
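
The answer above can be illustrated with a deliberately simplified model. It assumes that the most recent delete supersedes earlier ones and that a tombstone only becomes eligible for removal once GCGraceSeconds have passed since that delete; actual removal happens later, during compaction. The function name and the sample values are illustrative only.

```python
from typing import Iterable


def tombstone_purgeable_at(delete_times: Iterable[int], gc_grace_seconds: int) -> int:
    """Earliest time (in seconds) at which the surviving tombstone may be purged.

    Simplified model: the newest delete wins, so the grace period is counted
    from the latest deletion time.
    """
    times = list(delete_times)
    if not times:
        raise ValueError("at least one deletion time is required")
    return max(times) + gc_grace_seconds


if __name__ == "__main__":
    gc_grace = 864000  # example value: ten days, in seconds
    print(tombstone_purgeable_at([100], gc_grace))
    print(tombstone_purgeable_at([100, 500], gc_grace))
```

In this model a second delete issued before the grace period ends pushes the purge point back, because the newer tombstone starts its own grace period.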