Re: memory question

2010-03-25 Thread Jonathan Ellis
Cassandra mmaps your data files which show up as RES and SHR. This is normal. c0d1p1 is completely maxed out. Assuming that is your data disk and not your commitlog one, you need to tell Cassandra to cache more rows (or keys, depending). If you are maxing out your caches and still seeing this t

Re: Range scan performance in 0.6.0 beta2

2010-03-25 Thread Jonathan Ellis
On Thu, Mar 25, 2010 at 8:33 AM, Henrik Schröder wrote: > Hi everyone, > > We're trying to implement a virtual datastore for our users where they can > set up "tables" and "indexes" to store objects and have them indexed on > arbitrary properties. And we did a test implementation for Cassandra in

RE: Ring management and load balance

2010-03-25 Thread Stu Hood
It is much more likely that you always increase your cluster in size by a certain large percentage. With a 10 node cluster, you are likely to add 5 nodes at a time, and with a 100 node cluster you'll probably add 25 to 50 per batch. -Original Message- From: "Daniel Kluesing" Sent: Thur

RE: Ring management and load balance

2010-03-25 Thread Daniel Kluesing
I agree it's only a problem with 'small' clusters - but it seems like 'small' is 'most users'? Even with 10 nodes it looks like a pretty big imbalance if I add an 11th node, and don't add the other 9 or move a large part of the ring. Or in practice have folks not had trouble with incremental sca

Re: Auto Increament

2010-03-25 Thread Ryan King
On Thu, Mar 25, 2010 at 5:57 AM, Jaepil Jeong wrote: > Hi there, > Thanks to all for the replies. But I still have a question: > 1. When I use Twitter via Tweetie, which is an iPhone application, I can see > the unique ID for each user in their personal profile page. It seems > like an incremental number

Re: Deleting and re-inserting row causes error in get_slice count parameter

2010-03-25 Thread Bob Florian
Sure thing. Here it is: https://issues.apache.org/jira/browse/CASSANDRA-920 On Thu, Mar 25, 2010 at 4:44 PM, Jonathan Ellis wrote: > Can you create a ticket with a test case? > > On Thu, Mar 25, 2010 at 3:39 PM, Bob Florian wrote: >> I was originally using 0.5.0 but I've reproduced the behavior

Re: memory question

2010-03-25 Thread B. Todd Burruss
no compaction. Jonathan Ellis wrote: did you check jmx to see if a compaction is going on? On Mon, Mar 22, 2010 at 5:14 PM, Todd Burruss wrote: after running my cluster for a while performance has become unacceptable, 200+ ms for reads. if running well, i see reads <10ms. when i run iost

Cassandra Hackathon in SF @ Digg - 04/22 6:30pm

2010-03-25 Thread Chris Goffinet
As promised, here is the official invite to register for the hackathon in SF. The event starts at 6:30pm on April 22nd. http://cassandrahackathon.eventbrite.com/ -- Chris Goffinet

Re: Ring management and load balance

2010-03-25 Thread Jonathan Ellis
One problem is if the heaviest node is next to a node that is lighter than average, instead of heavier. Then if the new node takes extra from the heaviest, say 75% instead of just 1/2, and then we take 1/2 of the heaviest's neighbor and put it on the heaviest, you made that lighter-than-average

Re: Deleting and re-inserting row causes error in get_slice count parameter

2010-03-25 Thread Jonathan Ellis
Can you create a ticket with a test case? On Thu, Mar 25, 2010 at 3:39 PM, Bob Florian wrote: > I was originally using 0.5.0 but I've reproduced the behavior with > 0.5.1 and 0.6.0-beta3. > > > On Wed, Mar 24, 2010 at 3:00 PM, Jonathan Ellis wrote: >> Are you using 0.5.0?  Because this sounds li

Re: Deleting and re-inserting row causes error in get_slice count parameter

2010-03-25 Thread Bob Florian
I was originally using 0.5.0 but I've reproduced the behavior with 0.5.1 and 0.6.0-beta3. On Wed, Mar 24, 2010 at 3:00 PM, Jonathan Ellis wrote: > Are you using 0.5.0?  Because this sounds like a bug that was fixed in 0.5.1. > > On Mon, Mar 22, 2010 at 5:13 PM, Bob Florian wrote: >> I'm new to

Re: Ring management and load balance

2010-03-25 Thread Jeremy Dunck
On Thu, Mar 25, 2010 at 1:26 PM, Jonathan Ellis wrote: > Pretty much everything assumes that there is a 1:1 correspondence > between IP and Token.  It's probably in the ballpark of "one month to > code, two to get the bugs out."  Gossip is one of the trickier parts > of our code base, and this wou

Re: how to delete data

2010-03-25 Thread Jonathan Ellis
Commented on the Jira issue. Curious how badly out of date that patch is now. :) On Wed, Mar 24, 2010 at 12:55 PM, Ran Tavory wrote: > I'm willing to give it a try. > Where do I start, except for applying the patch in the bug? > > On Wed, Mar 24, 2010 at 2:30 PM, Jonathan Ellis wrote: >> >> Cur

Re: Load balancing

2010-03-25 Thread Jeremy Dunck
On Thu, Mar 25, 2010 at 1:20 PM, Y Aw wrote: > Hi all, > I have a question about load-balancing. http://wiki.apache.org/cassandra/FAQ#node_clients_connect_to Does that help?

Re: Ring management and load balance

2010-03-25 Thread Jonathan Ellis
On Thu, Mar 25, 2010 at 1:17 PM, Mike Malone wrote: > On Thu, Mar 25, 2010 at 9:56 AM, Jonathan Ellis wrote: >> >> The advantage to doing it the way Cassandra does is that you can keep >> keys sorted with OrderPreservingPartitioner for range scans.  grabbing >> one token of many from each node in

Load balancing

2010-03-25 Thread Y Aw
Hi all, I have a question about load-balancing. I have easily built a cluster with two nodes, but I am wondering how my client should connect to this cluster. - Run queries against one node (but all data will transit through this node and this way creates a SPOF) - Run queries against an external
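One way to avoid sending every query through a single node is to rotate across the nodes on the client side. The sketch below is only an illustration of that idea, not something spelled out in the visible part of the message: the class name, its methods, and the host names are hypothetical, and it only picks hosts without opening any connection.

```python
import itertools


class RoundRobinHosts:
    """Hand out cluster hosts in rotation (hypothetical helper, not a Cassandra API)."""

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("at least one host is required")
        self._hosts = list(hosts)
        self._cycle = itertools.cycle(self._hosts)

    def next_host(self):
        """Return the next host in the rotation."""
        return next(self._cycle)

    def candidates(self):
        """Return every host once, starting at the next one in rotation.

        A client could try these in order so that one dead node does not
        stop it from reaching the cluster.
        """
        first = self.next_host()
        start = self._hosts.index(first)
        return self._hosts[start:] + self._hosts[:start]


# Placeholder host names for a two-node cluster.
pool = RoundRobinHosts(["node1.example.com", "node2.example.com"])
picked = [pool.next_host() for _ in range(3)]
```

Each client would build one such pool from its configured node list and ask it for a host per request, falling back to the remaining candidates on a connection error.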

Re: Ring management and load balance

2010-03-25 Thread Mike Malone
On Thu, Mar 25, 2010 at 9:56 AM, Jonathan Ellis wrote: > The advantage to doing it the way Cassandra does is that you can keep > keys sorted with OrderPreservingPartitioner for range scans. grabbing > one token of many from each node in the ring would prohibit that. > > So we rely on active load

Re: Auto Increament

2010-03-25 Thread Tatu Saloranta
On Thu, Mar 25, 2010 at 9:20 AM, Benjamin Black wrote: > Cassandra is not being used to generate the Twitter identifiers. > Twitter, like most places using Cassandra, has more than one database > system in production. > > UUIDs are not at risk of conflicts with billions of rows. Exactly: UUIDs we

Re: Model Question

2010-03-25 Thread Benjamin Black
Erez, To make this work you have to make your model fit Cassandra, not the other way around. As a rule, you either do complex queries via client code to process the results of several, simpler queries or via a CF you create to act as an index. Yes, this means you have to write data to each index
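The "CF you create to act as an index" idea can be sketched with plain dictionaries standing in for column families. Everything here is an assumption made for illustration: the `city` attribute, the function names, and the in-memory dicts are hypothetical, and no Cassandra client API is used. The point is only that each write updates both the data and the index.

```python
from collections import defaultdict

users = {}                         # stand-in for the main CF: row key -> columns
users_by_city = defaultdict(set)   # stand-in for the index CF: city -> row keys


def put_user(key, columns):
    """Write the row and keep the index in step with it."""
    old = users.get(key)
    if old is not None:
        # Remove the stale index entry before adding the new one.
        users_by_city[old["city"]].discard(key)
    users[key] = dict(columns)
    users_by_city[columns["city"]].add(key)


def find_by_city(city):
    """Answer the 'complex' query with one index lookup plus row reads."""
    return [users[k] for k in sorted(users_by_city.get(city, ()))]


put_user("u1", {"name": "a", "city": "Oslo"})
put_user("u2", {"name": "b", "city": "Lund"})
put_user("u1", {"name": "a", "city": "Lund"})  # u1 moves, so the index moves too
```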

Re: Ring management and load balance

2010-03-25 Thread Jonathan Ellis
On Thu, Mar 25, 2010 at 11:40 AM, Jeremy Dunck wrote: > On Thu, Mar 25, 2010 at 10:56 AM, Jonathan Ellis wrote: >> The advantage to doing it the way Cassandra does is that you can keep >> keys sorted with OrderPreservingPartitioner for range scans.  grabbing >> one token of many from each node in

Re: Range scan performance in 0.6.0 beta2

2010-03-25 Thread Sylvain Lebresne
On Thu, Mar 25, 2010 at 5:31 PM, Henrik Schröder wrote: > On Thu, Mar 25, 2010 at 15:17, Sylvain Lebresne wrote: >> >> I don't know If that could play any role, but if ever you have >> disabled the assertions >> when running cassandra (that is, you removed the -ea line in >> cassandra.in.sh), the

Re: Model Question

2010-03-25 Thread Peter Chang
Do you mean on the client? It really depends on how many items you're sorting. In terms of compute time, client-side sorting will likely always be faster, but if you take bandwidth into account, having a pre-sorted list will be better for large lists. Creating 0-padded numbers is pretty straightf
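A minimal sketch of the 0-padding idea, assuming counts are non-negative and that a fixed width of 10 digits is enough (both the width and the helper name `pad_count` are hypothetical). Padding makes string order agree with numeric order, which is what a column sorted as text needs.

```python
def pad_count(count: int, width: int = 10) -> str:
    """Render a non-negative count as a fixed-width, zero-padded string."""
    if count < 0:
        raise ValueError("count must be non-negative")
    text = str(count).zfill(width)
    if len(text) > width:
        raise ValueError("count does not fit in the chosen width")
    return text


counts = [7, 120, 15, 3]
plain = sorted(str(c) for c in counts)         # text order, not numeric order
padded = sorted(pad_count(c) for c in counts)  # text order matches numeric order
```

Without padding, "120" sorts before "15"; with padding, the sorted strings come out in numeric order.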

Re: Ring management and load balance

2010-03-25 Thread Jeremy Dunck
On Thu, Mar 25, 2010 at 10:56 AM, Jonathan Ellis wrote: > The advantage to doing it the way Cassandra does is that you can keep > keys sorted with OrderPreservingPartitioner for range scans.  grabbing > one token of many from each node in the ring would prohibit that. > > So we rely on active load

Re: Range scan performance in 0.6.0 beta2

2010-03-25 Thread Nathan McCall
I noticed you turned key caching off in your ColumnFamily declaration. Have you tried turning it on and experimenting with the key caching configuration? Also, have you looked at the JMX output for what commands are pending execution? That is always helpful to me in hunting down bottlenecks. -Nate

Re: Range scan performance in 0.6.0 beta2

2010-03-25 Thread Henrik Schröder
On Thu, Mar 25, 2010 at 15:17, Sylvain Lebresne wrote: > I don't know If that could play any role, but if ever you have > disabled the assertions > when running cassandra (that is, you removed the -ea line in > cassandra.in.sh), there > was a bug in 0.6beta2 that will make read in row with lots o

Re: Auto Increament

2010-03-25 Thread Benjamin Black
Cassandra is not being used to generate the Twitter identifiers. Twitter, like most places using Cassandra, has more than one database system in production. UUIDs are not at risk of conflicts with billions of rows. b On Thu, Mar 25, 2010 at 5:57 AM, Jaepil Jeong wrote: > Hi there, > Thanks to
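A rough way to check the "billions of rows" claim is the birthday-bound approximation. This sketch assumes random (version 4) UUIDs with 122 random bits; the thread may equally be about time-based UUIDs, which avoid conflicts by a different mechanism, so treat the numbers as an illustration only.

```python
import uuid

RANDOM_BITS = 122  # a version 4 UUID carries 122 random bits


def collision_probability(n: int) -> float:
    """Birthday-bound approximation p ~= n^2 / (2 * 2^bits), valid while p is small."""
    return n * n / (2 * 2 ** RANDOM_BITS)


row_key = str(uuid.uuid4())                      # example key, 36 characters
p = collision_probability(10_000_000_000)        # ten billion random UUIDs
```

Under that assumption, the chance of any collision among ten billion keys is on the order of 1e-17.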

Re: Using cassandra as key /value store

2010-03-25 Thread Jonathan Ellis
Cassandra gives you a superset of simple key/value, so why not? reddit is using Cassandra like this, fwiw. On Thu, Mar 25, 2010 at 10:55 AM, Anurag Gujral wrote: > Hi All, >   I am  designing an application where I  need to store data as > key-value pair without the present need to use c

Re: Ring management and load balance

2010-03-25 Thread Jonathan Ellis
The advantage to doing it the way Cassandra does is that you can keep keys sorted with OrderPreservingPartitioner for range scans. grabbing one token of many from each node in the ring would prohibit that. So we rely on active load balancing to get to a "good enough" balance, say within 50%. It

Using cassandra as key /value store

2010-03-25 Thread Anurag Gujral
Hi All, I am designing an application where I need to store data as key-value pairs without the present need to use column/super-column family stuff. Does my use case fit Cassandra? My traffic will be 70-80% read traffic. The latency requirements are 100ms. Thanks Anurag

Ring management and load balance

2010-03-25 Thread Daniel Kluesing
I wanted to check my understanding of the load balance operation. Let's say I have 5 nodes, each of them has been assigned at startup 1/5 of the ring, and the load is equal across them (say using random partitioner). The load on the cluster gets high, so I add a sixth server. During bootstrap, t
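The arithmetic behind this scenario can be sketched as follows, assuming the new node takes exactly half of the heaviest node's range and that load is spread evenly within each range. The function name is hypothetical and the model ignores replication.

```python
from fractions import Fraction


def add_node_by_halving(loads):
    """Return the loads after a new node takes half of the heaviest node's share."""
    loads = list(loads)
    heaviest = max(range(len(loads)), key=lambda i: loads[i])
    half = loads[heaviest] / 2
    loads[heaviest] = half
    loads.append(half)
    return loads


five = [Fraction(1, 5)] * 5       # five nodes, each owning 1/5 of the ring
six = add_node_by_halving(five)   # sixth node bootstraps
ideal = Fraction(1, len(six))     # a perfectly balanced share would be 1/6
```

Under these assumptions, four nodes stay at 1/5 of the load (20% above the ideal 1/6) while the split node and the new node each hold 1/10 (40% below it).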

Re: Separate disks with cloud deployment

2010-03-25 Thread Jonathan Ellis
If you have enough data or insert volume that you can reasonably use dedicated hardware, you should probably use that. (http://spyced.blogspot.com/2010/03/why-your-data-may-not-belong-in-cloud.html) If you don't, then having CL + data on the same volume isn't going to hurt nearly as much as sharin

Re: Separate disks with cloud deployment

2010-03-25 Thread Ethan Rowe
On 03/25/2010 11:18 AM, Ethan Rowe wrote: [snip] I'll defer to the Rackspace folks regarding Rackspace Cloud; it has been I/O on average since you're dealing with a real, local disk. But I don't know about getting a second disk in that environment, though. That should have said "better I/O o

Can we do a more precise query in Cassandra ?

2010-03-25 Thread 郭鹏
Hi All: I am thinking about a more precise query in Cassandra: Could we have a query API like this: List> get_slice_condition(String keyspace, List keys, ColumnParent column_parent, Map queryConditions, int consistency_level) So we could use this API to query more precise data like age column's valu

Re: Separate disks with cloud deployment

2010-03-25 Thread Ethan Rowe
On 03/25/2010 11:10 AM, Mark Greene wrote: The FAQ page makes mention of using separate disks for the commit log and data directory. How would one go about achieving this in a cloud deployment such as Rackspace cloud servers or EC2 EBS? Or is it just preferred to use dedicated hardware to get t

Separate disks with cloud deployment

2010-03-25 Thread Mark Greene
The FAQ page makes mention of using separate disks for the commit log and data directory. How would one go about achieving this in a cloud deployment such as Rackspace cloud servers or EC2 EBS? Or is it just preferred to use dedicated hardware to get the optimal performance? Thanks In Advance! Be

Re: Range scan performance in 0.6.0 beta2

2010-03-25 Thread Sylvain Lebresne
I don't know if that could play any role, but if you have disabled the assertions when running Cassandra (that is, you removed the -ea line in cassandra.in.sh), there was a bug in 0.6beta2 that makes reads of rows with lots of columns quite slow. Another problem you may have is if you have

Range scan performance in 0.6.0 beta2

2010-03-25 Thread Henrik Schröder
Hi everyone, We're trying to implement a virtual datastore for our users where they can set up "tables" and "indexes" to store objects and have them indexed on arbitrary properties. And we did a test implementation for Cassandra in the following way: Objects are stored in one columnfamily, each k

Re: Auto Increament

2010-03-25 Thread Jaepil Jeong
Hi there, Thanks to all for the replies. But I still have a question: 1. When I use Twitter via Tweetie, which is an iPhone application, I can see the unique ID for each user in their personal profile page. It seems like an incremental number. As far as I know, Twitter is using Cassandra for its back end.

Re: Model Question

2010-03-25 Thread Erez Efrati
You are correct, Chris. I am a newbie too in this field. I like the Cassandra/NoSQL way and I am trying to see if it can fit my model. Thanks, Erez On Thu, Mar 25, 2010 at 11:03 AM, Christopher Brind < christopher.br...@googlemail.com> wrote: > Hi, > > I wondered if you were alluding to something

Re: Model Question

2010-03-25 Thread Christopher Brind
Hi, I wondered if you were alluding to something more complex. You'd probably want to create an index using something along the lines that Peter suggested. :) But I'm a Cassandra / Column DB newbie, so my experience ends just about ... here. :) Cheers, Chris On 25 March 2010 08:59, Erez Efrati

Re: Model Question

2010-03-25 Thread Erez Efrati
I am not clear on how this works when I want to increase the count of user-1. Thanks Erez On Thu, Mar 25, 2010 at 12:57 AM, Peter Chang wrote: > If there's not much overhead, I recommend client side as well. > > Otherwise, you can only sort on column. Therefore, you could create some > sort of

Re: Model Question

2010-03-25 Thread Erez Efrati
Hi Chris, So, if I get it right, you suggest that I pull all the columns in a single row and do the sorting client side? The user-friends-messages was just an example and maybe not the best I could come up with, because I agree that there are not too many friends in general that send you messages

Re: Model Question

2010-03-25 Thread Colin Vipurs
Peter, Do you think 0-padding the entries would be more efficient than just implementing your own comparator? On Wed, Mar 24, 2010 at 10:57 PM, Peter Chang wrote: > If there's not much overhead, I recommend client side as well. > Otherwise, you can only sort on column. Therefore, you could creat

Re: Generated code for csharp thrift interface for Cassandra

2010-03-25 Thread Raymond Wilson
I agree, producing a Delphi generator would be preferable, and may be the end result in any case. There are some issues with this, such as the fact the compiled generator program apparently does not run on Windows (is there a prebuilt version of this that can be downloaded)? Not having generics