Re: how to decommission two slow nodes?

2010-05-20 Thread Ran Tavory
I forgot to mention that the cluster cannot be taken down, it needs to continue serving... is there another way? On May 21, 2010 3:03 AM, "Jonathan Ellis" wrote: One possibility: rsync the data to the next node in the ring that is in the same DC. (specifically, rsync once, then flush on the sou

New Changes in Cass 0.7 Thrift API Interface

2010-05-20 Thread Arya Goudarzi
Hi Fellows, I just joined this mailing list but I've been on IRC for a while. Pardon if this post is a repeat, but I would like to share with you some of my experiences with the Cassandra Thrift interface that comes with the nightly build and probably 0.7. I came across an issue last night that

Re: What happened if one server involved in the process of data reading fail?

2010-05-20 Thread 史英杰
What inner mechanism does Cassandra adopt to get this kind of fault tolerance? 2010/5/20 Simon Smith > On Thu, May 20, 2010 at 8:08 AM, 史英杰 wrote: > > Hi, All, > > I am now learning the mechanism Cassandra adopts to get high > > availability and fault tolerance. As I know, we should connec

Re: Scaling problems

2010-05-20 Thread Ian Soboroff
Excellent leads, thanks. cassandra.in.sh has a heap of 6GB, but I didn't realize that I was trying to float so many memtables. I'll poke tomorrow and report if it gets fixed. Ian On Thu, May 20, 2010 at 10:40 AM, Jonathan Ellis wrote: > Some possibilities: > > You didn't adjust Cassandra heap

Re: Ring out of sync, cassandra_UnavailableException being thrown

2010-05-20 Thread Jonathan Ellis
Were you bootstrapping or otherwise moving nodes around? I don't think anyone's tracked this bug down farther than "if you restart the entire cluster, it goes away." On Wed, May 19, 2010 at 10:05 PM, Keith Thornhill wrote: > in a 5 node cluster, i noticed in our client error log that one of the

Re: how to decommission two slow nodes?

2010-05-20 Thread Jonathan Ellis
One possibility: rsync the data to the next node in the ring that is in the same DC. (specifically, rsync once, then flush on the source node and rsync again.) Then stop the entire cluster, and restart everyone but those two nodes. Then run nodetool repair on each machine. If your client is not

Re: Pooling Question

2010-05-20 Thread Jake Luciani
Look in /contrib; it's already there. On May 20, 2010, at 6:23 PM, Mark Robson wrote: On 20 May 2010 23:16, Ryan Daum wrote: I personally would love to see Cassandra add the concept of a read-only 'proxy' node which acts like the embedded read-only mode (Java 'fat client') but sits as a

Re: Pooling Question

2010-05-20 Thread Mark Robson
On 20 May 2010 23:16, Ryan Daum wrote: > I personally would love to see Cassandra add the concept of a read-only > 'proxy' node which acts like the embedded read-only mode (Java 'fat > client') but sits as a standalone server. It would know the entire ring > and watch Gossip and thus be abl

Re: Pooling Question

2010-05-20 Thread Ryan Daum
I personally would love to see Cassandra add the concept of a read-only 'proxy' node which acts like the embedded read-only mode (Java 'fat client') but sits as a standalone server. It would know the entire ring and watch Gossip and thus be able to direct requests to the most appropriate node

Re: Pooling Question

2010-05-20 Thread Mark Robson
On 20 May 2010 20:17, David Wellman wrote: > I have a 5 node cassandra cluster and I am wondering if there is any > advantage of setting up a connection pool that is balanced across all 5 > nodes (IE: 50 connections = 10 per node) over one pool all to one server (50 > connection => one node) De

Re: Accessing Cassandra from R

2010-05-20 Thread Charles Woerner
Or possibly in a language for which there are R<->$language IDL bindings. I would not be surprised if you could do this using SWIG. http://www.swig.org/Doc1.3/R.html On Thu, May 20, 2010 at 2:01 PM, Jonathan Ellis wrote: > You'd probably need to build a proxy in a language that Thrift supports

Re: Compaction JMX Stats?

2010-05-20 Thread Jonathan Ellis
No, CM is not exposed to nodetool yet. (You should really be putting metrics into a real monitoring system rather than relying on nodetool. Some example munin plugins are at http://github.com/jbellis/cassandra-munin-plugins, for instance.) CM also has BytesCompacted/BytesTotalInProgress. Backup

Re: Cause possible for a "cound not connect" TTransportException.NOT_OPEN , during load datas ?

2010-05-20 Thread Jonathan Ellis
"disseminating load info" is not related to your problem. certainly you should be using connection pooling rather than opening a ton of sockets. You didn't say what CL you are using, but you should not use CL.ZERO for load tests. On Thu, May 20, 2010 at 11:14 AM, xavier manach wrote: > Hi. > >

Re: Strange error with data reading

2010-05-20 Thread Jonathan Ellis
Yes, the extra I/O and CPU load caused by decommission will affect reads (and writes, to a lesser degree). On Thu, May 20, 2010 at 10:18 AM, Maxim Kramarenko wrote: > It reports that node 3 is transferring data to node 1. As I remember, node 2 > doesn't send or receive data. > > BTW, after a few hrs (probably, a

Re: Timeouts running batch_mutate

2010-05-20 Thread Jonathan Ellis
HBase has the same problem. Your choices are basically (a) figure out a way to not do all writes sequentially or (b) figure out a way to model w/o OPP. Most Cassandra users go with option (b). On Thu, May 20, 2010 at 8:21 AM, Sonny Heer wrote: > Yes, I'm using OOP, because of the way we modeled

Re: Accessing Cassandra from R

2010-05-20 Thread Jonathan Ellis
You'd probably need to build a proxy in a language that Thrift supports. On Thu, May 20, 2010 at 2:49 AM, Kyusik Chung wrote: > Does anyone have suggestions on how to access cassandra from R > (http://www.r-project.org/)? > > Thanks! > > Kyusik Chung -- Jonathan Ellis Project Chair, Apache C

Re: real-world dataset from social network?

2010-05-20 Thread Valerio Schiavoni
Hello, > It's unclear if you're looking for data that can be stored in Cassandra or > an example of someone using Cassandra to store a network; I'm assuming the > former. You're assuming incorrectly. I'm looking for an example of someone using Cassandra to store a graph. > You will have a hard time

Compaction JMX Stats?

2010-05-20 Thread Anthony Molinaro
Hi, In the 0.5.x series there was a COMPACTION-POOL which kept track of in-process, pending and completed compactions. With 0.6.x this seems to have vanished and instead we only have the CompactionManager PendingTasks statistic. Is there also a completed-tasks statistic somewhere? Is there any way to d

Pooling Question

2010-05-20 Thread David Wellman
I have a 5 node cassandra cluster and I am wondering if there is any advantage of setting up a connection pool that is balanced across all 5 nodes (IE: 50 connections = 10 per node) over one pool all to one server (50 connection => one node)
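The two layouts being compared can be sketched in a few lines of Python. This is only an illustration of how 50 connection slots would be distributed in each case; it opens no sockets, the host names are hypothetical, and `assign_connections` is a made-up helper rather than part of any Cassandra or Thrift client.

```python
from collections import Counter
from itertools import cycle


def assign_connections(nodes, pool_size):
    """Spread pool_size connection slots across nodes in round-robin order."""
    if not nodes:
        raise ValueError("need at least one node")
    node_cycle = cycle(nodes)
    return [next(node_cycle) for _ in range(pool_size)]


# Hypothetical host names for a 5-node cluster.
nodes = ["node1", "node2", "node3", "node4", "node5"]

# Layout 1: one pool balanced across all five nodes (10 slots each).
balanced = Counter(assign_connections(nodes, 50))

# Layout 2: the whole pool pointed at a single node (50 slots on node1).
single = Counter(assign_connections(nodes[:1], 50))
```

In the balanced layout each node acts as coordinator for roughly a fifth of the requests; in the single-node layout one node coordinates all of them.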

Cause possible for a "cound not connect" TTransportException.NOT_OPEN , during load datas ?

2010-05-20 Thread xavier manach
Hi. I am a newbie with Cassandra. I am studying and comparing the performance of databases to choose my future architecture. I am trying to load a lot of data into Cassandra. I use Python with the Thrift protocol (very simple, without threading). I make sequential requests: client.batch_mutate and client.get_slice. (a

Re: real-world dataset from social network?

2010-05-20 Thread Matt Revelle
It's unclear if you're looking for data that can be stored in Cassandra or an example of someone using Cassandra to store a network; I'm assuming the former. You will have a hard time finding a social network dataset with relationships already well-defined for free. I have seen crawls of Twitte

Re: key path vs super column

2010-05-20 Thread Brandon Williams
On Wed, May 19, 2010 at 7:15 PM, Torsten Curdt wrote: > We are currently working on a prototype that is using Cassandra for > a realtime-ish statistics system. This seems to be quite a common use > case. If people are interested - maybe it would be worth collaborating on > this beyond design discussions

Re: Strange error with data reading

2010-05-20 Thread Maxim Kramarenko
It reports that node 3 is transferring data to node 1. As I remember, node 2 doesn't send or receive data. BTW, after a few hrs (probably after decommission finished), node 2 started working again, without a restart. Can decommission of node 3 affect reads? On 20.05.2010 18:51, Jonathan Ellis wrote:

Re: real-world dataset from social network?

2010-05-20 Thread uncle mantis
Mine is under development. Sorry I can't help you at the moment :( Regards, Michael On Thu, May 20, 2010 at 12:09 PM, Valerio Schiavoni < valerio.schiav...@gmail.com> wrote: > Not strictly Facebook. > Any online social network is ok to me, as long as it has a reasonable > number of users and

Re: real-world dataset from social network?

2010-05-20 Thread Valerio Schiavoni
Not strictly Facebook. Any online social network is ok to me, as long as it has a reasonable number of users and that it's built on top of a schema-less storage system. > Are you looking for Facebook stuff? Good luck on getting a data set from any > real world model. > > > Hello everyone, >> i'm a

Re: real-world dataset from social network?

2010-05-20 Thread uncle mantis
Are you looking for Facebook stuff? Good luck on getting a data set from any real world model. Regards, Michael On Thu, May 20, 2010 at 11:53 AM, Valerio Schiavoni < valerio.schiav...@gmail.com> wrote: > Hello everyone, > i'm a phd student looking for some real-world dataset of any social > ne

real-world dataset from social network?

2010-05-20 Thread Valerio Schiavoni
Hello everyone, I'm a PhD student looking for a real-world dataset of any social network built on top of a schema-less storage system. The dataset should at least provide a means to reconstruct the graph of users. Due to possibly sensitive information in the dataset, the dataset can be very p

Re: Timeouts running batch_mutate

2010-05-20 Thread Sonny Heer
meant to say OPP :) On Thu, May 20, 2010 at 8:21 AM, Sonny Heer wrote: > Yes, I'm using OOP, because of the way we modeled our data.  Does > Cassandra not handle OOP intensive write operations?  Is HBase a > better approach if one must use OOP? > > > On Thu, May 20, 2010 at 7:41 AM, Jonathan Elli

Re: Timeouts running batch_mutate

2010-05-20 Thread Sonny Heer
Yes, I'm using OOP, because of the way we modeled our data. Does Cassandra not handle OOP intensive write operations? Is HBase a better approach if one must use OOP? On Thu, May 20, 2010 at 7:41 AM, Jonathan Ellis wrote: > Are you using OOP?  That will tend to create hot spots like this, > whi

Re: Some questions about using Binary Memtable to import data.

2010-05-20 Thread Jonathan Ellis
On Wed, May 19, 2010 at 1:37 AM, Peng Guo wrote: > Thanks for your information. > > I looked at some source code of the implementation. There are still some questions: > > 1. How do I know that the binary write message was sent to the endpoint successfully? It doesn't. It's fire-and-forget. If you look at the example it

Re: Data migration from mysql to cassandra

2010-05-20 Thread Jonathan Ellis
No. On Wed, May 19, 2010 at 4:12 PM, Beier Cai wrote: > Thanks Jonathan, using mysql as an id sequence generator definitely is a > good option. One thing though, does using sequential ids defeat the purpose > of random partitioner? > > On Tue, May 18, 2010 at 11:25 PM, Jonathan Ellis wrote: >>

Re: Strange error with data reading

2010-05-20 Thread Jonathan Ellis
What does JMX report as described in http://wiki.apache.org/cassandra/Streaming ? 2010/5/19 Maxim Kramarenko : > Hello! > > I have 3 node cluster: node1, node2, node3. Replication factor = 2. > I run decommission on node3 and it's in progress, moving data to node1 > > Ring on all nodes show all 3

Re: Timeouts running batch_mutate

2010-05-20 Thread Jonathan Ellis
Are you using OOP? That will tend to create hot spots like this, which is why most people deploy on RP. If you are using RP you may simply need to add C* capacity, or take TimeoutException as a signal to throttle your activity. On Tue, May 18, 2010 at 4:37 PM, Sonny Heer wrote: > Yeah there are
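Treating a timeout as a signal to throttle can be sketched as a retry loop with exponential backoff. This is a minimal sketch under stated assumptions: the `TimeoutException` class below is a local stand-in for whatever timeout error the client library raises, and `send` is a hypothetical callable that performs one write (for example, a wrapper around a batch_mutate call).

```python
import time


class TimeoutException(Exception):
    """Local stand-in for the client's timeout error (an assumption of this sketch)."""


def write_with_backoff(send, batch, max_retries=5, base_delay=0.1, sleep=time.sleep):
    """Call send(batch); on each timeout, wait twice as long as before and retry."""
    delay = base_delay
    for attempt in range(max_retries + 1):
        try:
            return send(batch)
        except TimeoutException:
            if attempt == max_retries:
                raise  # out of retries: let the caller decide what to do
            sleep(delay)  # slow down before trying again
            delay *= 2
```

The `sleep` parameter exists only so the waiting can be replaced in tests; callers would normally leave it at its default.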

Re: Scaling problems

2010-05-20 Thread Jonathan Ellis
Some possibilities: You didn't adjust Cassandra heap size in cassandra.in.sh (1GB is too small). You're inserting at CL.ZERO (ROW-MUTATION-STAGE in tpstats will show large pending ops -- large = 100s). You're creating large rows a bit at a time and Cassandra OOMs when it tries to compact (the oom sh

Re: JMX metrics for monitoring

2010-05-20 Thread Jonathan Ellis
Here are the basics I discuss in Riptano's training classes: http://github.com/jbellis/cassandra-munin-plugins On Mon, May 17, 2010 at 3:02 PM, Maxim Kramarenko wrote: > Hi! > > Which JMX metrics do you use for Cassandra monitoring ? Which values can be > used for alerts ? > -- Jonathan Ellis

Re: unbalanced token assignment with random partioner

2010-05-20 Thread Jonathan Ellis
Yes, if you add nodes when the existing one doesn't have enough data to guess a good token from the keys it has, it uses a random token. Created https://issues.apache.org/jira/browse/CASSANDRA-1112 to use midpoint instead. On Mon, May 17, 2010 at 4:06 PM, Chris Shorrock wrote: > I have a feeling
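The idea of picking a midpoint token can be illustrated with a small Python sketch. It is not the change made for CASSANDRA-1112; it only shows the arithmetic of finding the token halfway along a range on a ring, including the case where the range wraps past zero. The ring size of 2**127 is an assumption about the RandomPartitioner token space, and `midpoint` is a hypothetical helper.

```python
RING_SIZE = 2 ** 127  # assumed size of the RandomPartitioner token space


def midpoint(left, right, ring_size=RING_SIZE):
    """Return the token halfway along the arc from left to right, moving clockwise."""
    span = (right - left) % ring_size
    if span == 0:
        span = ring_size  # a single node owns the whole ring
    return (left + span // 2) % ring_size
```

Using the midpoint splits the range evenly by token space, whereas a random token can land very close to an existing one and leave the new node with a small share.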

Re: Run several cassandra instances on local machine

2010-05-20 Thread omallassi
You can also easily use VMs (using NAT instead of bridged networking, to avoid DHCP if your computer connects to a different network) to launch several nodes with different IPs on your single host machine (I suppose you want to do this for training) On Thu, 20 May 2010 07:03:57 -0700, Jonathan Ellis w

Re: Run several cassandra instances on local machine

2010-05-20 Thread Jonathan Ellis
You can easily run on 127.0.0.1, 127.0.0.2, etc. On Thu, May 20, 2010 at 7:00 AM, Yan Virin wrote: > It seems to be impossible to run several cassandra instances on a > local machine, due to the fact that the seeds are described as IP addresses > and not pairs of IP address and port. > Is this c

Re: What happened if one server involved in the process of data reading fail?

2010-05-20 Thread Simon Smith
On Thu, May 20, 2010 at 8:08 AM, 史英杰 wrote: > Hi, All, >     I am now learning the mechanism Cassandra adopts to get high > availability and fault tolerance.  As I know, we should connect to one > server of Cassandra first, then we can read or write data  through it, so if > the server which we co

What happened if one server involved in the process of data reading fail?

2010-05-20 Thread 史英杰
Hi, All, I am now learning the mechanism Cassandra adopts to get high availability and fault tolerance. As far as I know, we should connect to one server of Cassandra first, then we can read or write data through it, so if the server which we connect to goes down, what will happen? Should we have to