Re: best practices for simulating transactions in Cassandra

2011-12-15 Thread Boris Yen
I am not sure if this is the right thread to ask about this. I read that some people are using Cages + ZooKeeper. I was wondering whether anyone has evaluated https://github.com/Netflix/curator? It seems to be a versatile package.

Re: Cassandra C client implementation

2011-12-15 Thread Vlad Paiu
Hello, Congratulations on your work on libcassandra & libcassie. I agree with you that too many abstractions are not a good thing; that's why I think Thrift & glib are a much better way to go than Thrift & C++ & libcassandra & libcassie. I, as well, am looking for a very basic Thrift & glibc ex

performance reaching plateau while the hardware is still idle

2011-12-15 Thread Kent Tong
Hi, I am running a performance test for Cassandra 1.0.5. It can perform about 1500 business operations (one read + one write to the same row) per second. However, the CPU is still 85% idle (as shown by vmstat) and the IO utilization is less than a few percent (as shown by iostat). "nodetool tpstat

Using Cassandra in Rails App

2011-12-15 Thread Wolfgang Vogl
Hi, I have a couple of questions about working with Ruby on Rails and Cassandra. What is the recommended way to integrate Cassandra into a Rails app: active_column, cassandra-cql, or some other gem? Is there a reference implementation, or some projects on GitHub that use these gems?

Re: performance reaching plateau while the hardware is still idle

2011-12-15 Thread ruslan usifov
Use parallel test:-)))

Re: Asymmetric load

2011-12-15 Thread Edward Capriolo
The more physical data that lives on a node, the more intensive operations are, especially read-based ops. Even if your ring is balanced, something like a failed repair could create a data imbalance. From some offline talks with you I know your node count is on the smaller end and repairs have been pro

Re: Asymmetric load

2011-12-15 Thread Dominic Williams
Btw, anyone having problems with repair might like to follow the proposal for a different system: https://issues.apache.org/jira/browse/CASSANDRA-3620 With this system you would only need to run repair to ensure all data has maximum redundancy across the cluster (which also increases consistency for Cons

Does OpsCenter have to be running all the time in order to collect the cluster stats?

2011-12-15 Thread mike.li
Hi, I have one question about OpsCenter: does it have to be running all the time in order to collect and store the metrics on each node within the cluster? Who writes the metrics on each node: the agent on the node, or OpsCenter? Thanks, Mike

RE: [RELEASE] Apache Cassandra 1.0.6 released

2011-12-15 Thread Viktor Jevdokimov
Cassandra 1.0.6 under Windows Server 2008 R2 64-bit with disk access mode mmap_index_only fails to delete any *-Index.db files after compaction or scrub: ERROR 13:43:17,490 Fatal exception in thread Thread[NonPeriodicTasks:1,5,main] java.lang.RuntimeException: java.io.IOException: Failed to dele

Re: Using Cassandra in Rails App

2011-12-15 Thread Brian O'Neill
I'm not sure this is the best answer, but all of our webapps (RoR included) access Cassandra via REST. That is one of the major reasons we built Virgil. http://code.google.com/a/apache-extras.org/p/virgil/ It allows us to build the webapps, for the most part, independent of the actual storage mec

Re: Does OpsCenter have to be running all the time in order to collect the cluster stats?

2011-12-15 Thread Nick Bailey
The OpsCenter agents report metric data to the main OpsCenter process which processes it and writes data into the cluster. So yes, the main OpsCenter process needs to be running in order for metrics to be stored in the cluster. -Nick

Re: performance reaching plateau while the hardware is still idle

2011-12-15 Thread Peter Tillotson
May I suggest dstat? It shows CPU, memory, and IO on one console: dstat -vn 3 So possible causes: * Not sufficient parallelism in the client * Server has too few threads * You are not CPU bound; network or disk may be the bottleneck p

RE: Does OpsCenter have to be running all the time in order to collect the cluster stats?

2011-12-15 Thread mike.li
Thanks, Nick, for your fast response. So if OpsCenter is down, the stats for the cluster won't be collected. Is there any safeguard mechanism, like a duplicate OpsCenter on a second machine, to monitor the cluster in case the single OpsCenter fails? Mike

Re: Does OpsCenter have to be running all the time in order to collect the cluster stats?

2011-12-15 Thread Nick Bailey
No, currently there is no failover mechanism built into OpsCenter. The agents could be reconfigured to talk to a different machine, but this would be a manual process and involve additional configuration changes.

Re: Cassandra C client implementation

2011-12-15 Thread Vlad Paiu
Hello, While digging more into this I've found http://svn.apache.org/viewvc/thrift/lib/c_glib/test/ which shows how to create the TSocket and TTransport structures, very similar to the way it's done in C++. Now I'm stuck on how to create the actual connection to the Cassandra server. I

RE: Does OpsCenter have to be running all the time in order to collect the cluster stats?

2011-12-15 Thread mike.li
Got it, Nick. Thanks again. :-)

Re: best practices for simulating transactions in Cassandra

2011-12-15 Thread John Laban
I'm actually using Curator as a Zookeeper client myself. I haven't used it in production yet, but so far it seems well written and Jordan Zimmerman at Netflix has been great on the support end as well. I haven't tried Cages so I can't really compare, but I think one of the main deciding factors b

Schema disagreement in 1.0.2

2011-12-15 Thread blafrisch
So in our cluster of 10 nodes we have 2 bad nodes that are in disagreement with the others. On one of the two bad nodes I tried to move the Schema* and Migration* files out of the system data directory, as described in the FAQ (http://wiki.apache.org/cassandra/FAQ#schema_disagreement). When the no

Memtable live ratio of infinity

2011-12-15 Thread Caleb Rackliffe
Hi All, I saw the following log message today on a node running Cassandra 1.0.5: "WARN [pool-1-thread-1] 2011-12-15 20:28:53,915 Memtable.java (line 174) setting live ratio to maximum of 64 instead of Infinity" I guess this means calculated throughput is either very low or the Memtable is huge
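To see what the warning implies, here is a minimal Python sketch of the clamping step. It assumes (this is not stated in the thread) that Cassandra 1.0 estimates the live ratio as the memtable's deep in-memory size divided by its serialized throughput and clamps the result to [1, 64]; only the maximum of 64 is confirmed by the log line. The function name `live_ratio` and the lower bound of 1.0 are illustrative assumptions. Under that assumption, a raw value of exactly Infinity comes from dividing by a throughput of zero in Java double arithmetic, not from a merely very large memtable, which would still give a finite ratio.

```python
import math

# Assumed bounds: the maximum of 64 comes from the log message; the minimum
# of 1.0 is an assumption, not taken from the thread.
MIN_SANE_LIVE_RATIO = 1.0
MAX_SANE_LIVE_RATIO = 64.0


def live_ratio(deep_size_bytes: int, serialized_throughput_bytes: int) -> float:
    """Estimate in-memory bytes per serialized byte, clamped to sane bounds."""
    if serialized_throughput_bytes == 0:
        # Java's double division by zero yields Infinity; Python would raise
        # ZeroDivisionError, so model the Java result explicitly.
        ratio = math.inf
    else:
        ratio = deep_size_bytes / serialized_throughput_bytes
    # Clamp into [MIN, MAX]; Infinity ends up at the maximum of 64.
    return min(max(ratio, MIN_SANE_LIVE_RATIO), MAX_SANE_LIVE_RATIO)


if __name__ == "__main__":
    print(live_ratio(1000, 0))    # 64.0: zero throughput, the "Infinity" case
    print(live_ratio(1000, 100))  # 10.0: an ordinary ratio passes through
```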

Re: Schema disagreement in 1.0.2

2011-12-15 Thread blafrisch
So I was able to get the schema agreeing on the two bad nodes, but I don't particularly like the way that I did it. One at a time, I shut them down, removed Schema* and Migration*, then copied over Schema* from another working node. They then started up with the correct schema. Did I do somethin

Re: performance reaching plateau while the hardware is still idle

2011-12-15 Thread Kent Tong
Dear all, Thanks for the good suggestions! I believe it is because the test client is single-threaded, so only one server thread is serving it and the rest of the cores are just sitting idle. Thanks!
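Kent's diagnosis matches a simple bound: a single-threaded client can complete at most 1 / latency operations per second, so 1500 ops/s corresponds to roughly 0.67 ms per read+write round trip, however idle the server is. The Python sketch below illustrates the parallel-test suggestion; `business_operation` is a hypothetical stand-in that sleeps for an assumed 2 ms instead of calling Cassandra, so the numbers it prints reflect the sleep, not a real cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def business_operation() -> None:
    """Hypothetical stand-in for one read plus one write to the same row.

    The sleep models client-observed round-trip latency; 2 ms is an
    assumption, not a measured Cassandra figure.
    """
    time.sleep(0.002)


def measure_throughput(num_threads: int, ops_per_thread: int) -> float:
    """Run ops_per_thread operations on each of num_threads threads; return ops/sec."""

    def worker() -> None:
        for _ in range(ops_per_thread):
            business_operation()

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(worker) for _ in range(num_threads)]
        for f in futures:
            f.result()  # re-raise any exception from a worker thread
    elapsed = time.perf_counter() - start
    return num_threads * ops_per_thread / elapsed


if __name__ == "__main__":
    for threads in (1, 4, 16):
        print(f"{threads:>2} threads: {measure_throughput(threads, 200):,.0f} ops/sec")
```

With the simulated latency held constant, throughput should grow roughly with the thread count until something else (client CPU, network, or the server) saturates. In a real test, replace the sleep with actual Cassandra calls, using a connection per thread or a thread-safe pool, depending on the client library.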

Re: Counters != Counts

2011-12-15 Thread Tyler Hobbs
Probably quite a few of them are coming from automatic retries by phpcassa. When working with counters, I recommend minimizing retries and/or increasing timeouts. Usually this means you want to use a separate connection pool with different settings just for counters. By the way, this advice appl
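Tyler's point about retries follows from counter increments not being idempotent: if the server applies an increment but the response is lost or times out, a client-side retry applies it a second time. The toy Python simulation below shows the effect; `FlakyCounterServer` and `increment` are hypothetical in-memory stand-ins, not phpcassa or Cassandra APIs, and the lost-ack rate is arbitrary.

```python
class FlakyCounterServer:
    """Hypothetical server: applies every increment, but some acks are lost."""

    def __init__(self, lost_ack_every: int) -> None:
        self.value = 0
        self.calls = 0
        self.lost_ack_every = lost_ack_every

    def add(self, delta: int) -> None:
        self.calls += 1
        self.value += delta  # the increment is applied first...
        if self.calls % self.lost_ack_every == 0:
            # ...then the ack is lost, so the client sees a timeout anyway.
            raise TimeoutError("ack lost")


def increment(server: FlakyCounterServer, delta: int, max_retries: int) -> bool:
    """Send one logical increment, retrying on timeout up to max_retries times."""
    for _ in range(max_retries + 1):
        try:
            server.add(delta)
            return True
        except TimeoutError:
            continue  # blind retry: may re-apply an increment that already landed
    return False


if __name__ == "__main__":
    with_retries = FlakyCounterServer(lost_ack_every=10)
    for _ in range(100):
        increment(with_retries, 1, max_retries=3)
    print(with_retries.value)  # 111, not 100

    no_retries = FlakyCounterServer(lost_ack_every=10)
    results = [increment(no_retries, 1, max_retries=0) for _ in range(100)]
    print(no_retries.value, results.count(False))  # 100 10
```

With retries, the counter ends at 111 for 100 logical increments. With `max_retries=0` it ends at the correct 100, but the client sees 10 increments "fail" with no way to tell that they were actually applied. That trade-off is why minimizing retries is usually paired with increasing timeouts, as suggested above.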