Short version: if o.a.c.concurrent.{ROW-READ-STAGE,ROW-MUTATION-STAGE}
and o.a.c.db.CompactionManager have

- completed task count increasing
- pending tasks stable (for RRS and RMS, stable in the low hundreds or
  less; for CM, stable in single digits or less)

and the log isn't spitting out Error lines, then the node is completing
requests and keeping up with demand reasonably well.

On Tue, Jun 22, 2010 at 3:41 PM, Andrew Psaltis
<andrew.psal...@webtrends.com> wrote:
> All,
> We have been working through some operations scenarios, so that we are ready
> to deploy our first Cassandra cluster into production in the coming months.
> During this process our operations folks have asked us to provide a Health
> Check service. I am using the word service here very liberally - really we
> just need to provide a way for the folks in our NOC to know that not only is
> the Cassandra process running (which they will get with their monitoring
> tools), but that it is actually alive and well. We do not have the intent of
> verifying that the data is valid, just that every node in the cluster that is
> known to be running is actually alive and healthy. My questions are - What
> does it mean for a Cassandra node to be healthy? What are the minimum things
> (in terms of impact on the performance of a node) we can check to make sure
> that a node is not a zombie?
>
> Any and all input is greatly appreciated.
>
> Thanks,
> Andrew
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
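
For anyone scripting this, the three checks in the short version at the top of this message could be sketched roughly as below. This is only an illustration under stated assumptions: the numeric limits (300 and 9) are my reading of "low hundreds" and "single digits", `Sample` and `node_is_healthy` are hypothetical names, and how the counters are actually collected (for example over JMX) is left out entirely.

```python
from dataclasses import dataclass

# Assumed limits: "low hundreds or less" for the two stages and
# "single digits or less" for CompactionManager. Tune for your cluster.
PENDING_LIMITS = {
    "ROW-READ-STAGE": 300,
    "ROW-MUTATION-STAGE": 300,
    "CompactionManager": 9,
}


@dataclass(frozen=True)
class Sample:
    """One reading of a pool's counters (hypothetical container type)."""
    completed: int
    pending: int


def node_is_healthy(samples, error_lines):
    """Apply the three checks to a series of readings.

    samples: dict mapping pool name -> list of Sample, oldest first.
    error_lines: number of Error lines seen in the log over the same window.
    """
    # Check 3: the log isn't spitting out Error lines.
    if error_lines > 0:
        return False
    for name, limit in PENDING_LIMITS.items():
        history = samples.get(name, [])
        # Need at least two readings to tell whether a counter is moving.
        if len(history) < 2:
            return False
        # Check 1: completed task count is increasing over the window.
        if history[-1].completed <= history[0].completed:
            return False
        # Check 2: pending tasks stay at or under the assumed limit.
        if any(s.pending > limit for s in history):
            return False
    return True
```

Note that this follows the criteria literally, so a completely idle node (no completed tasks during the sampling window) would be reported as unhealthy; whether that is what you want depends on your traffic.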