Why was it down? For example, did it OOM? If so, the recommended approach is to kill the process on OOM rather than leave it in the cluster in a zombie state. I ask because I had similar issues when my nodes OOM'd.

That said, you can get /clusterstate.json, which contains ZooKeeper's status of a node, using a request like:

http://localhost:8983/solr/zookeeper?detail=true&path=%2Fclusterstate.json

That would require some basic JSON processing to dig into the response and get the status of the node of interest, so you may want to implement a custom request handler.
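For what it's worth, the JSON processing could look roughly like the Python sketch below. It is only a sketch under a few assumptions on my part: that the servlet response wraps the znode contents as a JSON string under znode -> data, and that clusterstate.json is laid out as collection -> shards -> replicas, with each replica carrying "node_name" and "state". The function names and the node name in the usage note are made up for illustration, so check them against what your cluster actually returns.

```python
import json


def parse_zookeeper_response(body):
    """Extract the cluster state from the /solr/zookeeper response body.

    Assumption: the znode contents arrive as a JSON string under
    envelope["znode"]["data"]. Verify against your own Solr version.
    """
    envelope = json.loads(body)
    return json.loads(envelope["znode"]["data"])


def replica_states(cluster_state, node_name):
    """Map (collection, shard, replica) -> state for replicas on node_name."""
    states = {}
    for collection, coll_info in cluster_state.items():
        for shard, shard_info in coll_info.get("shards", {}).items():
            for replica, rep_info in shard_info.get("replicas", {}).items():
                if rep_info.get("node_name") == node_name:
                    states[(collection, shard, replica)] = rep_info.get("state")
    return states


def node_is_active(cluster_state, node_name):
    """True only if the node hosts at least one replica and all are 'active'."""
    states = replica_states(cluster_state, node_name)
    return bool(states) and all(s == "active" for s in states.values())
```

You would fetch the URL above with whatever HTTP client you like, pass the body to parse_zookeeper_response, and then call node_is_active(state, "10.0.0.1:8983_solr") with the node name of the box being checked (that name is a placeholder). A small script like this could sit behind the load balancer's health check until a proper request handler exists.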
On Mon, Jul 22, 2013 at 9:55 AM, jimtronic <jimtro...@gmail.com> wrote:

> I've run into a problem recently that's difficult to debug and search for:
>
> I have three nodes in a cluster and this weekend one of the nodes went
> partially down. It no longer responds to distributed updates and it is
> marked as GONE in the Cloud view of the admin screen. That's not ideal, but
> there's still two boxes up so not the end of the world.
>
> The problem is that it is still responding to ping requests and returning
> queries successfully. In my setup, I have the three servers on an haproxy
> load balancer so that I can distribute requests and have clients stick to a
> specific solr box. Because the bad node is still returning OK to the ping
> requests and still returns results for simple queries, the load balancer
> does not remove it from the group.
>
> Is there a ping like request handler that would tell me whether the given
> box I'm hitting is still "in the cloud"?
>
> Thanks!
> Jim Musil
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495.html
> Sent from the Solr - User mailing list archive at Nabble.com.