I think it's pretty easy to check if SOLR is alive. Even from a shell
script, a simple command like
    curl -iIs --url "http://solrhost/solr/select?start=0&rows=0" | grep -c "HTTP/1.1 200 OK"
will print 1 if the response is an HTTP 200. If the output is not 1,
then there is a problem. A load balancer or other tool can probably
internalize the check and not need to fork processes like a shell
script would, but the check can be the same. This simply sends an
HTTP HEAD request (which returns no content) for a fast-executing
query. In this case, a query with no "q=" specified seems to default
to *:* when using dismax, which is my default handler.
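
For a check that is easier to reuse from cron or a monitoring plugin,
the same idea can be sketched as a pair of shell functions. This is
only a rough sketch under some assumptions: the function names and
the 5-second timeout are made up for illustration, and it compares
curl's reported status code instead of grepping the status line, so
it does not depend on the exact "HTTP/1.1 200 OK" text.

```shell
#!/bin/sh
# Sketch only. SOLR_URL defaults to the placeholder host used above;
# the 5-second timeout is an assumption, not a recommended value.
SOLR_URL="${SOLR_URL:-http://solrhost/solr/select?start=0&rows=0}"

# Print the numeric HTTP status of a HEAD request.
# curl prints 000 when it gets no response (refused, timed out, ...).
solr_http_status() {
    curl -s -I -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Exit status 0 only when the response code is exactly 200.
solr_is_alive() {
    [ "$(solr_http_status "$1")" = "200" ]
}
```

Something like `solr_is_alive "$SOLR_URL" || echo "Solr check failed"`
could then drive an alert or a restart. Note that this only looks at
the status code, so it catches a wedged Solr only if the select
request stops returning 200.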
-Bryan
On Jan 15, 2009, at 2:13 PM, Stephen Weiss wrote:
I've been wondering about this one myself - most of the services we
have installed work this way, if they crash out for whatever reason
they restart automatically (Apache, MySQL, even the OS itself).
Failures are detected and corrected by the load balancers and also
in some cases by the machine itself (like with kernel panics). But
not SOLR, and I'm not quite sure what to do to get it there. We use
Jetty but it's the same story. It's not like it fails out all that
often, but when it does it will still respond to HTTP requests
(because Jetty itself is still working), which makes it a lot harder
to detect a failure... I've tried writing something for nagios but
the problem is that most responses solr would give to a request vary
depending on index updates, so it's not like I can just take a
checksum and compare it - and even then, it would only really alert
us to the problem, we'd still have to go in and restart everything
(personally I don't enjoy restarting servers from my blackberry
nearly as much as I should).
I'd have to come up with something that can intelligently interpret
the response and decide if the server's still working properly or
not, and the processing time on that alone might make it too
inefficient to run every few seconds, but at least with that we'd be
able to tell the cluster "don't send anything to this server for
now". Is there some really obvious way to track if a particular
servlet is still running properly (in either Tomcat or Jetty,
because if Tomcat has this I'd switch) and restart the container if
it's not?
Thanks!!
--
Steve
On Jan 15, 2009, at 1:57 PM, Jerome L Quinn wrote:
An even bigger problem is the fact that once Solr is wedged, it
stays that way until a human notices and restarts things. The tomcat
stays running and there's no automatic detection that will either
restart Solr, or restart the Tomcat container.
Any suggestions on either front?
Thanks,
Jerry Quinn