I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards,
replication factor of 2) that I've been using for over a month now in
production.

Suddenly, the hourly cron I run that dispatches a delete by query
completely halts all indexing. Select queries still run (and quickly),
there is no CPU or disk I/O happening, but suddenly my indexer (which runs
at ~400 doc/sec steady) pauses, and everything blocks indefinitely.

To clarify some on the schema, this is a moving window of data (imagine
messages that don't matter after a 24 hour period) which are regularly
"chopped" off by my hourly cron (deleting messages over 24 hours old) to
keep the index size reasonable.

There are no errors (log level warn) in the logs. I'm not sure what to look
into. As I've said this has been running (delete included) for about a
month.

I'll also note that I have another cluster much like this one where I do
the very same thing... it has 4 machines, and indexes 10x the documents per
second, with more indexes... and yet I delete on a cron without issue...

Any ideas on where to start, or other information I could provide?

Thanks much.

Reply via email to