I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards, replication factor of 2) that I've been using for over a month now in production.
Suddenly, the hourly cron I run that dispatches a delete by query completely halts all indexing. Select queries still run (and quickly), there is no CPU or disk I/O happening, but suddenly my indexer (which runs at ~400 doc/sec steady) pauses, and everything blocks indefinitely. To clarify some on the schema, this is a moving window of data (imagine messages that don't matter after a 24 hour period) which are regularly "chopped" off by my hourly cron (deleting messages over 24 hours old) to keep the index size reasonable. There are no errors (log level warn) in the logs. I'm not sure what to look into. As I've said this has been running (delete included) for about a month. I'll also note that I have another cluster much like this one where I do the very same thing... it has 4 machines, and indexes 10x the documents per second, with more indexes... and yet I delete on a cron without issue... Any ideas on where to start, or other information I could provide? Thanks much.