As a bit of background: we run a setup (moved from 3.6.1 to 4.2 relatively recently) with a single master receiving updates and three slaves pulling changes in. Our index is around 5 million documents, around 26GB in size total.
The situation I'm seeing is this: occasionally we update the master, and replication begins on the three slaves. It seems to proceed normally until it hits the end, at which point it "sticks": there are no messages in the logs, and nothing on the admin page seems to be happening. I sometimes sit there for upwards of 30 minutes, seeing no further activity in the index folder(s). Eventually I go to the core admin page and manually reload the core, which "catches it up". It seems like the index readers/writers are not releasing the index otherwise? The configuration is set to reopen; very occasionally this situation actually fixes itself after a longish period of time, but it is very annoying.

I had at first suspected our underlying shared (SAN) storage, so we installed SSDs in all three slave machines and moved the entire indexes onto those. That did not seem to affect this issue at all (additionally, I didn't really see the expected performance boost, but that's a separate issue entirely).

Any ideas? Any configuration details I might share or reconfigure? Any suggestions are appreciated. I could also upgrade to the later 4.3+ versions, if that might help.

Thanks!
Neal Ensor
nen...@gmail.com
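P.S. For context, a slave-side ReplicationHandler block in solrconfig.xml typically looks like the sketch below. The masterUrl host/port and pollInterval here are placeholders for illustration, not our actual values:

```xml
<!-- Sketch of a typical slave replication config (values are placeholders) -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- URL of the master core's replication handler -->
    <str name="masterUrl">http://master-host:8983/solr/core1/replication</str>
    <!-- How often the slave polls the master for a new index version -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```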