It's definitely a known 'issue' that you can't replicate (or make any other kind of index change, including a commit) more often than your warming queries take to complete, or you'll wind up with something like what you've seen.

It's in some documentation somewhere I saw, for sure.

The advice to 'just query against the master' is kind of odd, because, then... why have a slave at all, if you aren't going to query against it? I guess just for backup purposes.

But even with just one Solr, or when querying the master, if you commit at a rate such that commits arrive before the warming queries can complete, you're going to have the same issue.

The only answer I know of is "Don't commit (or replicate) at a faster rate than it takes your warming to complete." You can reduce your warming queries/operations, or reduce your commit/replicate frequency.
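To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python (not anything from Solr itself). The function name and the numbers are made up for illustration, and it assumes every warm-up takes the same fixed time, which is optimistic: in practice overlapping warmers compete for CPU and heap, so each one gets slower and the pile-up is worse than this model suggests.

```python
def max_overlapping_warmers(commit_interval_s, warm_time_s, n_commits):
    """Peak number of searchers warming at once in a simplified model.

    Assumes a commit/replication lands every commit_interval_s seconds
    and each one starts a warm-up lasting exactly warm_time_s seconds.
    """
    starts = [i * commit_interval_s for i in range(n_commits)]
    peak = 0
    for t in starts:
        # A warmer started at time s is still running at time t
        # if s <= t < s + warm_time_s.
        active = sum(1 for s in starts if s <= t < s + warm_time_s)
        peak = max(peak, active)
    return peak


# Hypothetical numbers: replicate every 5s, warming takes 12s.
print(max_overlapping_warmers(5, 12, 10))
# Same interval, but warming trimmed to 4s: no overlap.
print(max_overlapping_warmers(5, 4, 10))
```

As long as the warm time stays under the interval, you never have more than one searcher warming; once it goes over, you are holding several half-warmed searchers (and their caches) in memory at the same time.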

It would be interesting/useful if Solr noticed this going on and gave you some kind of error in the log (or even an exception, when started with a certain parameter for testing): "Overlapping warming queries, you're committing too fast" or something. It's easy to make this happen without realizing it, and then your Solr does what Simon says: runs out of RAM and/or uses a whole lot of CPU and disk I/O.
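Roughly the kind of check I mean, sketched in Python purely as an illustration. WarmupTracker, its methods and the "strict" flag are all hypothetical names of mine, not a Solr API; the idea is just to count warm-ups in flight and complain when a new one starts before the previous one has finished.

```python
class WarmupTracker:
    """Counts warm-ups in flight and flags overlaps (hypothetical sketch)."""

    def __init__(self, max_overlap=1, strict=False):
        self.max_overlap = max_overlap  # warm-ups allowed at the same time
        self.strict = strict            # True: raise instead of just warning
        self.active = 0
        self.warnings = []

    def warm_started(self):
        if self.active + 1 > self.max_overlap:
            msg = ("Overlapping warming queries (%d already active), "
                   "you're committing too fast" % self.active)
            if self.strict:
                # Refuse the new warm-up; it is not counted as active.
                raise RuntimeError(msg)
            self.warnings.append(msg)
        self.active += 1

    def warm_finished(self):
        self.active = max(0, self.active - 1)


tracker = WarmupTracker()
tracker.warm_started()   # first commit starts warming
tracker.warm_started()   # second commit arrives before the first finished
print(tracker.warnings)
```

With strict=True the second warm_started() call would raise instead, which is the "exception when started with a certain parameter for testing" behaviour.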

Lance Norskog wrote:
You should query against the indexer. I'm impressed that you got 5s
replication to work reliably.

On Mon, Nov 1, 2010 at 4:27 PM, Simon Wistow <si...@thegestalt.org> wrote:
We've been trying to get a setup in which a slave replicates from a
master every few seconds (ideally every second but currently we have it
set at every 5s).

Everything seems to work fine until, periodically, the slave just stops
responding from what looks like it running out of memory:

org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.OutOfMemoryError: Java heap space


(our monitoring seems to confirm this).

Looking around my suspicion is that it takes new Readers longer to warm
than the gap between replication and thus they just build up until all
memory is consumed (which, I suppose isn't really memory 'leaking' per
se, more just resource consumption)

That said, we've tried turning off caching on the slave and that didn't
help either so it's possible I'm wrong.

Is there anything we can do about this? I'm reluctant to increase the
heap space since I suspect that will mean that there's just a longer
period between failures. Might Zoie help here? Or should we just query
against the Master?


Thanks,

Simon



