On Tue, 2013-12-03 at 17:09 +0100, michallos wrote:
> This occurs only on production environment so I can't profile it :-)

Sure you can :-)

If you use jvisualvm and stay away from the "Profiler"-tab, then you
should be fine. The "Sampler" performs non-intrusive profiling. Not as
thorough as real profiling, but it might help.

So far it sounds like a classic merge-issue though. This would probably
not show up in the profiler. Have you tweaked the mergeFactor?
http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor

With 16 shards/node (guessing: Same storage backend for all shards on a
single node, different storage backends across the nodes) and 15 second
commit time, a segment will be created every second (oversimplifying as
they will cluster, which makes matters worse for spinning drives). If
the mergeFactor is 10, this means that a merge will be going on every 10
seconds. Merges are bulk IO and for spinning drives they get penalized
by concurrent random access.
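
To make the arithmetic above concrete, here is a rough back-of-the-envelope
sketch. The function name is made up for illustration, and it assumes the same
oversimplification as above: segments are spread evenly over the commit
interval and one merge is triggered per mergeFactor new segments on the node.

```python
def merge_intervals(shards_per_node, commit_interval_s, merge_factor):
    """Estimate seconds between new segments and between merges on one node."""
    # Each shard flushes one segment per commit, so the node sees
    # shards_per_node new segments per commit interval.
    seconds_per_segment = commit_interval_s / shards_per_node
    # Simplification: one merge starts for every merge_factor new segments.
    seconds_per_merge = seconds_per_segment * merge_factor
    return seconds_per_segment, seconds_per_merge


# The numbers from this thread: 16 shards/node, 15 second commits, mergeFactor 10.
seg, merge = merge_intervals(16, 15, 10)
print(f"new segment every ~{seg:.2f}s, merge every ~{merge:.1f}s")
```

Plugging in a longer commit interval or a higher mergeFactor shows how quickly
the merge frequency drops.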

Consider doing non-intrusive IO load logging (bulk throughput as well as
IO/sec) on a node. If you see bulk throughput go down considerably when the
IO/sec rises, then you have your problem.
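
One way to do such logging, as a minimal sketch: it assumes a Linux node where
/proc/diskstats is available, and the device name "sda" is only a placeholder
for whatever holds your index. The function names are mine. Tools like iostat
give you the same numbers if they are installed.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text, device):
    """Return (completed IOs, bytes transferred) for one device."""
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 10 and f[2] == device:
            ios = int(f[3]) + int(f[7])      # reads + writes completed
            sectors = int(f[5]) + int(f[9])  # sectors read + written
            return ios, sectors * SECTOR_BYTES
    raise ValueError(f"device {device!r} not found")


def io_rates(before, after, seconds):
    """Return (IO/sec, MB/sec) between two samples."""
    return ((after[0] - before[0]) / seconds,
            (after[1] - before[1]) / seconds / 1e6)


def log_io(device="sda", interval=5.0):
    """Print IO/sec and MB/sec for the device until interrupted."""
    def sample():
        with open("/proc/diskstats") as fh:
            return parse_diskstats(fh.read(), device)

    prev = sample()
    while True:
        time.sleep(interval)
        cur = sample()
        iops, mbps = io_rates(prev, cur, interval)
        print(f"{time.strftime('%H:%M:%S')} {iops:8.1f} IO/s {mbps:8.2f} MB/s",
              flush=True)
        prev = cur
```

Calling log_io("sda") and redirecting the output to a file gives a cheap log
that only reads kernel counters, so it should be safe to run in production.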

Some solutions are
- Increase your maxTime for autoCommit
- Increase the mergeFactor
- Use SSDs
- Maybe lower the number of shards to reduce the thrashing
  triggered by concurrent merges
- More RUM (and more RAM)


Regards,
Toke Eskildsen, State and University Library, Denmark
