On Tue, 2013-12-03 at 17:09 +0100, michallos wrote:
> This occurs only on production environment so I can't profile it :-)

Sure you can :-) If you use jvisualvm and stay away from the "Profiler" tab, you should be fine. The "Sampler" tab performs non-intrusive profiling. It is not as thorough as real profiling, but it might help.

So far it sounds like a classic merge issue though, and that would probably not show up in the profiler. Have you tweaked the mergeFactor?
http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor

With 16 shards/node (guessing: same storage backend for all shards on a single node, different storage backends across the nodes) and a 15 second commit time, a new segment will be created about every second. That is oversimplifying, as the commits will cluster, which makes matters worse for spinning drives. If the mergeFactor is 10, this means that a merge will be started roughly every 10 seconds. Merges are bulk IO, and on spinning drives they are penalized by concurrent random access.

Consider doing non-intrusive IO load logging (bulk throughput as well as IO/sec) on a node. If you see bulk speed go down considerably when IO/sec rises, then you have your problem.

Some solutions are
- Increase your maxTime for autoCommit
- Increase the mergeFactor
- Use SSDs
- Maybe lower the number of shards, to reduce the thrashing triggered by concurrent merges
- More RUM (and more RAM)

Regards,
Toke Eskildsen, State and University Library, Denmark
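
PS: The segment/merge estimate above is plain arithmetic, so here is a minimal Python sketch of it in case you want to try other settings. The function name and its parameters are made up for illustration and are not part of Solr. It assumes that every shard writes exactly one new segment per commit and that commits are spread evenly over time, which they are not in practice, so treat the numbers as a rough lower bound on the trouble.

```python
def merge_estimate(shards_per_node, commit_interval_s, merge_factor):
    """Back-of-envelope estimate of segment and merge rates on one node.

    Assumptions (simplifications, not Solr behaviour guarantees):
    - every shard flushes one new segment per commit
    - commits are evenly spread instead of clustering
    - one first-level merge is triggered per merge_factor new segments
    """
    segments_per_s = shards_per_node / commit_interval_s
    seconds_between_merges = merge_factor / segments_per_s
    return segments_per_s, seconds_between_merges


# The setup discussed above: 16 shards/node, 15 s commits, mergeFactor 10
segments_per_s, merge_every_s = merge_estimate(16, 15, 10)
print(f"{segments_per_s:.2f} segments/s, "
      f"a merge roughly every {merge_every_s:.1f} s")
```

With the values from this thread it comes out at about one segment per second and a merge every 9 to 10 seconds. Doubling the commit interval or the mergeFactor doubles the time between merges under the same assumptions.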