Re: Performance hit of Solr checkIntegrityAtMerge

2014-12-10 Thread Tom Burton-West
Thanks Robert!

Tom

> Start at SegmentMerger in both places.
>
> In 4.10.x you can see how it just validates every part of every reader
> in a naive loop:
>
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_10/lucene/core/src/java/org/apache/lucene/index/SegmentMerger.java#L58
>
> in 5.x

Re: Performance hit of Solr checkIntegrityAtMerge

2014-12-10 Thread Robert Muir
On Wed, Dec 10, 2014 at 3:46 PM, Tom Burton-West wrote:

> Thanks Robert,
>
> With indexes close to 1 TB in size, I/O is usually our big bottleneck.
>
> Can you point me to where in the 4.x codebase and/or 5.x codebase I should
> look to get a feel for what you mean by i/o locality? Or should I be

Re: Performance hit of Solr checkIntegrityAtMerge

2014-12-10 Thread Tom Burton-West
Thanks Robert,

With indexes close to 1 TB in size, I/O is usually our big bottleneck.

Can you point me to where in the 4.x codebase and/or 5.x codebase I should look to get a feel for what you mean by I/O locality? Or should I be looking at a JIRA issue? Is there a short explanation you might be

Re: Performance hit of Solr checkIntegrityAtMerge

2014-12-10 Thread Robert Muir
There are two costs: CPU and I/O. The CPU cost is small anyway and can be made basically trivial by using Java 8. The I/O cost arises because the check is not done with any I/O locality to the data being merged, so it could be a performance hit for an extremely large merge. In 5.0 the option is removed:
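
To get a rough feel for the CPU side of the cost described above, here is a minimal sketch that streams a file through a CRC32 checksum and reports the elapsed time. It is not Lucene's code: the class name ChecksumSketch and the method checksumOf are hypothetical, and it assumes only that the integrity check amounts to reading each file in full and verifying a checksum. The Java 8 remark above presumably refers to faster checksum computation on newer JVMs, but that reading is an assumption too.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.CRC32;

// Hypothetical helper, not part of Lucene or Solr.
public class ChecksumSketch {

    // Reads the whole file sequentially and returns its CRC32 value.
    // Every byte has to come off disk, which is the I/O side of the cost.
    public static long checksumOf(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[1 << 16];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                crc.update(buffer, 0, n);
            }
        }
        return crc.getValue();
    }

    // Usage: java ChecksumSketch <file> [<file> ...]
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            Path file = Paths.get(name);
            long start = System.nanoTime();
            long value = checksumOf(file);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(name + " crc32=" + Long.toHexString(value)
                    + " bytes=" + Files.size(file) + " ms=" + elapsedMs);
        }
    }
}
```

Running it twice on the same large file, once with a cold page cache and once warm, gives a crude way to separate the I/O share of the time from the checksum computation itself.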

Performance hit of Solr checkIntegrityAtMerge

2014-12-10 Thread Tom Burton-West
Hello all,

In the example solrconfig.xml file for Solr 4.10.2 there is a comment (appended below) saying that setting checkIntegrityAtMerge to true reduces the risk of index corruption at the expense of slower merging. Can someone please point me to any benchmarks or details about the trade-off