Solr 4.0 Beta, termIndexInterval vs termIndexDivisor vs termInfosIndexDivisor

2012-09-07 Thread Tom Burton-West
Hello all, Due to multiple languages and dirty OCR, our indexes have over 2 billion unique terms (http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again). In Solr 3.6 and earlier we needed to reduce the memory used for storing the in-memory representation of the tii file. We
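
To make the memory concern concrete, here is a rough sketch of how the two Lucene 3.x knobs interact, assuming the usual model: the writer stores every termIndexInterval-th term in the tii file (128 by default), and the reader loads only every termInfosIndexDivisor-th of those entries into memory. The function name and the settings tried below are illustrative, not taken from the original post, and real counts are per segment, so treat the totals as approximate.

```python
def loaded_index_terms(unique_terms, term_index_interval=128, divisor=1):
    """Estimate how many term index entries a reader holds in memory.

    Assumed model (Lucene 3.x style): one tii entry per
    `term_index_interval` terms, of which the reader loads one
    in every `divisor`.
    """
    if unique_terms < 0 or term_index_interval < 1 or divisor < 1:
        raise ValueError("counts must be non-negative, interval/divisor >= 1")
    # Ceiling division without floating point.
    tii_entries = -(-unique_terms // term_index_interval)
    return -(-tii_entries // divisor)


if __name__ == "__main__":
    terms = 2_000_000_000  # "over 2 billion unique terms" from the post
    for interval, div in [(128, 1), (128, 8), (1024, 1)]:
        n = loaded_index_terms(terms, interval, div)
        print(f"interval={interval:5d} divisor={div:2d} -> {n:,} entries in memory")
```

Under this model, raising the interval at index time and raising the divisor at read time shrink the in-memory index by the same factor; the divisor has the advantage of not requiring a reindex.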

Re: Solr 4.0 Beta, termIndexInterval vs termIndexDivisor vs termInfosIndexDivisor

2012-09-07 Thread Tom Burton-West
Thanks Robert, I'll have to spend some time understanding the default codec for Solr 4.0. Did I miss something in the changes file? I'll be digging into the default codec docs and testing sometime in the next week or two (with a 2 billion term index). If I understand it well enough, I'll be happy

Re: Solr 4.0 Beta, termIndexInterval vs termIndexDivisor vs termInfosIndexDivisor

2012-09-07 Thread Robert Muir
On Fri, Sep 7, 2012 at 2:19 PM, Tom Burton-West tburt...@umich.edu wrote:
> Thanks Robert, I'll have to spend some time understanding the default codec for Solr 4.0. Did I miss something in the changes file?

http://lucene.apache.org/core/4_0_0-BETA/ see the file formats section, especially

Re: Solr 4.0 Beta, termIndexInterval vs termIndexDivisor vs termInfosIndexDivisor

2012-09-07 Thread Tom Burton-West
Thanks Robert,

> if not, just customize blocktree's params with a CodecFactory in solr, or even pick another implementation (FixedGap, VariableGap, whatever).

Still trying to get my head around 4.0 and flexible indexing. I'll take another look at Mike's and your presentations. I'm trying to