Re: getting number of terms in a document/field

2015-02-06 Thread Michael McCandless
How will you know how large to allocate that array? The within-doc term frequency can in general be arbitrarily large. Lucene does not directly store the total number of terms in a document, but it does store it approximately in the doc's norm value. Maybe you can use that? Alternatively, you can …
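Reading the norm Michael mentions could look roughly like the sketch below against the Lucene 4.x API (this is an illustration, not code from the thread; it assumes Lucene 4.x on the classpath, a field that was indexed with norms, and a hypothetical helper name):

```java
import java.io.IOException;

import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.NumericDocValues;

// Hypothetical helper: read the encoded norm for one document/field.
// In Lucene 4.x, per-field norms are exposed as NumericDocValues;
// under DefaultSimilarity the stored value is a single lossy byte.
static long encodedNorm(AtomicReader reader, String field, int docID) throws IOException {
    NumericDocValues norms = reader.getNormValues(field); // null when norms are omitted
    return norms == null ? 0L : norms.get(docID);
}
```

Because the norm is byte-encoded, any document length recovered from it is approximate, which matches Michael's wording above.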

Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

2015-02-06 Thread Piotr Idzikowski
Hello. A slightly delayed question, but I recently found these articles: https://wiki.apache.org/solr/SolrPerformanceProblems https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning Especially this part from the first URL: *Using the ConcurrentMarkSweep (CMS) collector with tuning parameters is a …
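For context, CMS tuning of the sort those wiki pages discuss is expressed as HotSpot flags on the `java` command line. The flags below are a hedged illustration of commonly combined CMS options (the heap sizes, the occupancy fraction, and `start.jar` are placeholders, not the wiki's exact recommended set):

```
java -Xms4g -Xmx4g \
     -XX:+UseConcMarkSweepGC \
     -XX:+UseParNewGC \
     -XX:CMSInitiatingOccupancyFraction=70 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -jar start.jar
```

Fixing `-Xms` equal to `-Xmx` and pinning the CMS initiating occupancy are typical starting points; the right values depend on the index and query load.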

Re: getting number of terms in a document/field

2015-02-06 Thread Ahmet Arslan
Hi Michael, Thanks for the explanation. I am working with a TREC dataset; since it is static, I set the size of that array experimentally. I followed the DefaultSimilarity#lengthNorm method a bit. If DefaultSimilarity and no index-time boost is used, I assume that the norm equals 1.0 / Math.sqrt…

Re: getting number of terms in a document/field

2015-02-06 Thread Michael McCandless
On Fri, Feb 6, 2015 at 8:51 AM, Ahmet Arslan wrote:
> Hi Michael,
>
> Thanks for the explanation. I am working with a TREC dataset,
> since it is static, I set size of that array experimentally.
>
> I followed the DefaultSimilarity#lengthNorm method a bit.
>
> If default similarity and no index ti…
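Under the assumption Ahmet states (DefaultSimilarity, no index-time boost), the raw length norm is 1/sqrt(numTerms), so the term count can be estimated by inverting it. A minimal self-contained sketch of that arithmetic (the class and method names are illustrative, and it ignores the byte-precision loss that DefaultSimilarity's norm encoding introduces):

```java
public class NormLength {
    // lengthNorm under DefaultSimilarity with a boost of 1.0f
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Invert the norm to recover an estimated term count:
    // norm = 1/sqrt(n)  =>  n = 1/norm^2
    static int estimateNumTerms(float norm) {
        return Math.round(1.0f / (norm * norm));
    }

    public static void main(String[] args) {
        float norm = lengthNorm(16);                 // 0.25
        System.out.println(norm);
        System.out.println(estimateNumTerms(norm));  // 16
    }
}
```

In the index the value read back is the byte-encoded norm, so the recovered length is only approximate — which is why Michael calls it an approximation above.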

RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

2015-02-06 Thread McKinley, James T
Just to be clear, in case there was any confusion about my previous message regarding G1GC: we do not use Solr; my team works on a proprietary Lucene-based search engine. Consequently, I can't really give any advice regarding Solr with G1GC, but for our uses (so far, anyway), G1GC seems to work …