getting number of terms in a document/field

2015-02-05 Thread Ahmet Arslan
Hello Lucene Users,
I am traversing all documents that contain a given term with the following code:
    Term term = new Term(field, word);
    Bits bits = MultiFields.getLiveDocs(reader);
    DocsEnum docsEnum = MultiFields.getTermDocsEnum(reader, bits, field, term.bytes());
    while (docsEnum.nextDoc() != Doc

Re: getting number of terms in a document/field

2015-02-06 Thread Michael McCandless
How will you know how large to allocate that array? The within-doc term freq can in general be arbitrarily large... Lucene does not directly store the total number of terms in a document, but it does store it approximately in the doc's norm value. Maybe you can use that? Alternatively, you can s

Re: getting number of terms in a document/field

2015-02-06 Thread Ahmet Arslan
Hi Michael,
Thanks for the explanation. I am working with a TREC dataset; since it is static, I set the size of that array experimentally. I followed the DefaultSimilarity#lengthNorm method a bit. If the default similarity is used with no index-time boost, I assume that the norm equals 1.0 / Math.sqrt
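The idea discussed here, recovering an approximate field length from the norm, can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, the norm is assumed to have the form 1 / sqrt(numTerms) with no index-time boost, and the sketch works on the unquantized float. In an actual index the norm is stored in compressed form, so the recovered length would only be approximate.

```java
public final class NormLengthSketch {

    // Assumed length norm with no index-time boost: 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Inverts the assumed formula: numTerms is roughly 1 / norm^2.
    // Returns 0 for a non-positive norm (treated here as "no length information").
    static long estimateLength(float norm) {
        if (norm <= 0f) {
            return 0L;
        }
        double n = norm; // widen before squaring to limit rounding error
        return Math.round(1.0 / (n * n));
    }

    public static void main(String[] args) {
        for (int numTerms : new int[] {1, 4, 100, 2500}) {
            float norm = lengthNorm(numTerms);
            System.out.println(numTerms + " terms, norm " + norm
                    + ", estimated length " + estimateLength(norm));
        }
    }
}
```

With a real index, the float passed to estimateLength would be the decoded norm of the document, so nearby lengths can collapse to the same estimate.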

Re: getting number of terms in a document/field

2015-02-06 Thread Michael McCandless
On Fri, Feb 6, 2015 at 8:51 AM, Ahmet Arslan wrote:
> Hi Michael,
>
> Thanks for the explanation. I am working with a TREC dataset,
> since it is static, I set size of that array experimentally.
>
> I followed the DefaultSimilarity#lengthNorm method a bit.
>
> If default similarity and no index ti

Re: getting number of terms in a document/field

2015-02-08 Thread Ahmet Arslan
Hi,
Sorry for my ignorance, how do I obtain an AtomicReader from an IndexReader? I figured out the following code, but it gives me a list of atomic readers:
    for (AtomicReaderContext context : reader.leaves()) {
        NumericDocValues docValues = context.reader().getNormValues(field);
        if (docValues != null)
            normValu
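One way to work with a list of per-segment readers is to combine their values using each leaf's docBase: a top-level document ID is the leaf's docBase plus the document's ID within that leaf. The sketch below shows only that offset arithmetic in plain Java. The Leaf class is a hypothetical stand-in for a leaf context and its per-document values, not a Lucene type, and the sketch assumes the leaves cover the ID range 0..maxDoc-1 without overlap.

```java
import java.util.Arrays;
import java.util.List;

public final class LeafOffsetSketch {

    // Hypothetical stand-in for one leaf (segment): its docBase and per-document values.
    static final class Leaf {
        final int docBase;
        final long[] values;

        Leaf(int docBase, long[] values) {
            this.docBase = docBase;
            this.values = values;
        }
    }

    // Copies per-leaf values into one array indexed by top-level document ID.
    static long[] flatten(List<Leaf> leaves, int maxDoc) {
        long[] byGlobalDoc = new long[maxDoc];
        for (Leaf leaf : leaves) {
            for (int localDoc = 0; localDoc < leaf.values.length; localDoc++) {
                // top-level ID = docBase of the leaf + ID within the leaf
                byGlobalDoc[leaf.docBase + localDoc] = leaf.values[localDoc];
            }
        }
        return byGlobalDoc;
    }

    public static void main(String[] args) {
        List<Leaf> leaves = Arrays.asList(
                new Leaf(0, new long[] {7, 8}),
                new Leaf(2, new long[] {9}));
        System.out.println(Arrays.toString(flatten(leaves, 3)));
    }
}
```

Applied to the loop in the message, the same offset would be used when storing a value read from a leaf into an array indexed by top-level document ID.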