Iterating TermsEnum for Long field produces zero values at the end

2014-11-17 Thread Barry Coughlan
Hi all, I'm using 4.10.2. I have a Long id field. Each document has one id value. I am creating a look-up between Lucene's internal document id and my id values by enumerating the inverted index: private long[] cacheDocIds() throws IOException { long[] ourIds = new
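For orientation, a minimal sketch of the kind of enumeration described, against the Lucene 4.10 API; the original snippet is truncated above, so the field name "id", the MultiFields-based access, and the variable names are assumptions for illustration only:

    import java.io.IOException;
    import org.apache.lucene.index.*;
    import org.apache.lucene.search.DocIdSetIterator;
    import org.apache.lucene.util.Bits;
    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.NumericUtils;

    // Build a docID -> external long id table by walking the inverted index.
    private long[] cacheDocIds(IndexReader reader) throws IOException {
      long[] ourIds = new long[reader.maxDoc()];
      Terms terms = MultiFields.getTerms(reader, "id");
      if (terms == null) {
        return ourIds;                      // field not present
      }
      Bits liveDocs = MultiFields.getLiveDocs(reader);
      TermsEnum termsEnum = terms.iterator(null);
      DocsEnum docsEnum = null;
      BytesRef term;
      while ((term = termsEnum.next()) != null) {
        // Decodes every term, including the lower-precision prefix terms
        // discussed in the replies below.
        long value = NumericUtils.prefixCodedToLong(term);
        docsEnum = termsEnum.docs(liveDocs, docsEnum);
        int doc;
        while ((doc = docsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
          ourIds[doc] = value;
        }
      }
      return ourIds;
    }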

Order docIds to reduce disk seeks

2014-11-17 Thread Vijay B
Could someone point me to how to order docIds as per http://wiki.apache.org/lucene-java/ImproveSearchingSpeed? "Limit usage of stored fields and term vectors. Retrieving these from the index is quite costly. Typically you should only

Re: Iterating TermsEnum for Long field produces zero values at the end

2014-11-17 Thread Michael McCandless
It is expected: those are the prefix terms, which come after all the full-precision numeric terms. But I'm not sure why you see 0s ... the bytes should be unique for every term you get back from the TermsEnum. Mike McCandless http://blog.mikemccandless.com On Mon, Nov 17, 2014 at 10:39 AM,

RE: Iterating TermsEnum for Long field produces zero values at the end

2014-11-17 Thread Uwe Schindler
Hi, > It is expected: those are the prefix terms, which come after all the full-precision numeric terms. But I'm not sure why you see 0s ... the bytes should be unique for every term you get back from the TermsEnum. That's easy to explain: The lower precision terms at the end have more
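As a sketch, one way to keep only the full-precision terms when enumerating is NumericUtils.filterPrefixCodedLongs, which wraps a TermsEnum so the lower-precision prefix terms are skipped; this assumes the same "id" field and a Terms instance named terms as in the earlier sketch:

    // Only full-precision (shift == 0) terms come back from the wrapper,
    // so every decoded value corresponds to an actual indexed long id.
    TermsEnum fullPrecision = NumericUtils.filterPrefixCodedLongs(terms.iterator(null));
    BytesRef term;
    while ((term = fullPrecision.next()) != null) {
      long value = NumericUtils.prefixCodedToLong(term);
      // ... walk the postings as before
    }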

Re: Iterating TermsEnum for Long field produces zero values at the end

2014-11-17 Thread Barry Coughlan
Makes sense, thanks. I switched the implementation to a FieldCache with no noticeable performance difference: private Longs cacheDocIds() throws IOException { AtomicReader wrapped = SlowCompositeReaderWrapper.wrap(reader); Longs vals = FieldCache.DEFAULT.getLongs(wrapped, id, false);

Re: Iterating TermsEnum for Long field produces zero values at the end

2014-11-17 Thread Michael McCandless
It's better to use doc values than field cache, if you can. Mike McCandless http://blog.mikemccandless.com On Mon, Nov 17, 2014 at 2:55 PM, Barry Coughlan b.coughl...@gmail.com wrote: Makes sense, thanks. I switched the implementation to a FieldCache with no noticeable performance
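A rough sketch of the doc-values route Mike suggests, for Lucene 4.10; the field name "id" follows the thread, everything else is illustrative:

    // Indexing: add the id as a numeric doc value alongside the indexed field.
    Document doc = new Document();
    doc.add(new LongField("id", myId, Field.Store.NO));      // indexed, for queries
    doc.add(new NumericDocValuesField("id", myId));          // column-stride, for lookups

    // Searching: read the value directly, with no un-inversion into a FieldCache.
    NumericDocValues ids = MultiDocValues.getNumericValues(reader, "id");
    long idForDoc = (ids == null) ? 0 : ids.get(docId);

Where the code already works per segment, reading from AtomicReader.getNumericDocValues("id") on each leaf avoids the MultiDocValues merging overhead.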

RE: Order docIds to reduce disk seeks

2014-11-17 Thread Rose, Stuart J
Hi Vijay, ...sorting the documents you need to retrieve by docID order first... means sorting them by their 'document number' which is the value in the 'scoreDoc.doc' field and is the value that the reader takes to 'retrieve' the document from the index. If you write a comparator to sort the
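A hedged sketch of that comparator approach; the searcher and query variables are assumed, and a copy of the hits is sorted so the original score order stays available for display:

    ScoreDoc[] byDocId = searcher.search(query, 100).scoreDocs.clone();
    // Retrieve stored fields in increasing docID order so reads of the
    // stored-fields file are mostly sequential.
    Arrays.sort(byDocId, new Comparator<ScoreDoc>() {
      @Override
      public int compare(ScoreDoc a, ScoreDoc b) {
        return Integer.compare(a.doc, b.doc);
      }
    });
    for (ScoreDoc sd : byDocId) {
      Document stored = searcher.doc(sd.doc);
      // ... read the stored fields needed
    }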

Slow doc/pos file merges...

2014-11-17 Thread Ravikumar Govindarajan
Hi, I am finding that Lucene is slowing down a lot when bigger and bigger doc/pos files are merged... While that is normally the case, the worrying part is that all my data is in RAM. Version is 4.6.1. Some sample statistics taken after instrumenting the SortingAtomicReader code, as we use a