Re: Iterating TermsEnum for Long field produces zero values at the end

2014-11-18 Thread Barry Coughlan
Hi Michael, Indexing: private NumericDocValuesField idField = new NumericDocValuesField(id, 0); Reading: private NumericDocValues cacheDocIds() throws IOException { AtomicReader wrapped = SlowCompositeReaderWrapper.wrap(reader); return DocValues.getNumeric(wrapped, id);

Re: Iterating TermsEnum for Long field produces zero values at the end

2014-11-18 Thread Barry Coughlan
Never mind, I got it: MultiDocValues.getNumericValues(final IndexReader r, final String field) Barry On Tue, Nov 18, 2014 at 12:05 PM, Barry Coughlan b.coughl...@gmail.com wrote: Hi Michael, Indexing: private NumericDocValuesField idField = new NumericDocValuesField(id, 0); Reading:

Re: Iterating TermsEnum for Long field produces zero values at the end

2014-11-18 Thread Michael McCandless
FieldCache is (will be?) already gone in 5.0: it's moved to the misc module. It is slow the first time you use it since it must walk all postings doing the inversion. It is also a heap hog compared to doc values which get more dev attention and try to be more careful in how they spend heap. If
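As a rough illustration of the "walk all postings doing the inversion" cost mentioned above, here is a toy model in plain Java. It is not Lucene code: `UninvertSketch`, `uninvert`, and the map-based postings are hypothetical stand-ins, assuming one numeric term per document. It only shows that building a per-document array means touching every posting once and holding one `long` per document on the heap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UninvertSketch {
    /**
     * Toy "uninversion": postings map each term value to the docIDs containing it.
     * The result maps each docID back to its value. Work is proportional to the
     * total number of postings; heap use is 8 bytes per document.
     */
    static long[] uninvert(Map<Long, int[]> postings, int maxDoc) {
        long[] values = new long[maxDoc]; // docs with no term keep the default 0
        for (Map.Entry<Long, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                values[doc] = entry.getKey();
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<Long, int[]> postings = new LinkedHashMap<>();
        postings.put(100L, new int[] {0, 3});
        postings.put(200L, new int[] {1});
        long[] values = uninvert(postings, 5);
        for (int doc = 0; doc < values.length; doc++) {
            System.out.println("doc " + doc + " -> " + values[doc]);
        }
    }
}
```

Doc values avoid this step because the per-document column is written at index time rather than rebuilt from postings on first use.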

Re: Order docIds to reduce disk seeks

2014-11-18 Thread Vijay B
Thank you Stuart. I got it working with: // sort by docids Arrays.sort(scoreDocs, new Comparator<ScoreDoc>() { @Override public int compare(ScoreDoc o1, ScoreDoc o2) { return Integer.compare(o1.doc, o2.doc); } }); On Mon, Nov 17, 2014 at 6:05 PM, Rose, Stuart J stuart.r...@pnnl.gov wrote: Hi
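A minimal sketch of the docID sort from the message above, written without a Lucene dependency so it runs on its own. `Hit` is a hypothetical stand-in for Lucene's `ScoreDoc` (which exposes public `doc` and `score` fields); with Lucene on the classpath the comparator would be typed `Comparator<ScoreDoc>` instead.

```java
import java.util.Arrays;
import java.util.Comparator;

public class DocIdOrder {
    /** Hypothetical stand-in for org.apache.lucene.search.ScoreDoc. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Sorts hits in place by ascending docID so documents are visited in index order. */
    static void sortByDocId(Hit[] hits) {
        Arrays.sort(hits, new Comparator<Hit>() {
            @Override
            public int compare(Hit o1, Hit o2) {
                return Integer.compare(o1.doc, o2.doc);
            }
        });
    }

    public static void main(String[] args) {
        Hit[] hits = { new Hit(42, 1.5f), new Hit(7, 3.0f), new Hit(19, 2.1f) };
        sortByDocId(hits);
        for (Hit h : hits) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```

Sorting this way discards the relevance order, so keep a copy of the original array if the ranking is still needed afterwards.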

Re: Order docIds to reduce disk seeks

2014-11-18 Thread Michael McCandless
Even if you sort all hits by docID it's likely too slow to visit every single one and load the stored document ... Try to find another way to solve your problem, making use of the inverted index? Mike McCandless http://blog.mikemccandless.com On Mon, Nov 17, 2014 at 6:05 PM, Rose, Stuart J

Re: Order docIds to reduce disk seeks

2014-11-18 Thread Vijay B
Hi Mike, could you provide some pointers on using the inverted index? Any examples, or which API classes to use to accomplish this? On Tue, Nov 18, 2014 at 12:40 PM, Michael McCandless luc...@mikemccandless.com wrote: Even if you sort all hits by docID it's likely too slow to visit every single

Re: Order docIds to reduce disk seeks

2014-11-18 Thread Barry Coughlan
Hi Vijay, I'm guessing Michael means that perhaps your text processing step could be better solved by using Lucene features. The use case of Lucene you describe in your post is better suited to a key value store or a relational database. Can you give more details on what your text processing

Re: Order docIds to reduce disk seeks

2014-11-18 Thread Vijay B
Hi Barry, here is our use case. We fetch doc text from Lucene and feed it to the http://carrotsearch.com/ library for generating document clusters as a text processing step. The Carrotsearch API needs to be fed with a list of org.carrot2.core.Document

Re: Order docIds to reduce disk seeks

2014-11-18 Thread brettgleeson83
luc...@mikemccandless.com Sent from my BlackBerry® wireless device -Original Message- From: Vijay B vijay.nip...@gmail.com Date: Tue, 18 Nov 2014 14:41:16 To: java-user@lucene.apache.org Reply-To: java-user@lucene.apache.org Subject: Re: Order docIds to reduce disk seeks Hi Mike, could