RE: Order docIds to reduce disk seeks

2014-11-17 Thread Rose, Stuart J
Hi Vijay, ...sorting the documents you need to retrieve by docID order first... means sorting them by their 'document number' which is the value in the 'scoreDoc.doc' field and is the value that the reader takes to 'retrieve' the document from the index. If you write a comparator to sort the

Re: Order docIds to reduce disk seeks

2014-11-18 Thread Vijay B
Thank you Stuart. I got it working with:

    // sort by docids
    Arrays.sort(scoreDocs, new Comparator<ScoreDoc>() {
        @Override
        public int compare(ScoreDoc o1, ScoreDoc o2) {
            return Integer.compare(o1.doc, o2.doc);
        }
    });

On Mon, Nov 17, 2014 at 6:05 PM, Rose, Stuart J wrote: > Hi Vijay, > > ...sorting the docu
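One thing to watch with the in-place sort above: it discards the relevance order of the hits. If that order is still needed later (for example to display results), a variant that sorts a copy may be preferable. The following is only a minimal sketch under stated assumptions: `DocIdOrder`, `Hit`, and `sortedByDocId` are hypothetical names, and `Hit` is a stand-in for Lucene's `ScoreDoc` so that the example runs without Lucene on the classpath.

```java
import java.util.Arrays;
import java.util.Comparator;

final class DocIdOrder {

    /** Hypothetical stand-in for Lucene's ScoreDoc (a doc number plus a score). */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Returns a copy of the hits sorted by ascending doc number.
     * The input array is left untouched, so it keeps its relevance order.
     */
    static Hit[] sortedByDocId(Hit[] hits) {
        Hit[] copy = Arrays.copyOf(hits, hits.length);
        Arrays.sort(copy, new Comparator<Hit>() {
            @Override
            public int compare(Hit a, Hit b) {
                return Integer.compare(a.doc, b.doc);
            }
        });
        return copy;
    }
}
```

With real Lucene hits the same idea would apply to a copy of the `scoreDocs` array: iterate the sorted copy when loading stored documents, and keep the original array for anything that depends on ranking.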

Re: Order docIds to reduce disk seeks

2014-11-18 Thread Michael McCandless
Even if you sort all hits by docID it's likely too slow to visit every single one and load the stored document ... Try to find another way to solve your problem, making use of the inverted index? Mike McCandless http://blog.mikemccandless.com On Mon, Nov 17, 2014 at 6:05 PM, Rose, Stuart J wr

Re: Order docIds to reduce disk seeks

2014-11-18 Thread Vijay B
Hi Mike, could you provide some pointers on using the inverted index? Any examples, or which API classes to use to accomplish this? On Tue, Nov 18, 2014 at 12:40 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > Even if you sort all hits by docID it's likely too slow to visit every > single

Re: Order docIds to reduce disk seeks

2014-11-18 Thread Barry Coughlan
Hi Vijay, I'm guessing Michael means that perhaps your text processing step could be better solved by using Lucene features. The use case of Lucene you describe in your post is better suited to a key value store or a relational database. Can you give more details on what your text processing step

Re: Order docIds to reduce disk seeks

2014-11-18 Thread Vijay B
Hi Barry, here is our use case. We fetch doc text from Lucene and feed it to the http://carrotsearch.com/ library to generate document clusters as a text processing step. The Carrotsearch API needs to be fed a list of org.carrot2.core.Document

Re: Order docIds to reduce disk seeks

2014-11-18 Thread brettgleeson83
luc...@mikemccandless.com -Original Message- From: Vijay B Date: Tue, 18 Nov 2014 14:41:16 To: Reply-To: java-user@lucene.apache.org Subject: Re: Order docIds to reduce disk seeks Hi Mike, could you provide some pointers on using inverted

Re: Order docIds to reduce disk seeks

2014-11-19 Thread Barry Coughlan
Hi Vijay, Could you just bypass Lucene altogether and send the documents to Carrot from the same place that Lucene got them? If for some reason you can not do that, here are some suggestions (note: I'm not a Lucene expert): 1. If you have other stored fields in your index, ensure you are only re

Re: Order docIds to reduce disk seeks

2014-11-21 Thread Vijay B
The source of data is an Oracle DB, and that is not an option for us due to the volume of requests we expect and the amount of text we are pulling. IndexSearcher is managed via SearcherManager. >>> If you are querying for most of the fields in the index then it might be more efficient to iterat