Use of AllTermDocs with custom scorer

2009-11-16 Thread Peter Keegan
I have a custom query object whose scorer uses the 'AllTermDocs' to get all non-deleted documents. AllTermDocs returns the docId relative to the segment, but I need the absolute (index-wide) docId to access external data. What's the best way to get the unique, non-deleted docId? Thanks, Peter

Re: Use of AllTermDocs with custom scorer

2009-11-16 Thread Peter Keegan
I forgot to mention that this is with V2.9.1. On Mon, Nov 16, 2009 at 1:39 PM, Peter Keegan wrote: > I have a custom query object whose scorer uses the 'AllTermDocs' to get all > non-deleted documents. AllTermDocs returns the docId relative to the > segment, but I need the absolute (index-wide) do

Re: Use of AllTermDocs with custom scorer

2009-11-16 Thread Peter Keegan
The same thing is occurring in my custom sort comparator. The ScoreDocs passed to the 'compare' method have docIds that seem to be relative to the segment. Is there any way to translate these into index-wide docIds? Peter On Mon, Nov 16, 2009 at 2:06 PM, Peter Keegan wrote: > I forgot to mention

Re: Use of AllTermDocs with custom scorer

2009-11-16 Thread Michael McCandless
Can you remap your external data to be per segment? Presumably that would make reopens faster for your app. For your custom sort comparator, are you using FieldComparator? If so, Lucene calls setNextReader to tell you the reader & docBase. Failing these, Lucene currently visits the readers in in
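
For readers following the docBase point above, here is a minimal sketch of the arithmetic, with no Lucene dependency. It assumes that each segment's docBase is the sum of the maxDoc values of the segments before it, and that an index-wide docId is the docBase plus the segment-relative docId. The class and method names (DocBases, fromMaxDocs, toIndexWide) are hypothetical and are not part of the Lucene API.

```java
// Hypothetical helper for illustration only; not a Lucene class.
final class DocBases {

    // Returns starts[i] = index-wide docId of the first document slot in segment i,
    // assuming segments are laid out back to back in the given order.
    static int[] fromMaxDocs(int[] maxDocs) {
        int[] starts = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i] = base;
            base += maxDocs[i]; // maxDoc counts deleted slots too
        }
        return starts;
    }

    // Converts a segment-relative docId into an index-wide docId.
    static int toIndexWide(int docBase, int segmentDocId) {
        return docBase + segmentDocId;
    }
}
```

In a FieldComparator the docBase would come from setNextReader rather than being recomputed this way; the sketch only shows what that number means.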

Re: Use of AllTermDocs with custom scorer

2009-11-16 Thread Peter Keegan
>Can you remap your external data to be per segment? That would provide the tightest integration but would require a major redesign. Currently, the external data is in a single file created by reading a stored field after the Lucene index has been committed. Creating this file is very fast with 2.

Re: Use of AllTermDocs with custom scorer

2009-11-17 Thread Michael McCandless
On Mon, Nov 16, 2009 at 6:38 PM, Peter Keegan wrote: >>Can you remap your external data to be per segment? > > That would provide the tightest integration but would require a major > redesign. Currently, the external data is in a single file created by > reading a stored field after the Lucene in

Re: Use of AllTermDocs with custom scorer

2009-11-17 Thread Peter Keegan
The external data is just an array of fixed-length records, one for each Lucene document. Indexes are updated at regular intervals in one jvm. A searcher jvm opens the index and reads all the fixed-length records into RAM. Given an index-wide docId, the custom scorer can quickly access the correspo
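
A rough sketch of the kind of lookup described above, assuming the records are held in one in-memory byte array and addressed by index-wide docId. The class name FixedRecordStore and its methods are hypothetical; the actual record layout is not given in this thread.

```java
import java.util.Arrays;

// Hypothetical in-memory store of fixed-length records, one record per document.
final class FixedRecordStore {
    private final byte[] data;
    private final int recordLength;

    FixedRecordStore(byte[] data, int recordLength) {
        if (recordLength <= 0 || data.length % recordLength != 0) {
            throw new IllegalArgumentException("data length must be a multiple of recordLength");
        }
        this.data = data;
        this.recordLength = recordLength;
    }

    // Number of records held, i.e. the number of docIds that can be looked up.
    int size() {
        return data.length / recordLength;
    }

    // Returns a copy of the record for the given index-wide docId.
    byte[] record(int docId) {
        if (docId < 0 || docId >= size()) {
            throw new IndexOutOfBoundsException("docId out of range: " + docId);
        }
        int offset = docId * recordLength; // constant-time address computation
        return Arrays.copyOfRange(data, offset, offset + recordLength);
    }
}
```

Because the offset is a single multiplication, the lookup only works if the docId is index-wide, which is why the segment-relative docIds are a problem here.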

Re: Use of AllTermDocs with custom scorer

2009-11-17 Thread Peter Keegan
>This is a generic solution, but just make sure you don't do the >map lookup for every doc collected, if you can help it, else that'll >slow down your search. What I just learned is that a Scorer is created for each segment (lights on!). So, couldn't I just do the subreader->docBase map lookup onc
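
The idea of doing the subreader->docBase lookup once per scorer, rather than once per collected document, might look roughly like this. It is a sketch without Lucene types: the segment reader is modeled as a plain Object key, and SegmentScorerSketch is a hypothetical name, not a Lucene Scorer.

```java
import java.util.Map;

// Hypothetical stand-in for a per-segment scorer; not a Lucene class.
final class SegmentScorerSketch {
    private final int docBase;

    // The map lookup happens once here, when the scorer for a segment is created.
    SegmentScorerSketch(Object segmentReader, Map<Object, Integer> docBaseByReader) {
        Integer base = docBaseByReader.get(segmentReader);
        if (base == null) {
            throw new IllegalArgumentException("unknown segment reader");
        }
        this.docBase = base;
    }

    // Per-document work is a single addition, with no map access.
    int indexWideDocId(int segmentDocId) {
        return docBase + segmentDocId;
    }
}
```

An IdentityHashMap is a reasonable choice for the map in this sketch, since readers would be matched by object identity rather than by equals().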

Re: Use of AllTermDocs with custom scorer

2009-11-17 Thread Michael McCandless
On Tue, Nov 17, 2009 at 10:23 AM, Peter Keegan wrote: >>This is a generic solution, but just make sure you don't do the >>map lookup for every doc collected, if you can help it, else that'll >>slow down your search. > > What I just learned is that a Scorer is created for each segment (lights > on!

Re: Use of AllTermDocs with custom scorer

2009-11-17 Thread Michael McCandless
On Tue, Nov 17, 2009 at 8:58 AM, Peter Keegan wrote: > The external data is just an array of fixed-length records, one for each > Lucene document. Indexes are updated at regular intervals in one jvm. A > searcher jvm opens the index and reads all the fixed-length records into > RAM. Given an index

Re: Use of AllTermDocs with custom scorer

2009-11-17 Thread Peter Keegan
> But if re-creating the entire file on each reopen isn't a problem for > you then there's no need to change this :) It's actually created after IndexWriter.commit(), but same idea. If we needed real-time indexing, or if disk I/O gets excessive, I'd go with separate files per segment. >Hmm -- if