Bug in TopFieldCollector?

2009-03-30 Thread Shai Erera
Hi, as I prepared the patch for 1575, I noticed a strange implementation in TopFieldCollector's topDocs(): ScoreDoc[] scoreDocs = new ScoreDoc[queue.size()]; if (fillFields) { for (int i = queue.size() - 1; i >= 0; i--) { scoreDocs[i] =
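The loop in the snippet walks the result array from the last slot to the first. That is the usual way to drain a bounded top-N priority queue: the queue's head is the weakest hit still kept, so each pop yields the next-worst hit, which belongs at the end of the array. Because `queue.size()` is read only once, when the loop starts, popping inside the loop does not change its bounds. The following is a minimal stdlib sketch of that pattern. It is not Lucene's code: `java.util.PriorityQueue` and the hypothetical `Hit` class stand in for FieldValueHitQueue and ScoreDoc.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch: keep the top-N hits in a bounded min-heap, then drain it back to front. */
final class TopNDrain {

    /** Hypothetical stand-in for Lucene's ScoreDoc: a document id plus its score. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Keeps the n highest-scoring hits; the heap head is always the weakest hit kept. */
    static Hit[] topN(List<Hit> hits, int n) {
        PriorityQueue<Hit> queue = new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (Hit h : hits) {
            if (queue.size() < n) {
                queue.add(h);
            } else if (n > 0 && h.score > queue.peek().score) {
                queue.poll();   // evict the current weakest hit
                queue.add(h);
            }
        }
        // queue.size() is evaluated once here, before any element is popped.
        Hit[] result = new Hit[queue.size()];
        for (int i = queue.size() - 1; i >= 0; i--) {
            result[i] = queue.poll();   // weakest remaining hit goes to the highest free slot
        }
        return result;   // best hit ends up at index 0
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(0, 1.5f));
        hits.add(new Hit(1, 0.2f));
        hits.add(new Hit(2, 3.0f));
        hits.add(new Hit(3, 2.1f));
        for (Hit h : topN(hits, 3)) {
            System.out.println(h.doc + " " + h.score);   // docs 2, 3, 0 in that order
        }
    }
}
```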

Re: Bug in TopFieldCollector?

2009-03-30 Thread Michael McCandless
Looks like quite a bug, Shai! Thanks. It came in with LUCENE-1483. I would say add a test case and fix it under 1575. Mike

Re: Bug in TopFieldCollector?

2009-03-30 Thread Shai Erera
Already did! Another question: I think we somehow broke TopFieldCollector... Previously, TopFieldDocCollector accepted an IndexReader as a parameter, and now it requires an IndexReader[] called subReaders. Calling the 'fast' search methods with Sort has no problem obtaining that

Re: Bug in TopFieldCollector?

2009-03-30 Thread Michael McCandless
I agree, this is not a pleasant migration path forward from 2.4. I think maybe a good fix is to not even require IndexReader[] subReaders to be passed in in the first place. Tracing downwards, the only reason we need this array at construction time is for the SortField.CUSTOM case, when it

RE: Bug in TopFieldCollector?

2009-03-30 Thread Uwe Schindler
Why not call IndexSearcher.getIndexReader().getSequentialSubReaders() (see http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apache/lucene/index/IndexReader.html)? It's public and documented as follows: public

Re: Bug in TopFieldCollector?

2009-03-30 Thread Michael McCandless
Well, IndexSearcher also sorts its readers biggest to smallest (by .numDocs()) for better performance (so that the queues fill up as much as possible before hitting reader transitions). I think it's the exception, not the rule, for when a custom comparator would require the full array of
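The ordering Mike describes is a descending sort of the sub-readers by document count before they are searched one after another. The largest segment comes first, so the top-N queue fills up early and its weakest entry becomes hard to beat before the smaller segments are visited. The following is a minimal stdlib sketch of that ordering step only. The hypothetical `Segment` class stands in for an IndexReader and its numDocs() value. This is not IndexSearcher's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sketch: order segments largest first so a top-N queue fills before reader transitions. */
final class SegmentOrder {

    /** Hypothetical stand-in for a sub-reader: a segment name and its document count. */
    static final class Segment {
        final String name;
        final int docCount;
        Segment(String name, int docCount) { this.name = name; this.docCount = docCount; }
        int numDocs() { return docCount; }
    }

    /** Returns a copy of the segments sorted by numDocs(), biggest first; the input is untouched. */
    static List<Segment> biggestFirst(List<Segment> segments) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingInt(Segment::numDocs).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Segment> segs = Arrays.asList(
            new Segment("_a", 120), new Segment("_b", 50_000), new Segment("_c", 3_400));
        for (Segment s : biggestFirst(segs)) {
            System.out.println(s.name + " " + s.numDocs());   // _b, then _c, then _a
        }
    }
}
```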

RE: Bug in TopFieldCollector?

2009-03-30 Thread Uwe Schindler
You are right, I forgot the sorting. I also think the most important thing would be to remove the need for the ctor in the custom sort. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de

Re: Bug in TopFieldCollector?

2009-03-30 Thread Shai Erera
I checked where it is used: this arg is required by FieldValueHitQueue's only constructor. The array is passed to each field's getComparator method, which indeed uses it only for CUSTOM fields. There, it calls comparatorSource.newComparator, and there's only one implementation of it now,

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-03-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12693729#action_12693729 ] Michael McCandless commented on LUCENE-1575: I think as part of this we should

Re: InstantiatedIndex

2009-03-30 Thread Karl Wettin
On 28 Mar 2009, at 01:21, Jason Rutherglen wrote: I'm thinking InstantiatedIndex needs to implement either a clone of all the index data or the ability to accept a non-optimized reader, or both. I forget what the obstacles are to implementing the non-optimized reader option? Do you

[jira] Updated: (LUCENE-1578) InstantiatedIndex supports non-optimized IndexReaders

2009-03-30 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wettin updated LUCENE-1578: Attachment: LUCENE-1578.txt Please test this patch using a couple of different unoptimized

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-03-30 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12693732#action_12693732 ] Shai Erera commented on LUCENE-1575: I am not sure what you mean - score is used all

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-03-30 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12693743#action_12693743 ] Shai Erera commented on LUCENE-1575: Ok I now understand better where score is used in

[jira] Commented: (LUCENE-1039) Bayesian classifiers using Lucene as data store

2009-03-30 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12693744#action_12693744 ] Karl Wettin commented on LUCENE-1039: - Vaijanath, can you please post a small test

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-03-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12693778#action_12693778 ] Michael McCandless commented on LUCENE-1575: bq. The question is what to do

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-03-30 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12693787#action_12693787 ] Shai Erera commented on LUCENE-1575: bq. Turning off scoring in TopFieldCollector's

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-03-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12693796#action_12693796 ] Michael McCandless commented on LUCENE-1575: bq. Or introducing a new ctor

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-03-30 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12693812#action_12693812 ] Shai Erera commented on LUCENE-1575: ok I'll add another package-private ctor to

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-03-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12693822#action_12693822 ] Michael McCandless commented on LUCENE-1575: bq. How's that sound: That

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-03-30 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12693843#action_12693843 ] Shai Erera commented on LUCENE-1575: bq. So to be consistent maybe we create

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-03-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12693847#action_12693847 ] Michael McCandless commented on LUCENE-1575: bq. And have STFC extend NSTFC? I

[jira] Commented: (LUCENE-1425) Add ConstantScore highlighting support to SpanScorer

2009-03-30 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12693917#action_12693917 ] Mark Miller commented on LUCENE-1425: - I'd like to commit this soon. Add

[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-03-30 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12693922#action_12693922 ] Jason Rutherglen commented on LUCENE-1516: -- Mike, nice work! I will hopefully

[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-03-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12693932#action_12693932 ] Michael McCandless commented on LUCENE-1516: Nice work to you too -- I just

Re: Modularization

2009-03-30 Thread Chris Hostetter
After stirring things up, and then being off-list for ~10 days, I'm in an interesting position coming back to this thread and seeing the discussion *after* it essentially ended, with a lot of semi-consensus but no clear sense of hard and fast resolution or plan of action. FWIW, here are the

Re: Modularization

2009-03-30 Thread Michael Busch
On 3/31/09 1:31 AM, Chris Hostetter wrote: code isolation (by directory hierarchy) is the best way I've seen to ensure modularization, and protect against inadvertent dependency bleeding. +1. That's actually what I meant by one-to-one mapping between the packaging and the source code (I

Reading document in Lucene

2009-03-30 Thread mitu2009
My indexed document in Lucene has multiple cities assigned to it, i.e. doc.Add(new Field("city", city1.Trim(), Field.Store.YES, Field.Index.TOKENIZED)); doc.Add(new Field("city", city2.Trim(), Field.Store.YES, Field.Index.TOKENIZED)); etc. How do I iterate through them and read the values after
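Stored fields that share a name are kept as separate entries in the document, in the order they were added. In Lucene (Java), Document.getValues(String) returns all of them as a String[], or an empty array if none exist. The .NET port used in the question should expose the same method as GetValues. The sketch below only models that behavior with a hypothetical `MiniDocument` class so that it runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Lucene Document: an ordered list of (name, value) fields. */
final class MiniDocument {

    private static final class StoredField {
        final String name;
        final String value;
        StoredField(String name, String value) { this.name = name; this.value = value; }
    }

    private final List<StoredField> fields = new ArrayList<>();

    /** Adding the same name twice keeps both values, like repeated doc.Add(new Field("city", ...)). */
    void add(String name, String value) {
        fields.add(new StoredField(name, value));
    }

    /** Models Document.getValues(name): every value in insertion order, empty array if none. */
    String[] getValues(String name) {
        List<String> values = new ArrayList<>();
        for (StoredField f : fields) {
            if (f.name.equals(name)) {
                values.add(f.value);
            }
        }
        return values.toArray(new String[0]);
    }

    public static void main(String[] args) {
        MiniDocument doc = new MiniDocument();
        doc.add("city", " Berlin ".trim());
        doc.add("city", "Paris");
        doc.add("country", "DE");
        for (String city : doc.getValues("city")) {
            System.out.println(city);   // Berlin, then Paris
        }
    }
}
```

With Lucene itself, the equivalent is calling doc.getValues("city") on the Document returned by searcher.doc(docId) for each hit.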

Lucene analyzer and dots

2009-03-30 Thread mitu2009
Is there any way I can make the Lucene analyzer not ignore dots in the string? For example, if my search criterion is A.B.C.D, Lucene should give me only those documents in the search results which have A.B.C.D and not ABCD.
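Which component drops the dots depends on the analyzer in use. In Lucene 2.x, for example, StandardFilter removes the dots from tokens that StandardTokenizer types as acronyms. In any analyzer that does this, A.B.C.D and ABCD reduce to the same indexed term and therefore match each other. The usual remedy is to analyze that field with something that splits only on whitespace (WhitespaceAnalyzer, for instance), or to index it untokenized, and to use the same analysis when parsing the query. The stdlib sketch below contrasts the two behaviors using hypothetical helper methods; it does not call Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch: why "A.B.C.D" and "ABCD" can collide, and how keeping dots separates them. */
final class DotAnalysis {

    /** Hypothetical analyzer that drops dots and lowercases, like acronym normalization. */
    static List<String> dropDots(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.replace(".", "").toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** Hypothetical analyzer that splits on whitespace, lowercases, and keeps dots inside tokens. */
    static List<String> keepDots(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Same term after dot-stripping, so a query for A.B.C.D also hits documents containing ABCD.
        System.out.println(dropDots("A.B.C.D").equals(dropDots("ABCD")));   // true
        // Distinct terms when dots are kept, so the two no longer match each other.
        System.out.println(keepDots("A.B.C.D").equals(keepDots("ABCD")));   // false
    }
}
```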

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-03-30 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12693997#action_12693997 ] Shai Erera commented on LUCENE-1575: bq. But what is the plan now for the