Benchmark: EnwikiDocMaker does not use fileIn (BufferedReader)

2009-04-09 Thread Shai Erera
I started working on the patch for 1591, and noticed EnwikiDocMaker uses the FileInputStream instance from LineDocMaker and not the BufferedReader. I don't see any reason for this, as InputSource accepts a Reader. I can change it as part of 1591, unless you think I'm missing something.
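To illustrate the point that InputSource accepts a Reader, here is a minimal sketch of SAX parsing driven from a BufferedReader rather than a raw stream. The class name, method names, and the element-counting handler are hypothetical and are not taken from EnwikiDocMaker; only the JDK and SAX calls are real.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: parse XML from a BufferedReader via InputSource(Reader).
class ReaderInputSourceSketch {

    // Counts start tags whose qualified name equals `name`.
    static int countElements(BufferedReader in, String name)
            throws IOException, SAXException, ParserConfigurationException {
        final int[] count = {0};
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        });
        // InputSource has a constructor taking a java.io.Reader, so no
        // InputStream is needed once the caller already holds a Reader.
        reader.parse(new InputSource(in));
        return count[0];
    }

    // Convenience wrapper for in-memory input; rethrows checked exceptions unchecked.
    static int countInString(String xml, String name) {
        try (BufferedReader in = new BufferedReader(new StringReader(xml))) {
            return countElements(in, name);
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a Reader, character decoding is decided by whoever constructed the Reader, whereas an InputStream leaves encoding detection to the XML parser; that difference is worth checking before swapping one for the other.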

[jira] Updated: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-09 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1575: --- Attachment: (was: LUCENE-1575.9.patch) > Refactoring Lucene collectors (HitCollector and extensi

[jira] Updated: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-09 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1575: --- Attachment: LUCENE-1575.9.patch added another test case to TestSort > Refactoring Lucene collectors

[jira] Updated: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-09 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1575: --- Attachment: LUCENE-1575.9.patch * Adds the ScoringNoMaxScore collectors * Adds some tests to TestSor

[jira] Commented: (LUCENE-1588) Update Spatial Lucene sort to use FieldComparatorSource

2009-04-09 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697718#action_12697718 ] Mark Miller commented on LUCENE-1588: - hmm - looks like we didn't make FieldComparatorS

[jira] Resolved: (LUCENE-861) Contrib queries package Query implementations do not override equals()

2009-04-09 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved LUCENE-861. Resolution: Fixed Thanks Antony! > Contrib queries package Query implementations do not override e

[jira] Resolved: (LUCENE-1425) Add ConstantScore highlighting support to SpanScorer

2009-04-09 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved LUCENE-1425. - Resolution: Fixed > Add ConstantScore highlighting support to SpanScorer > -

[jira] Resolved: (LUCENE-1587) RangeQuery equals method does not compare collator property fully

2009-04-09 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved LUCENE-1587. - Resolution: Fixed Thanks Mark! > RangeQuery equals method does not compare collator property fu

[jira] Updated: (LUCENE-861) Contrib queries package Query implementations do not override equals()

2009-04-09 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-861: --- Fix Version/s: 2.9 > Contrib queries package Query implementations do not override equals() > ---

[jira] Assigned: (LUCENE-861) Contrib queries package Query implementations do not override equals()

2009-04-09 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned LUCENE-861: -- Assignee: Mark Miller > Contrib queries package Query implementations do not override equals()

Re: Filtering documents out of IndexReader

2009-04-09 Thread Michael McCandless
On Thu, Apr 9, 2009 at 7:02 PM, Jeremy Volkman wrote: > I'm sure I can extend my wrapping reader to also wrap whatever is returned > by getSequentialSubReaders, however all of what I'm writing is already done > by IndexReader with respect to deletions. What if, instead of throwing > UnsupportedOp

[jira] Commented: (LUCENE-1567) New flexible query parser

2009-04-09 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697688#action_12697688 ] Mark Miller commented on LUCENE-1567: - Having gone over this a bit, I think it's a grea

Filtering documents out of IndexReader

2009-04-09 Thread Jeremy Volkman
Recently I found myself wanting an IndexReader to which I could say, "pretend like documents x, y and z are deleted, but don't really delete them." To do this, I extended FilterIndexReader and overrode TermDocs, TermPositions and various other methods to run the results of the wrapped reader past a

Re: Modularization

2009-04-09 Thread Earwin Burrfoot
On Fri, Apr 10, 2009 at 02:25, Chris Hostetter wrote: > Or just make it trivial to get all jars that fit a given profile w/o > actually merging those jars into an uber-jar ... does maven's > dependency management have anything like "bundles" or "virtual packages" so > we could publish a "lucene-all-ana

Re: Modularization

2009-04-09 Thread Chris Hostetter
: If there are any serious moves to reorganize things, we should at least : consider the benefits of maven. +1 we can certainly do a lot to improve things just by refactoring stuff from core into contrib, and improving the visibility of contribs and documentation about contribs -- but if we're

Re: Modularization

2009-04-09 Thread Chris Hostetter
: We've been doing this using just one source tree (like in Lucene), and : instead ensuring the separation using the build system. We did not, like you I think you are misunderstanding my previous comment ... Lucene-Java does not currently have one source tree in the sense that someone else su

Re: Modularization

2009-04-09 Thread Chris Hostetter
: Then during build we can package up certain combinations. I think : there should be sub-kitchen-sink jars by area, eg a jar that contains : all analyzers/tokenstreams/filters, all queries/filters, etc. Or just make it trivial to get all jars that fit a given profile w/o actually merging those

Re: possible TermInfosReader speedup

2009-04-09 Thread Mike Klaas
On 8-Apr-09, at 11:13 PM, Michael Busch wrote: I was thinking about doing this as part of LUCENE-1195. However, I doubt that the net win will be very noticeable here. A common scenario is that you have an index with one big body field that has a lot of unique terms, plus several metafield

[jira] Resolved: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1516. Resolution: Fixed I just committed this. Thanks Jason! > Integrate IndexReader w

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697540#action_12697540 ] Michael McCandless commented on LUCENE-831: --- {quote} I guess you meant dangerous

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-09 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697519#action_12697519 ] Mark Miller commented on LUCENE-831: bq. But yes both of them consume RAM, but I don't

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-09 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697495#action_12697495 ] Mark Miller commented on LUCENE-831: {quote} One massive array is far more dangerous du

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697492#action_12697492 ] Michael McCandless commented on LUCENE-831: --- bq. Is there much difference in one

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-09 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697485#action_12697485 ] Mark Miller commented on LUCENE-831: {quote} I'd like to see the new FieldCache API de-

[jira] Resolved: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1590. Resolution: Fixed Thanks Uwe! > Stored-only fields automatically enable norms and

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697470#action_12697470 ] Michael McCandless commented on LUCENE-1231: {quote} To my mind, column stride

[jira] Commented: (LUCENE-1539) Improve Benchmark

2009-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697468#action_12697468 ] Michael McCandless commented on LUCENE-1539: This patch still has some noise,

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697462#action_12697462 ] Michael McCandless commented on LUCENE-1575: bq. Eg, hitB is "favored" (lessTh

[jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697457#action_12697457 ] Michael McCandless commented on LUCENE-1590: Thanks Uwe. I'll add a CHANGES e

[jira] Commented: (LUCENE-1593) Optimizations to TopScoreDocCollector and TopFieldCollector

2009-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697455#action_12697455 ] Michael McCandless commented on LUCENE-1593: See https://issues.apache.org/ji

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697454#action_12697454 ] Michael McCandless commented on LUCENE-1575: bq. Does HitQueue favor documents

[jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-09 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697452#action_12697452 ] Uwe Schindler commented on LUCENE-1590: --- I forgot to add a change-note in changes.tx

[jira] Updated: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-09 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1590: -- Attachment: LUCENE-1590.patch bq. Do you want to make a new patch (removing omit* update for b

[jira] Created: (LUCENE-1593) Optimizations to TopScoreDocCollector and TopFieldCollector

2009-04-09 Thread Shai Erera (JIRA)
Optimizations to TopScoreDocCollector and TopFieldCollector --- Key: LUCENE-1593 URL: https://issues.apache.org/jira/browse/LUCENE-1593 Project: Lucene - Java Issue Type: Improvement

[jira] Commented: (LUCENE-1313) Realtime Search

2009-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697444#action_12697444 ] Michael McCandless commented on LUCENE-1313: {quote} The test we need to progr

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-09 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697442#action_12697442 ] Shai Erera commented on LUCENE-1575: Does HitQueue favor documents with smaller ids? I

[jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697440#action_12697440 ] Michael McCandless commented on LUCENE-1590: bq. Maybe merge with the already

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697439#action_12697439 ] Michael McCandless commented on LUCENE-1575: OK that sounds like a good plan.

Re: ArrayUtils.getNextSize

2009-04-09 Thread Shai Erera
Thanks Mike. On Thu, Apr 9, 2009 at 11:55 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Wed, Apr 8, 2009 at 11:22 PM, Shai Erera wrote: > > Hi > > > > I used ArrayUtils.getNextSize recently to expand an array to a new size. > > When I read the documentation (the inline in the m

Re: ArrayUtils.getNextSize

2009-04-09 Thread Michael McCandless
On Wed, Apr 8, 2009 at 11:22 PM, Shai Erera wrote: > Hi > > I used ArrayUtils.getNextSize recently to expand an array to a new size. > When I read the documentation (the inline in the method), I saw this: > > "The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ..." > > I'm not sure if I'
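For readers trying to make sense of the quoted growth pattern, the sketch below shows one formula that reproduces the sequence 0, 4, 8, 16, 25, 35, 46, 58, 72, 88. It is an assumption that this matches the method under discussion, since the excerpt does not show the implementation; the class and method names here are hypothetical.

```java
// Hypothetical sketch: an over-allocation formula that reproduces the
// growth pattern quoted in the thread. Not confirmed to be the actual
// implementation of the method being discussed.
class GrowthPatternSketch {

    // Grows the requested size by about 1/8, plus a small constant
    // (3 for very small arrays, 6 otherwise).
    static int nextSize(int targetSize) {
        return (targetSize >> 3) + (targetSize < 9 ? 3 : 6) + targetSize;
    }

    // Returns the first `steps` capacities seen when an initially empty
    // array is grown each time one element more than its capacity is needed.
    static int[] pattern(int steps) {
        int[] sizes = new int[steps];
        int size = 0;
        for (int i = 0; i < steps; i++) {
            sizes[i] = size;
            size = nextSize(size + 1);
        }
        return sizes;
    }
}
```

Read this way, each number in the pattern is the capacity reached after the previous capacity overflows by one element, not the result of feeding the previous number straight back into the formula.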

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-09 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697437#action_12697437 ] Shai Erera commented on LUCENE-1575: Hey Mike. I actually planned to open another issu

[jira] Created: (LUCENE-1592) fix or deprecate TermsEnum.seek

2009-04-09 Thread Michael McCandless (JIRA)
fix or deprecate TermsEnum.seek --- Key: LUCENE-1592 URL: https://issues.apache.org/jira/browse/LUCENE-1592 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael

Re: possible TermInfosReader speedup

2009-04-09 Thread Michael McCandless
OK I opened https://issues.apache.org/jira/browse/LUCENE-1592. Mike On Thu, Apr 9, 2009 at 4:36 AM, Michael McCandless wrote: > On Thu, Apr 9, 2009 at 4:24 AM, Uwe Schindler wrote: > >> I think, we should do what was suggested in this thread: Remove it or >> deprecate it, if it is nowhere used

[jira] Commented: (LUCENE-1284) Set of Java classes that allow the Lucene search engine to use morphological information developed for the Apertium open-source machine translation platform (http://www

2009-04-09 Thread JIRA
[ https://issues.apache.org/jira/browse/LUCENE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697434#action_12697434 ] Felipe Sánchez Martínez commented on LUCENE-1284: - Hi Otis, The package I

Re: possible TermInfosReader speedup

2009-04-09 Thread Michael McCandless
On Thu, Apr 9, 2009 at 4:24 AM, Uwe Schindler wrote: > I think, we should do what was suggested in this thread: Remove it or > deprecate it, if it is nowhere used internally to prevent people (like me in > the past) to try to use it. > > Maybe put an additional warning in the JavaDocs in addition

RE: possible TermInfosReader speedup

2009-04-09 Thread Uwe Schindler
> > Yes, if skipTo would work more performant, I could easily use it in > > TrieRange and would be happy as noted before. Currently, a new TermEnum > is > > created on each sub-range. When TrieRange was committed and therefore > > updated, for me it was (and still is) not clear, why skipTo may not