[jira] Commented: (LUCENE-1591) Enable bzip compression in benchmark

2009-05-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711985#action_12711985 ] Uwe Schindler commented on LUCENE-1591: --- Commons-Compress 1.0 is now released, we

[jira] Commented: (LUCENE-1591) Enable bzip compression in benchmark

2009-05-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712002#action_12712002 ] Michael McCandless commented on LUCENE-1591: Excellent! Yes I think so?

[jira] Commented: (LUCENE-1636) TokenFilters with a null value in the constructor fail

2009-05-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712008#action_12712008 ] Michael McCandless commented on LUCENE-1636: Good questions Uwe! I tested the

[jira] Commented: (LUCENE-1591) Enable bzip compression in benchmark

2009-05-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712019#action_12712019 ] Uwe Schindler commented on LUCENE-1591: --- I replaced the dev version by 1.0 and it

[jira] Commented: (LUCENE-1591) Enable bzip compression in benchmark

2009-05-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712020#action_12712020 ] Uwe Schindler commented on LUCENE-1591: --- Committed revision 777458. Enable bzip

Re: Lucene's default settings back compatibility

2009-05-22 Thread Earwin Burrfoot
A funny thought: we can give those methods/classes really stupid/nasty names, to emphasize the beauty of the existing API, to encourage people to stick with the better API :) I believe I've seen google using internally names like thisisbadbadbadInstanceMap. :) One thing we didn't address

[jira] Commented: (LUCENE-1636) TokenFilters with a null value in the constructor fail

2009-05-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712021#action_12712021 ] Uwe Schindler commented on LUCENE-1636: --- Thanks, I am still in Japan and had no time

[jira] Commented: (LUCENE-1636) TokenFilters with a null value in the constructor fail

2009-05-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712025#action_12712025 ] Uwe Schindler commented on LUCENE-1636: --- Oh, you already committed this :)

[jira] Updated: (LUCENE-1542) NearSpansUnordered.getPayload does not always return the correct payloads when terms are located at the same position

2009-05-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1542: --- Fix Version/s: 2.9 NearSpansUnordered.getPayload does not always return the

Re: Lucene's default settings back compatibility

2009-05-22 Thread Matthew Hall
Earwin Burrfoot wrote: As I said, my app uses around ten indexes, which one should I use? :) Even more here, this would be a reasonably painful solution for us. Matt

[jira] Commented: (LUCENE-1636) TokenFilters with a null value in the constructor fail

2009-05-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712076#action_12712076 ] Michael McCandless commented on LUCENE-1636: OK I'll add a null check in

Re: Lucene's default settings back compatibility

2009-05-22 Thread Grant Ingersoll
Perhaps it is wise to take a step back before we play all of these what-if games... I think the best way forward is to simply ask ourselves, when confronted with an actual issue, what the cost of back compat. is for this issue and then address it on a case by case basis, with a bias

[jira] Commented: (LUCENE-1313) Realtime Search

2009-05-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712082#action_12712082 ] Michael McCandless commented on LUCENE-1313: I think generally we are close.

[jira] Commented: (LUCENE-1474) Incorrect SegmentInfo.delCount when IndexReader.flush() is used

2009-05-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712084#action_12712084 ] Michael McCandless commented on LUCENE-1474: Thanks Erik. Can you answer my

[jira] Resolved: (LUCENE-1636) TokenFilters with a null value in the constructor fail

2009-05-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1636. Resolution: Won't Fix TokenFilters with a null value in the constructor fail

[jira] Reopened: (LUCENE-1636) TokenFilters with a null value in the constructor fail

2009-05-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-1636: TokenFilters with a null value in the constructor fail

Re: svn commit: r777525 - /lucene/java/trunk/src/java/org/apache/lucene/util/AttributeSource.java

2009-05-22 Thread Michael McCandless
In general I agree, but in this case I think the check is warranted because it used to be fine (in 2.4) to pass null -- nothing bad would happen. But as of the new TokenStream API, you'll suddenly hit an NPE, so I think we should throw an informed exception so it's clear to users what used to be
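A minimal sketch of the kind of check described above: fail fast with an explanatory exception instead of letting a bare NullPointerException surface later. The class name `NullCheckedFilter` is hypothetical and stands in for a TokenFilter/AttributeSource constructor; it is not the code that was committed in r777525.

```java
// Hypothetical stand-in for a TokenFilter-style constructor; not the real Lucene class.
class NullCheckedFilter {
    private final Object input; // stands in for the wrapped TokenStream

    NullCheckedFilter(Object input) {
        // Reject null up front with a message that explains the behaviour change,
        // rather than hitting an uninformative NullPointerException later on.
        if (input == null) {
            throw new IllegalArgumentException(
                "input TokenStream must not be null (null was tolerated in 2.4, "
                + "but the new TokenStream API requires a real input)");
        }
        this.input = input;
    }

    Object getInput() {
        return input;
    }
}
```

IllegalArgumentException is used here because the caller passed a bad argument; the exact exception type is a design choice, not something the thread specifies.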

Re: Lucene's default settings back compatibility

2009-05-22 Thread Michael McCandless
OK it sounds like a single global actsAsVersion is too problematic. So how about, for cases where back compat default settings are important (analyzers, query scoring changes, etc.) we add actsAsVersion as a mandatory ctor argument to those classes (deprecating the other ctors)? We would do this
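One way to picture the "mandatory actsAsVersion ctor argument" proposal is sketched below. Everything here is hypothetical: the enum `ActsAsVersion`, the class `VersionedAnalyzer`, and the choice of `enablePositionIncrements` as the setting whose default changes are placeholders for illustration only.

```java
// Hypothetical compatibility marker; constants are declared oldest to newest.
enum ActsAsVersion {
    LUCENE_24,
    LUCENE_29;

    // Enum compareTo follows declaration order, so this is a simple "at least" check.
    boolean onOrAfter(ActsAsVersion other) {
        return compareTo(other) >= 0;
    }
}

// Hypothetical analyzer whose defaults depend on the version the caller asks for.
class VersionedAnalyzer {
    private final ActsAsVersion matchVersion;
    private final boolean enablePositionIncrements;

    // New, mandatory-version constructor: callers state which defaults they want.
    VersionedAnalyzer(ActsAsVersion matchVersion) {
        if (matchVersion == null) {
            throw new IllegalArgumentException("matchVersion must not be null");
        }
        this.matchVersion = matchVersion;
        // Placeholder "improved default" that only applies when 2.9 behaviour is requested.
        this.enablePositionIncrements = matchVersion.onOrAfter(ActsAsVersion.LUCENE_29);
    }

    /** Old constructor, kept for back compat; pins the pre-2.9 defaults. */
    @Deprecated
    VersionedAnalyzer() {
        this(ActsAsVersion.LUCENE_24);
    }

    boolean getEnablePositionIncrements() { return enablePositionIncrements; }

    ActsAsVersion getMatchVersion() { return matchVersion; }
}
```

Existing users of the deprecated constructor keep the old behaviour, while anyone passing the newer version explicitly opts into the improved default.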

Re: Lucene's default settings back compatibility

2009-05-22 Thread Michael McCandless
On Thu, May 21, 2009 at 6:53 PM, Marvin Humphrey mar...@rectangular.com wrote: Lastly, I think a major java Lucene release is justified already. Won't this discussion die down somewhat if you can get 3.0 out? Somewhat, yes, but then when working on 3.1 if we make some great improvement, I'd

Re: Lucene's default settings back compatibility

2009-05-22 Thread Michael McCandless
So, iterating on the proposed changes to back-compat policy: 1. If we deprecate an API in the 2.1 release, we can remove it in the next minor release (2.2). 2. JAR drop-in-ability is only guaranteed on point releases (2.4.1 is a drop-in replacement to 2.4.0). When switching to a

Re: Lucene's default settings back compatibility

2009-05-22 Thread Earwin Burrfoot
1. If we deprecate an API in the 2.1 release, we can remove it in the next minor release (2.2). Agree. Maybe also this? 1a. If deprecated functionality is trivially implemented with the new one, we reserve the right to delete deprecated things right away with an appropriate CHANGES note. Sample I:

Re: Lucene's default settings back compatibility

2009-05-22 Thread Marvin Humphrey
On Fri, May 22, 2009 at 11:53:02AM -0400, Michael McCandless wrote: 1. If we deprecate an API in the 2.1 release, we can remove it in the next minor release (2.2). 2. JAR drop-in-ability is only guaranteed on point releases (2.4.1 is a drop-in replacement to 2.4.0). When

Re: Lucene's default settings back compatibility

2009-05-22 Thread Yonik Seeley
I'm not a lawyer, so I dislike trying to nail down every detail in writing and try to solve future problems in the abstract. Lucene has never really been 100% back compatible... we've just tried to keep it that way... it's more of a mindset than a reality, and I'm wary of changing that mindset

Re: Lucene's default settings back compatibility

2009-05-22 Thread Marvin Humphrey
On Fri, May 22, 2009 at 11:33:33AM -0400, Michael McCandless wrote: when working on 3.1 if we make some great improvement, I'd like new users in 3.1 to see the improvement by default. Sounds like an argument for more frequent major releases. But I'm not exactly one to talk. ;) On

Re: Lucene's default settings back compatibility

2009-05-22 Thread Earwin Burrfoot
In KinoSearch SVN trunk, satellite classes like QueryParser and Highlighter have to be passed a Schema, which contains all the Analyzers.  Analyzers aren't satellite classes under this model -- they are a fixed property of a FullTextType field spec.  Think of them as baked into an SQL field

Re: Lucene's default settings back compatibility

2009-05-22 Thread Michael McCandless
On Fri, May 22, 2009 at 12:52 PM, Marvin Humphrey mar...@rectangular.com wrote: when working on 3.1 if we make some great improvement, I'd like new users in 3.1 to see the improvement by default. Sounds like an argument for more frequent major releases. Yeah. Or rebranding what we now call

Re: Lucene's default settings back compatibility

2009-05-22 Thread Yonik Seeley
On Fri, May 22, 2009 at 1:22 PM, Michael McCandless luc...@mikemccandless.com wrote: (That said, unrelated to this discussion, I would actually like to record per-segment which version of Lucene wrote the segment; this would be very helpful when debugging issues like LUCENE-1474 where I need

Re: Lucene's default settings back compatibility

2009-05-22 Thread Michael McCandless
On Fri, May 22, 2009 at 12:37 PM, Marvin Humphrey mar...@rectangular.com wrote: I still like per-class settings classes. For instance, an IndexWriterSettings class which allows you to hide away all the tweaky stuff that's cluttering up the IndexWriter API. IndexWriterSettings settings =
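A rough sketch of the per-class settings idea, assuming a fluent style where each setter validates and returns `this`. `IndexWriterSettings` is not an existing Lucene class, and the default values below are placeholders rather than a statement of IndexWriter's actual defaults.

```java
// Hypothetical settings holder sketching the "IndexWriterSettings" idea;
// not an existing Lucene class. Default values are placeholders.
class IndexWriterSettings {
    private double ramBufferSizeMB = 16.0;
    private int mergeFactor = 10;
    private int maxFieldLength = 10000;

    // Each setter validates its input and returns this, so calls can be chained.
    IndexWriterSettings setRAMBufferSizeMB(double mb) {
        if (mb <= 0) {
            throw new IllegalArgumentException("ramBufferSizeMB must be > 0");
        }
        this.ramBufferSizeMB = mb;
        return this;
    }

    IndexWriterSettings setMergeFactor(int factor) {
        if (factor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        this.mergeFactor = factor;
        return this;
    }

    IndexWriterSettings setMaxFieldLength(int max) {
        if (max <= 0) {
            throw new IllegalArgumentException("maxFieldLength must be > 0");
        }
        this.maxFieldLength = max;
        return this;
    }

    double getRAMBufferSizeMB() { return ramBufferSizeMB; }

    int getMergeFactor() { return mergeFactor; }

    int getMaxFieldLength() { return maxFieldLength; }
}
```

A writer would then take a single settings object in a (hypothetical) constructor instead of exposing each tuning knob as its own setter, which is the "hide away all the tweaky stuff" point.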

[jira] Updated: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2009-05-22 Thread Ali Oral (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Oral updated LUCENE-1486: - Comment: was deleted (was: This issue is very interesting. I see that you use query rewrite for

Re: Lucene's default settings back compatibility

2009-05-22 Thread DM Smith
Yonik Seeley wrote: On Fri, May 22, 2009 at 1:22 PM, Michael McCandless luc...@mikemccandless.com wrote: (That said, unrelated to this discussion, I would actually like to record per-segment which version of Lucene wrote the segment; this would be very helpful when debugging issues like

Re: Lucene's default settings back compatibility

2009-05-22 Thread Marvin Humphrey
I feel the opposite: I'd like new users to see improvements by default, and users that require strict back-compat to ask for that. By strict back-compat, do you mean people who would like their search app to not fail silently? ;) A new user who follows your advice... // haha stupid noob

Re: Lucene's default settings back compatibility

2009-05-22 Thread Michael McCandless
On Fri, May 22, 2009 at 12:44 PM, Yonik Seeley yo...@lucidimagination.com wrote: I'm not a lawyer, so I dislike trying to nail down every detail in writing and try to solve future problems in the abstract. Agreed, and there's always leeway in what we work out here (LUCENE-1436 is a good recent

Re: Lucene's default settings back compatibility

2009-05-22 Thread Marvin Humphrey
On Fri, May 22, 2009 at 01:22:24PM -0400, Michael McCandless wrote: Sounds like an argument for more frequent major releases. Yeah. Or rebranding what we now call minor as major releases, by changing our policy ;) Not sure how much of that is a jest, but I don't think that's a good idea.

Re: Lucene's default settings back compatibility

2009-05-22 Thread DM Smith
Michael McCandless wrote: On Fri, May 22, 2009 at 12:52 PM, Marvin Humphrey mar...@rectangular.com wrote: when working on 3.1 if we make some great improvement, I'd like new users in 3.1 to see the improvement by default. Sounds like an argument for more frequent major releases.

Re: Lucene's default settings back compatibility

2009-05-22 Thread Marvin Humphrey
On Fri, May 22, 2009 at 09:06:32PM +0400, Earwin Burrfoot wrote: In KinoSearch SVN trunk, satellite classes like QueryParser and Highlighter have to be passed a Schema, which contains all the Analyzers.  Analyzers aren't satellite classes under this model -- they are a fixed property of a

Re: Lucene's default settings back compatibility

2009-05-22 Thread DM Smith
Marvin Humphrey wrote: I feel the opposite: I'd like new users to see improvements by default, and users that require strict back-compat to ask for that. By strict back-compat, do you mean people who would like their search app to not fail silently? ;) A new user who follows your

Re: Lucene's default settings back compatibility

2009-05-22 Thread Michael McCandless
On Fri, May 22, 2009 at 2:27 PM, DM Smith dmsmith...@gmail.com wrote: Marvin Humphrey wrote: I feel the opposite: I'd like new users to see improvements by default, and users that require strict back-compat to ask for that. By strict back-compat, do you mean people who would like their

Re: Lucene's default settings back compatibility

2009-05-22 Thread Earwin Burrfoot
Custom analyzers. No problem. How are they recorded in the index? Several indexes using the same analyzer. No problem.  Only necessary if the analyzer is costly or has some esoteric need for shared state.  And possible via subclassing Schema or Analyzer. It is. Intentionally different

Re: Lucene's default settings back compatibility

2009-05-22 Thread Michael McCandless
OK, net/net it doesn't look like we're going to reach agreement on some general approach for having users of Lucene always get the best default settings. We started with the *Settings classes, but that's really a very large project (goes far beyond managing defaults for new users). Then we went to

Re: Lucene's default settings back compatibility

2009-05-22 Thread DM Smith
Michael McCandless wrote: On Fri, May 22, 2009 at 2:27 PM, DM Smith dmsmith...@gmail.com wrote: Marvin Humphrey wrote: I feel the opposite: I'd like new users to see improvements by default, and users that require strict back-compat to ask for that. By strict back-compat,

Re: Lucene's default settings back compatibility

2009-05-22 Thread Michael McCandless
I'd like to do this for 2.9 :) I'll open an issue... (Yes this would just be for diagnostics). Mike On Fri, May 22, 2009 at 1:48 PM, DM Smith dmsmith...@gmail.com wrote: Yonik Seeley wrote: On Fri, May 22, 2009 at 1:22 PM, Michael McCandless luc...@mikemccandless.com wrote: (That said,

Re: Lucene's default settings back compatibility

2009-05-22 Thread Michael McCandless
Well... I would expect/hope Lucene's adoption is growing with time, so the number of new users should increase with each release. For a healthy project that's relatively young compared to its potential user base, that growth should be exponential. And, I'd expect the vast majority of old users

[jira] Created: (LUCENE-1654) Include diagnostics per-segment when writing a new segment

2009-05-22 Thread Michael McCandless (JIRA)
Include diagnostics per-segment when writing a new segment -- Key: LUCENE-1654 URL: https://issues.apache.org/jira/browse/LUCENE-1654 Project: Lucene - Java Issue Type: Improvement

Re: Lucene's default settings back compatibility

2009-05-22 Thread DM Smith
Michael McCandless wrote: Well... I would expect/hope Lucene's adoption is growing with time, so the number of new users should increase with each release. For a healthy project that's relatively young compared to its potential user base, that growth should be exponential. And, I'd expect the

Re: Lucene's default settings back compatibility

2009-05-22 Thread Marvin Humphrey
On Fri, May 22, 2009 at 10:40:03PM +0400, Earwin Burrfoot wrote: Custom analyzers. No problem. How are they recorded in the index? Analyzers must implement dump() and load(), which convert the Analyzer to/from a JSON-izable data structure. They end up as JSON in index_dir/schema_NNN.json.
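The dump()/load() contract described for KinoSearch can be pictured with a Java analogue like the one below. This is only a sketch under assumptions: `DumpableAnalyzer`, `ExampleAnalyzer`, and the `_class` key are hypothetical names, and the map stands in for the JSON-izable structure (an actual JSON library would be needed to write it to a schema file).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical Java analogue of dump()/load(): an analyzer converts itself to a
// plain map of simple values that a JSON library could serialize.
interface DumpableAnalyzer {
    Map<String, Object> dump();
}

class ExampleAnalyzer implements DumpableAnalyzer {
    private final boolean lowercase;
    private final String language;

    ExampleAnalyzer(boolean lowercase, String language) {
        this.lowercase = lowercase;
        this.language = language;
    }

    // Record the class name plus every setting needed to rebuild an equivalent analyzer.
    @Override
    public Map<String, Object> dump() {
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("_class", ExampleAnalyzer.class.getName());
        data.put("lowercase", lowercase);
        data.put("language", language);
        return data;
    }

    // Inverse of dump(): rebuild the analyzer from the stored map.
    static ExampleAnalyzer load(Map<String, Object> data) {
        if (!ExampleAnalyzer.class.getName().equals(data.get("_class"))) {
            throw new IllegalArgumentException("not an ExampleAnalyzer dump: " + data.get("_class"));
        }
        return new ExampleAnalyzer((Boolean) data.get("lowercase"), (String) data.get("language"));
    }

    boolean isLowercase() { return lowercase; }

    String getLanguage() { return language; }
}
```

Because the analyzer's configuration travels with the index, a reader or query parser can rebuild the same analysis chain later without the application re-specifying it.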

Re: Lucene's default settings back compatibility

2009-05-22 Thread Michael McCandless
On Fri, May 22, 2009 at 3:37 PM, DM Smith dmsmith...@gmail.com wrote: So, what is it that they use that leads to such unfavorable results? I think it's simply that they take each search engine, get it to index their collection in the most obvious way, perhaps having read a tutorial somewhere,

[jira] Commented: (LUCENE-1654) Include diagnostics per-segment when writing a new segment

2009-05-22 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712269#action_12712269 ] Earwin Burrfoot commented on LUCENE-1654: - Let's have string key-value pairs
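A small sketch of "string key-value pairs" as per-segment diagnostics. The helper `SegmentDiagnostics.forNewSegment` and the key names are hypothetical choices for illustration; only `System.getProperty` and the `java.util` collections are real APIs here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the diagnostics a writer could record with a new segment.
final class SegmentDiagnostics {
    private SegmentDiagnostics() {}

    // "source" says why the segment was written (for example "flush" or "merge");
    // the JVM/OS entries help when debugging an index written on another machine.
    static Map<String, String> forNewSegment(String luceneVersion, String source) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("lucene.version", luceneVersion);
        d.put("source", source);
        d.put("os", String.valueOf(System.getProperty("os.name")));
        d.put("java.version", String.valueOf(System.getProperty("java.version")));
        // Read-only view, so callers cannot alter diagnostics after creation.
        return Collections.unmodifiableMap(d);
    }
}
```

Plain strings keep the format open-ended: new keys can be added later without changing how segments are read.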

[jira] Issue Comment Edited: (LUCENE-1654) Include diagnostics per-segment when writing a new segment

2009-05-22 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712269#action_12712269 ] Earwin Burrfoot edited comment on LUCENE-1654 at 5/22/09 2:26 PM:

[jira] Commented: (LUCENE-1460) Change all contrib TokenStreams/Filters to use the new TokenStream API

2009-05-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712295#action_12712295 ] Robert Muir commented on LUCENE-1460: - is anyone working on this? I have some