Re: Bug in StandardAnalyzer + StopAnalyzer?

2009-11-16 Thread Eran Sevi
I think it's always better to do what you can inside the code and not leave it to the clients that call the method (that just means code duplication and a potential place for error if it's forgotten). So if it's not complicated, I think the call to reset() on the chain of filters should be the
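
To make the point about reset() concrete, here is a minimal sketch of a filter chain in which each filter clears its own state and then forwards reset() to its input, so a caller only resets the outermost stream. The Stream, Source and LimitFilter types are hypothetical stand-ins written for this note, not Lucene's actual TokenStream classes.

```java
public class ResetChainSketch {
    // Hypothetical minimal token stream; NOT Lucene's TokenStream.
    static abstract class Stream {
        abstract String next();   // returns null when exhausted
        void reset() {}           // default: nothing to reset
    }

    // Hypothetical source that replays a fixed list of tokens.
    static class Source extends Stream {
        private final String[] tokens;
        private int pos = 0;
        Source(String... tokens) { this.tokens = tokens; }
        @Override String next() { return pos < tokens.length ? tokens[pos++] : null; }
        @Override void reset() { pos = 0; }
    }

    // Hypothetical filter with its own state: passes at most `limit` tokens.
    static class LimitFilter extends Stream {
        private final Stream input;
        private final int limit;
        private int seen = 0;
        LimitFilter(Stream input, int limit) { this.input = input; this.limit = limit; }
        @Override String next() {
            if (seen >= limit) return null;
            String t = input.next();
            if (t != null) seen++;
            return t;
        }
        // Clear local state, then propagate down the chain so callers
        // never have to remember to reset the inner streams themselves.
        @Override void reset() {
            seen = 0;
            input.reset();
        }
    }

    // Consumes the stream and returns how many tokens it produced.
    static int drain(Stream s) {
        int n = 0;
        while (s.next() != null) n++;
        return n;
    }

    public static void main(String[] args) {
        Stream chain = new LimitFilter(new Source("a", "b", "c"), 2);
        System.out.println(drain(chain)); // first pass
        chain.reset();                    // one call resets the whole chain
        System.out.println(drain(chain)); // second pass yields the same count
    }
}
```

If LimitFilter.reset() did not forward the call, the second pass would see an exhausted source, which is the "forgotten" case the message warns about.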

Re: Efficient Query Evaluation using a Two-Level Retrieval Process

2009-11-16 Thread J. Delgado
Here is the link to the paper. http://cis.poly.edu/westlab/papers/cntdstrb/p426-broder.pdf A more recent application of the use and extension of the WAND operator for indexing of Boolean expressions: http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf -- Joaquin On Sun, Nov 15, 2009 at 11:15

[jira] Resolved: (LUCENE-1154) System Reqs page should be release specific

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-1154. --- Resolution: Fixed Committed trunk changes revision: 880660 Committed site changes revision:

[jira] Updated: (LUCENE-1370) Patch to make ShingleFilter output a unigram if no ngrams can be generated

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1370: -- Fix Version/s: (was: 3.0) 3.1 I move this to 3.1 as it is a new

[jira] Commented: (LUCENE-1698) Change backwards-compatibility policy

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778262#action_12778262 ] Uwe Schindler commented on LUCENE-1698: --- This is the last issue blocking 3.0

[jira] Commented: (LUCENE-1698) Change backwards-compatibility policy

2009-11-16 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778268#action_12778268 ] Michael Busch commented on LUCENE-1698: --- This doesn't need to block 3.0. We

Re: A new Lucene Directory available

2009-11-16 Thread Sergio Bossa
Sanne, I'd be very interested in knowing what kind of problems you analyzed and resolved regarding the Terracotta clustered solution, as quoted below: I know about the Terracotta efforts, I agree with you and have collected much feedback about which problems were arising directly talking with

Re: A new Lucene Directory available

2009-11-16 Thread Manik Surtani
@Sanne, thanks for announcing this, good stuff! @Earwin, note that this is a tech preview and hardly production-ready code yet. The more eyes that scan the code, try it out, report bugs and bottlenecks, the better. So thanks for spotting ISPN-276, we look forward to more feedback/patches.

Re: svn commit: r880704 - /lucene/java/trunk/CHANGES.txt

2009-11-16 Thread Michael McCandless
I like this consolidation, but: will ant changes-to-html be OK w/ multiple issues in one CHANGES entry? Mike On Mon, Nov 16, 2009 at 5:45 AM, uschind...@apache.org wrote: Author: uschindler Date: Mon Nov 16 10:45:31 2009 New Revision: 880704 URL:

[jira] Commented: (LUCENE-2051) Contrib Analyzer Setters should be deprecated and replace with ctor arguments

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778301#action_12778301 ] Uwe Schindler commented on LUCENE-2051: --- I committed it for you in revision: 880715

RE: svn commit: r880704 - /lucene/java/trunk/CHANGES.txt

2009-11-16 Thread Uwe Schindler
Thanks, I'll try out and fix if needed! Multiple issues in one issue is no problem, all of them are converted to links. Maybe only the Bullet list gets broken. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message-

[jira] Resolved: (LUCENE-2051) Contrib Analyzer Setters should be deprecated and replace with ctor arguments

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-2051. - Resolution: Fixed thanks uwe! Contrib Analyzer Setters should be deprecated and

[jira] Commented: (LUCENE-2047) IndexWriter should immediately resolve deleted docs to docID in near-real-time mode

2009-11-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778335#action_12778335 ] Michael McCandless commented on LUCENE-2047: Thinking more on this... I'm

RE: svn commit: r880744 - in /lucene/java/trunk: CHANGES.txt src/site/changes/changes2html.pl

2009-11-16 Thread Steven A Rowe
Yeah, <code>...</code> works to handle the bulleted lists (though it also adds <pre>...</pre> tags - monospaced font, which is not ideal). changes-to-html should more gracefully deal with bulleted lists. - Steve On 11/16/2009 at 8:35 AM, uschind...@apache.org wrote: Author: uschindler Date: Mon Nov

[jira] Commented: (LUCENE-2061) Create benchmark approach for testing Lucene's near real-time performance

2009-11-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778346#action_12778346 ] Michael McCandless commented on LUCENE-2061: OK the last test had a silly bug,

3.0 branch created, trunk changed to 3.1-dev

2009-11-16 Thread Uwe Schindler
Hallo Committers, I recently created the 3.0 branch and switched trunk to 3.1-dev (the website docs are the only missing things, hopefully). Please wait with heavy committing(R) to trunk, because: If we find any bugs in 3.0 RC, we should merge all these changes into trunk, too (which should not

RE: svn commit: r880744 - in /lucene/java/trunk: CHANGES.txt src/site/changes/changes2html.pl

2009-11-16 Thread Uwe Schindler
If you have a patch, I can apply it to 3.0 and trunk! Just open an issue, for me this was the quickest fix. Thanks, Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Steven A Rowe

RE: 3.0 branch created, trunk changed to 3.1-dev

2009-11-16 Thread Uwe Schindler
I forgot: Please also merge bugfixes that need to change BW branch between the two branches (lucene_3_0_bw and lucene_3_0), to keep consistent also for trunk! Thanks for all the development work, Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail:

Re: Bug in StandardAnalyzer + StopAnalyzer?

2009-11-16 Thread Robert Muir
Thanks for your feedback. I think we can improve this consistency with this issue: LUCENE-2034 In this issue Simon is proposing a subclass that makes it easier to reuse tokenstreams. and for the StandardTokenizer, we can consider removing this entire override, reset(Reader) calls reset(), there

[jira] Commented: (LUCENE-2034) Massive Code Duplication in Contrib Analyzers - unifly the analyzer ctors

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778363#action_12778363 ] Robert Muir commented on LUCENE-2034: - Simon, here {quote} source.reset(reader);

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-11-16 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778364#action_12778364 ] Mark Miller commented on LUCENE-1458: - I've got a big merge coming - after a recent

[jira] Updated: (LUCENE-2067) Czech Stemmer

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2067: Attachment: LUCENE-2067.patch updated patch, fixed to use Version.LUCENE_31. I also added a note

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-11-16 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778374#action_12778374 ] Uwe Schindler commented on LUCENE-1458: --- If you are merging, you should simply

[jira] Commented: (LUCENE-1689) supplementary character handling

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778378#action_12778378 ] Robert Muir commented on LUCENE-1689: - a couple people have asked me about this issue

[jira] Created: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-16 Thread Robert Muir (JIRA)
fix reverseStringFilter for unicode 4.0 --- Key: LUCENE-2068 URL: https://issues.apache.org/jira/browse/LUCENE-2068 Project: Lucene - Java Issue Type: Improvement Components: contrib/analyzers
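
As background for this issue, the sketch below shows why reversing a string char by char is a problem once supplementary characters (Unicode 4.0, Java 5) are in play: a surrogate pair gets swapped into an invalid low/high order. It uses only the JDK; the class and method names are made up for this note, and it is not the code from the attached patch.

```java
public class ReverseSketch {
    // Naive reversal of UTF-16 code units; breaks surrogate pairs.
    static String naiveReverse(String s) {
        char[] c = s.toCharArray();
        for (int i = 0, j = c.length - 1; i < j; i++, j--) {
            char t = c[i];
            c[i] = c[j];
            c[j] = t;
        }
        return new String(c);
    }

    // StringBuilder.reverse() treats surrogate pairs as single characters,
    // so supplementary code points survive the reversal intact.
    static String codePointReverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    public static void main(String[] args) {
        // "a" + U+10400 (DESERET CAPITAL LETTER LONG I) + "b"
        String s = "a" + new String(Character.toChars(0x10400)) + "b";
        String naive = naiveReverse(s);
        String safe = codePointReverse(s);
        System.out.println(Character.isLowSurrogate(naive.charAt(1))); // pair is now out of order
        System.out.println(Integer.toHexString(safe.codePointAt(1)));  // code point preserved
    }
}
```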

Re: [jira] Commented: (LUCENE-1689) supplementary character handling

2009-11-16 Thread Simon Willnauer
+1 On 11/16/09, Robert Muir (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778378#action_12778378 ] Robert Muir commented on LUCENE-1689:

[jira] Updated: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2068: Attachment: LUCENE-2068.patch fix reverseStringFilter for unicode 4.0

3.0.0-rc1 build, please check before I post to java-user

2009-11-16 Thread Uwe Schindler
Hi committers, I built and signed the Lucene Java 3.0.0-rc1 artifacts and placed them here: http://people.apache.org/~uschindler/staging-area/lucene-3.0.0-rc1/ Please check them quickly, if I did not do something completely wrong or something important is missing. The changes are in folder

RE: 3.0.0-rc1 build, please check before I post to java-user

2009-11-16 Thread Uwe Schindler
Sorry, I had a problem in CHANGES.txt, so I rebuilt the artifacts. They are currently uploading to p.a.o. Please wait about 5-10 minutes before downloading. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message-

[jira] Updated: (LUCENE-2069) fix LowerCaseFilter for unicode 4.0

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2069: Attachment: LUCENE-2069.patch fix LowerCaseFilter for unicode 4.0

[jira] Created: (LUCENE-2069) fix LowerCaseFilter for unicode 4.0

2009-11-16 Thread Robert Muir (JIRA)
fix LowerCaseFilter for unicode 4.0 --- Key: LUCENE-2069 URL: https://issues.apache.org/jira/browse/LUCENE-2069 Project: Lucene - Java Issue Type: Improvement Components: Analysis
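
For readers unfamiliar with the underlying problem, this sketch contrasts lowercasing one char at a time with lowercasing one code point at a time. Only JDK calls are used; the class and method names are invented for this note, and this is an illustration of the general technique rather than the patch attached to the issue.

```java
public class LowerCaseSketch {
    // Per-char lowercasing: surrogates have no case mapping, so
    // supplementary characters pass through unchanged.
    static String perChar(String s) {
        char[] c = s.toCharArray();
        for (int i = 0; i < c.length; i++) {
            c[i] = Character.toLowerCase(c[i]);
        }
        return new String(c);
    }

    // Per-code-point lowercasing: decodes surrogate pairs first.
    static String perCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10400 DESERET CAPITAL LETTER LONG I; its lowercase form is U+10428.
        String upper = new String(Character.toChars(0x10400));
        System.out.println(perChar(upper).equals(upper));
        System.out.println(Integer.toHexString(perCodePoint(upper).codePointAt(0)));
    }
}
```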

RE: svn commit: r880744 - in /lucene/java/trunk: CHANGES.txt src/site/changes/changes2html.pl

2009-11-16 Thread Steven A Rowe
Hi Uwe, I'll make an issue and post a patch tonight (I don't have one yet). Steve On 11/16/2009 at 9:22 AM, Uwe Schindler wrote: If you have a patch, I can apply it to 3.0 and trunk! Just open an issue, for me this was the quickest fix. Thanks, Uwe - Uwe Schindler

[jira] Updated: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2068: Attachment: LUCENE_2068.patch Robert, I cleaned up the patch a little and did some minor

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-11-16 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778393#action_12778393 ] Mark Miller commented on LUCENE-1458: - Simply ? :) What about the part where I have to

[jira] Commented: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-16 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778394#action_12778394 ] Mark Miller commented on LUCENE-2068: - Is this an improvement or a bug? The summary

[jira] Commented: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778395#action_12778395 ] Robert Muir commented on LUCENE-2068: - Simon, thanks! I agree we should use these

[jira] Commented: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778396#action_12778396 ] Robert Muir commented on LUCENE-2068: - bq. Is this an improvement or a bug? The

[jira] Commented: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778400#action_12778400 ] Robert Muir commented on LUCENE-2068: - bq. I guess this is good to go once 3.1 is out.

Re: Efficient Query Evaluation using a Two-Level Retrieval Process

2009-11-16 Thread J. Delgado
As I understood it setMinimumNumberShouldMatch(int min) Is used to specify a minimum number of the optional BooleanClauses which must be satisfied. I haven't seen the implementation of setMinimumNumberShouldMatch but it seems a bit different than what is intended with the WAND operator, which can
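
The difference being described can be sketched in a few lines. In the Broder et al. paper linked earlier in this thread, WAND(X1, w1, ..., Xk, wk, theta) is true when the sum of the weights of the matching terms reaches the threshold theta; a "minimum number should match" rule is the special case where every weight is 1. The code below is a plain-Java illustration with made-up names, not Lucene's BooleanQuery implementation.

```java
public class WandSketch {
    // True when at least `min` of the optional clauses match.
    static boolean minShouldMatch(boolean[] matches, int min) {
        int count = 0;
        for (boolean m : matches) {
            if (m) count++;
        }
        return count >= min;
    }

    // Weighted AND: true when the summed weights of the matching
    // clauses reach the threshold.
    static boolean wand(boolean[] matches, double[] weights, double threshold) {
        double sum = 0.0;
        for (int i = 0; i < matches.length; i++) {
            if (matches[i]) sum += weights[i];
        }
        return sum >= threshold;
    }

    public static void main(String[] args) {
        boolean[] matches = {true, false, false};
        double[] weights = {5.0, 1.0, 1.0};
        // One heavy term alone can satisfy WAND...
        System.out.println(wand(matches, weights, 3.0));
        // ...but it cannot satisfy "at least 2 clauses must match".
        System.out.println(minShouldMatch(matches, 2));
    }
}
```

With unit weights, a threshold of k behaves like AND over k clauses and a threshold of 1 behaves like OR, which is why the paper describes WAND as a generalization of both.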

[jira] Commented: (LUCENE-2069) fix LowerCaseFilter for unicode 4.0

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778401#action_12778401 ] Robert Muir commented on LUCENE-2069: - Simon, if you have a moment maybe you can

[jira] Created: (LUCENE-2070) document LengthFilter wrt Unicode 4.0

2009-11-16 Thread Robert Muir (JIRA)
document LengthFilter wrt Unicode 4.0 - Key: LUCENE-2070 URL: https://issues.apache.org/jira/browse/LUCENE-2070 Project: Lucene - Java Issue Type: Improvement Components: Analysis
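
The documentation concern here comes down to what "length" means for a term. A short JDK-only sketch, with a made-up class name: String.length() counts UTF-16 code units, while codePointCount counts Unicode code points, and the two differ as soon as a supplementary character is present.

```java
public class LengthSketch {
    // Number of UTF-16 code units (what String.length() reports).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (surrogate pairs count once).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "a" followed by U+10400, which needs two chars in UTF-16.
        String s = "a" + new String(Character.toChars(0x10400));
        System.out.println(codeUnits(s));
        System.out.println(codePoints(s));
    }
}
```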

[jira] Updated: (LUCENE-2070) document LengthFilter wrt Unicode 4.0

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2070: Attachment: LUCENE-2070.patch document LengthFilter wrt Unicode 4.0

Re: Efficient Query Evaluation using a Two-Level Retrieval Process

2009-11-16 Thread Earwin Burrfoot
This algo is strictly tied to sort-by-score, if I understand it correctly. Lucene has queries and sorting decoupled (except for allowOutOfOrder mess), so implementing it would require some really fat hacks. On Mon, Nov 16, 2009 at 20:26, J. Delgado joaquin.delg...@gmail.com wrote: As I

[jira] Commented: (LUCENE-1689) supplementary character handling

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778421#action_12778421 ] Robert Muir commented on LUCENE-1689: - Yonik, or anyone else, please let me know your

Re: Efficient Query Evaluation using a Two-Level Retrieval Process

2009-11-16 Thread J. Delgado
On Mon, Nov 16, 2009 at 9:44 AM, Earwin Burrfoot ear...@gmail.com wrote: This algo is strictly tied to sort-by-score, if I understand it correctly. Lucene has queries and sorting decoupled (except for allowOutOfOrder mess), so implementing it would require some really fat hacks. According to

[jira] Commented: (LUCENE-1689) supplementary character handling

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778427#action_12778427 ] Robert Muir commented on LUCENE-1689: - btw, its worth mentioning that this whole

Why release 3.0?

2009-11-16 Thread Erick Erickson
One of my specialties is asking obvious questions just to see if everyone's assumptions are aligned. So with the discussion about branching 3.0 I have to ask Is there going to be any 3.0 release intended for *production*?. And if not, would we save a lot of work by just not worrying about

Re: Why release 3.0?

2009-11-16 Thread Jake Mannix
Don't users need to upgrade to 3.0 because 3.1 won't be necessarily able to read your 2.4 index file formats? I suppose if you've already upgraded to 2.9, then all is well because 2.9 is the same format as 3.0, but we can't assume all users upgraded from 2.4 to 2.9. If you've done that already,

[jira] Commented: (LUCENE-2047) IndexWriter should immediately resolve deleted docs to docID in near-real-time mode

2009-11-16 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778452#action_12778452 ] Jason Rutherglen commented on LUCENE-2047: -- I want to replay how DW handle the

[jira] Created: (LUCENE-2071) Allow upating of IndexWriter SegmentReaders

2009-11-16 Thread Jason Rutherglen (JIRA)
Allow upating of IndexWriter SegmentReaders --- Key: LUCENE-2071 URL: https://issues.apache.org/jira/browse/LUCENE-2071 Project: Lucene - Java Issue Type: Improvement Components: Index

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
Hi Erick, 3.0 is *not* an unsupported or beta release, it is the cleaned up 2.9.1 release. You are right, it is not needed for 2.9.1 users to upgrade (but they can), but for new users starting with Lucene, the recommendation is to use it and not 2.9. 3.0 also contains some cleanups needed for

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
2.9 has *not* the same format as 3.0, an index created with 3.0 cannot be read with 2.9. This is because compressed field support was removed and therefore the version number of the stored fields file was upgraded. But indexes from 2.9 can be read with 3.0 and support may get removed in 4.0. 3.0

Re: Why release 3.0?

2009-11-16 Thread Jake Mannix
Yeah, sorry, I just meant that 3.0 can read 2.9 index format, but 3.1 will not necessarily have that capability (this is the whole point of the difference between 2.9 and 3.0, in my understanding). On Mon, Nov 16, 2009 at 11:05 AM, Uwe Schindler u...@thetaphi.de wrote: 2.9 has **not** the same

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
uwe, on topic please read my comment on LUCENE-1689, because unicode version was bumped in jdk 1.5, i believe this index backwards compatibility is only theoretical On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler u...@thetaphi.de wrote: 2.9 has **not** the same format as 3.0, an index created

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
But an UTF-8 stream from Java 4 can still be read with Java 5, what is the problem? Java 5 extended Unicode support, but an index created with older versions can still be read. UTF-8 is standardized. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail:

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
the problem is that the properties have changed for various characters, and new characters were added. it really has nothing to do with lucene, but the idea you can go from jdk 1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing is not true. On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler

[jira] Updated: (LUCENE-2071) Allow upating of IndexWriter SegmentReaders

2009-11-16 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-2071: - Attachment: LUCENE-2071.patch Added an IW updateReaders method that accepts a Readers

[jira] Updated: (LUCENE-2071) Allow updating of IndexWriter SegmentReaders

2009-11-16 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-2071: - Summary: Allow updating of IndexWriter SegmentReaders (was: Allow upating of

RE: Why release 3.0?

2009-11-16 Thread Steven A Rowe
Hi Robert, I agree that the Unicode version supported by the JVM, as you say, really has nothing to do with Lucene. The disruption here is users' upgrading from Java 1.4 to 1.5+, not when they upgrade Lucene. I'd guess with few exceptions that most people have been using Lucene with 1.5+ for

Re: Why release 3.0?

2009-11-16 Thread Mark Miller
X.n must be able to read (X-1).n - so 3.1 will be able to read 2.9 - major versions are also for removing deprecations. Jake Mannix wrote: Yeah, sorry, I just meant that 3.0 can read 2.9 index format, but 3.1 will not necessarily have that capability (this is the whole point of the difference

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
right, its nothing to do with lucene, instead due to property changes, etc. i just think we should inform users on java 1.4/2.9 that if they upgrade to java 1.5/3.0, they should reindex. the reason i say this about properties, is there are some that change that will affect tokenizers, i give two

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
But most people already use 1.5 or 1.6 even with 2.9. They could also switch before. The problem is the used JVM not the used Lucene Version. And you can also run Lucene 1.4.3 with Java 5 - same problem. If people change their Java Version, they have to take care what changed. The only thing:

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
We tried out: Character.getType() for these two chars: Java 5: '\u00AD' = 16 '\u06DD' = 16 Java 1.4: '\u00AD' = 20 '\u06DD' = 7 The first is the soft hyphen. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de _ From: Robert
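
The check described in this message is easy to repeat on whatever JVM is at hand. The sketch below prints the general category for the same two characters; the class name is made up for this note. For reference, the JDK constants are Character.FORMAT = 16, Character.DASH_PUNCTUATION = 20 and Character.ENCLOSING_MARK = 7, which matches the numbers quoted above for Java 5 and Java 1.4.

```java
public class CharTypeSketch {
    // Thin wrapper so the lookup can be reused; takes a code point.
    static int typeOf(int codePoint) {
        return Character.getType(codePoint);
    }

    public static void main(String[] args) {
        int[] codePoints = {0x00AD, 0x06DD}; // soft hyphen, Arabic end of ayah
        for (int cp : codePoints) {
            // The value depends on the Unicode version the running JVM supports.
            System.out.printf("U+%04X -> %d%n", cp, typeOf(cp));
        }
    }
}
```

A tokenizer that classifies characters by such properties can therefore split the same text differently depending on the JVM, which is the reindexing concern raised in this thread.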

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
right, my point is its true its nothing to do with Lucene at all, really. but the reality is we should clarify this to users I think. Its especially complex in the current StandardTokenizer, which uses a mix of hardcoded ranges and properties, can you tell me if you should reindex for given

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
But it is a general warning that should be placed in the Wiki: If you upgrade from Java 1.4 to Java 5, think about reindexing. It has definitely nothing to do with 3.0, because users could have changed (and most of them have) before. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
Uwe, that's probably a good solution I think. just as long as we document somewhere, I think there is some warning verbiage in StandardTokenizer already about this. NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate the tokenizer, remember to use JRE 1.4 to run jflex

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
btw, so heres a great example. you are backwards broken regardless of JVM for StandardTokenizer, because we used 1.4 JRE to run jflex in 2.9, but 1.5 in 3.0, right? On Mon, Nov 16, 2009 at 2:51 PM, Robert Muir rcm...@gmail.com wrote: Uwe, thats probably a good solution I think. just as long as

[jira] Commented: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778487#action_12778487 ] Simon Willnauer commented on LUCENE-2068: - bq. I just think we should use a

[jira] Updated: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2068: Attachment: LUCENE_2068.patch removed static import fix reverseStringFilter for unicode

[jira] Assigned: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-2068: --- Assignee: Simon Willnauer fix reverseStringFilter for unicode 4.0

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
I think the regenerated code in Standard has not been generated with 1.4 for years :-) Most developers use 1.5 or even 1.6. So it has already changed incompatibly. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de _ From: Robert

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
JFlex was not regenerated as far as I know, but if somebody did, its already broken. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de _ From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, November 16, 2009 8:53 PM To:

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
i suppose we are ok then, except for the fact that now StandardTokenizer is working with a unicode 3.0 definition, instead of the unicode version (4.0) that corresponds to our required minimum jre (1.5)... sorry if i raised a stink about nothing, but you see my concerns maybe? On Mon, Nov 16,

Re: Why release 3.0?

2009-11-16 Thread Erick Erickson
On Mon, Nov 16, 2009 at 2:03 PM, Uwe Schindler u...@thetaphi.de wrote: Hi Erick, 3.0 is **not** unsupported or beta release, it is the cleaned up 2.9.1 release. You are right, it is not needed for 2.9.1 users to upgrade (but they can), but for new users starting with Lucene, the

Re: Why release 3.0?

2009-11-16 Thread Erick Erickson
Oops, stupid mouse made me send a blank message. Ok, I withdraw the question since there *are* good reasons to put 3.0 in a prod environment G. It's also an easier thing to say new Lucene users should start with 3.0 rather than new Lucene users should start with 3.1. Use 3.0 until we release 3.1

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
I have to regenerate the JFlex files to be sure that they are Java 5. Should I do and recreate the artifacts, they are not yet released. Correct would be to copy the current generated Java file and use it if matchVersion Version.LUCENE_30. For 3.0++ we have a new one. If the old one is really

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
Steven, I think we can be almost sure of no latin-1 changes. what do you think about this jflex situation though? it seems like a mess, is there anything we can do before the jflex 1.5 stuff that is going on now (where we could actually link Version to the unicode version jflex uses explicitly?)

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
We support 3.0, why do you tend to say something other? I will always fix the bug first in 3.0 and then merge (perhaps) back to 2.9. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de _ From: Erick Erickson

Re: Why release 3.0?

2009-11-16 Thread Mark Miller
This is a big deal, whether it's JDK or Lucene related. We are forcing those on 1.4 to move to 1.5 - any problems you face with that with the JDK are Lucene problems if they affect Lucene. We need big clear warnings about this - we should have had them before we pushed users to 1.5 as well if I

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
I would rename the java file/class and write a big warning on it: for version 3.0. Do not recreate (which cannot be done, because jflex file is missing). The current jflex file is recreated and is now the official support 1.5 version. The 1.4 version will never change! - Uwe Schindler

Re: Why release 3.0?

2009-11-16 Thread Mark Miller
Good point - and that likely means the current warning is not working - what can we do to improve it? Perhaps a new text file called jflexregen or something, and it specifically says you must use java 1.5? Uwe Schindler wrote: I think the regenerated code in Standard is since years no longer

[jira] Commented: (LUCENE-2069) fix LowerCaseFilter for unicode 4.0

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778504#action_12778504 ] Simon Willnauer commented on LUCENE-2069: - Robert, I assume you did use those

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
I check this by generating the file with 1.4 and 1.5. The 1.4 version will not change anymore, so we just leave the java file no jflex anymore. The old one is used for Lucene until 2.9, if you use matchVersion=LUCENE_30, the new one is used, which can also be regenerated. - Uwe Schindler
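
The selection rule proposed here (a frozen scanner for indexes built with older versions, a regenerable one from LUCENE_30 on) can be sketched as below. The Version enum, ScannerKind and scannerFor are hypothetical names written for this note; they only mirror the idea of switching on matchVersion and are not Lucene's real classes.

```java
public class MatchVersionSketch {
    // Hypothetical version constants, declared in release order.
    enum Version {
        LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31;

        boolean onOrAfter(Version other) {
            return compareTo(other) >= 0;
        }
    }

    // Hypothetical labels for the two generated scanner classes.
    enum ScannerKind {
        FROZEN_JAVA_1_4,      // kept as a .java file only, never regenerated
        REGENERABLE_JAVA_5    // regenerated from the current jflex grammar
    }

    // Chooses the scanner so that existing indexes keep their old tokenization.
    static ScannerKind scannerFor(Version matchVersion) {
        return matchVersion.onOrAfter(Version.LUCENE_30)
                ? ScannerKind.REGENERABLE_JAVA_5
                : ScannerKind.FROZEN_JAVA_1_4;
    }

    public static void main(String[] args) {
        for (Version v : Version.values()) {
            System.out.println(v + " -> " + scannerFor(v));
        }
    }
}
```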

[jira] Commented: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778507#action_12778507 ] Simon Willnauer commented on LUCENE-2068: - We will get this in once 3.0 is out. I

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
mark these are similar to my concerns with us doing unicode 4.0 (suppl. characters, etc) support in 3.1. this is why i left a comment on LUCENE-1689, I'm pretty confused about what approach we should take, because technically, fixing this will break things. and again, I do believe we should have

[jira] Commented: (LUCENE-2069) fix LowerCaseFilter for unicode 4.0

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778508#action_12778508 ] Robert Muir commented on LUCENE-2069: - Simon, those weird chars are indeed real

Re: Why release 3.0?

2009-11-16 Thread Mark Miller
And what happens when someone regenerates it with 1.6 without knowing? Uwe Schindler wrote: I check this by generating the file with 1.4 and 1.5. The 1.4 version will not change anymore, so we just leave the java file no jflex anymore. The old one is used for Lucene until 2.9, if you use

[jira] Commented: (LUCENE-2069) fix LowerCaseFilter for unicode 4.0

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778509#action_12778509 ] Simon Willnauer commented on LUCENE-2069: - we might need a changes.txt entry here

RE: Why release 3.0?

2009-11-16 Thread Uwe Schindler
Did 1.6 change the unicode version? Robert? - UWE SCHINDLER Webserver/Middleware Development PANGAEA - Publishing Network for Geoscientific and Environmental Data MARUM - University of Bremen Room 2500, Leobener Str., D-28359 Bremen Tel.: +49 421 218 65595 Fax: +49 421 218 65505

[jira] Commented: (LUCENE-2069) fix LowerCaseFilter for unicode 4.0

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778510#action_12778510 ] Robert Muir commented on LUCENE-2069: - Simon, yes see LUCENE-1689. this is my

Re: Why release 3.0?

2009-11-16 Thread Robert Muir
no. its still 4.0, but i hear 1.7 will be 5.1 or 5.2 the only way to truly control this, would be to use something like ICU to control the unicode version being used (and actually be faster, and support higher version). see http://site.icu-project.org/home/why-use-icu4j the issue is that lucene

[jira] Created: (LUCENE-2072) Upgrade contrib/regex to jakarta-regex 1.5

2009-11-16 Thread Simon Willnauer (JIRA)
Upgrade contrib/regex to jakarta-regex 1.5 --- Key: LUCENE-2072 URL: https://issues.apache.org/jira/browse/LUCENE-2072 Project: Lucene - Java Issue Type: Improvement Components: contrib/*

[jira] Updated: (LUCENE-2072) Upgrade contrib/regex to jakarta-regex 1.5

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2072: Attachment: jakarta-regexp-1.5.jar LUCENE-2072.patch Upgrade

[jira] Commented: (LUCENE-2069) fix LowerCaseFilter for unicode 4.0

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778515#action_12778515 ] Robert Muir commented on LUCENE-2069: - Uwe, we can use matchVersion for all of this,

[jira] Assigned: (LUCENE-2072) Upgrade contrib/regex to jakarta-regex 1.5

2009-11-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-2072: --- Assignee: Simon Willnauer Upgrade contrib/regex to jakarta-regex 1.5

[jira] Commented: (LUCENE-1689) supplementary character handling

2009-11-16 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778516#action_12778516 ] Mark Miller commented on LUCENE-1689: - If there is nothing we can do here, then we

Re: Why release 3.0?

2009-11-16 Thread Mark Miller
I still recommend we add a file then, HowToRegenJflex.txt or something - that specifically says to use 1.5 or 1.6. I don't think changing the current notice/warning is visible enough to ensure someone doesn't break this. Robert Muir wrote: no. its still 4.0, but i hear 1.7 will be 5.1 or 5.2 the only

[jira] Commented: (LUCENE-1689) supplementary character handling

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778524#action_12778524 ] Robert Muir commented on LUCENE-1689: - bq. If there is nothing we can do here, then we

[jira] Commented: (LUCENE-1689) supplementary character handling

2009-11-16 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778526#action_12778526 ] Mark Miller commented on LUCENE-1689: - I'm speaking in regards to: {quote} btw, its

[jira] Commented: (LUCENE-1689) supplementary character handling

2009-11-16 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778527#action_12778527 ] Robert Muir commented on LUCENE-1689: - bq. We can fix that too? If so, I think we
