RE: Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-11 Thread Uwe Schindler
I do not know how this could affect Solr, but it could be the case. Currently, most Tokenizers do not use CharStreams at all. After committing LUCENE-1906, I think some additional work is also needed in Solr's custom Tokenizers (the correctOffset method changed). - Uwe Schindler

Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler
When reviewing the new CharStream code added to Tokenizers, I found a serious problem with backwards compatibility and with other Tokenizers that do not override reset(CharStream). The problem is that, for example, CharTokenizer only overrides reset(Reader): public void reset(Reader input) throws
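The message above is cut off, so the following is only a minimal sketch of the kind of pitfall it describes, not the actual Lucene code. All class names (`DemoCharStream`, `DemoTokenizer`, `DemoCharTokenizer`) and the `offset` field are hypothetical stand-ins. It assumes a base class with two `reset` overloads and a subclass that overrides only `reset(Reader)`: because Java picks the overload from the static argument type, a call with the more specific stream type skips the subclass override and leaves its state stale.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ResetOverloadDemo {
    // Hypothetical stand-in for a CharStream: a Reader subtype wrapping another Reader.
    static class DemoCharStream extends Reader {
        private final Reader in;
        DemoCharStream(Reader in) { this.in = in; }
        @Override public int read(char[] buf, int off, int len) throws IOException {
            return in.read(buf, off, len);
        }
        @Override public void close() throws IOException { in.close(); }
    }

    // Hypothetical base tokenizer that offers two reset overloads.
    static class DemoTokenizer {
        protected Reader input;
        public void reset(Reader input) { this.input = input; }
        public void reset(DemoCharStream input) { this.input = input; }
    }

    // Hypothetical subclass that overrides only reset(Reader) to clear its own state.
    static class DemoCharTokenizer extends DemoTokenizer {
        int offset = 42; // pretend this is state left over from a previous document
        @Override public void reset(Reader input) {
            super.reset(input);
            offset = 0;
        }
    }

    // Returns the subclass state after a reset, chosen by the static type of the argument.
    static int offsetAfterReset(boolean passAsCharStream) {
        DemoCharTokenizer tokenizer = new DemoCharTokenizer();
        DemoCharStream stream = new DemoCharStream(new StringReader("abc"));
        if (passAsCharStream) {
            tokenizer.reset(stream);          // resolves to the base reset(DemoCharStream)
        } else {
            tokenizer.reset((Reader) stream); // resolves to the overridden reset(Reader)
        }
        return tokenizer.offset;
    }

    public static void main(String[] args) {
        System.out.println(offsetAfterReset(false)); // state cleared by the override
        System.out.println(offsetAfterReset(true));  // state left untouched
    }
}
```

In this sketch the same object is passed both times; only its static type at the call site differs, which is why such a bug can go unnoticed in subclasses written before the second overload existed.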

RE: Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler
I tested the attached patch; all tests still compile and work as expected (as CharStream extends Reader). I think I should open an issue? Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Uwe

Re: Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Mark Miller
Yeah, let's open an issue and mark it blocker - I'll hold RC4 for it (was just about to push it when I caught this email). Uwe Schindler wrote: I tested the attached patch; all tests still compile and work as expected (as CharStream extends Reader). I think I should open an issue? Uwe

[jira] Created: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
Problem with CharStream and Tokenizers with custom reset(Reader) method --- Key: LUCENE-1906 URL: https://issues.apache.org/jira/browse/LUCENE-1906 Project: Lucene - Java

[jira] Updated: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1906: -- Description: When reviewing the new CharStream code added to Tokenizers, I found a serious

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753664#action_12753664 ] Uwe Schindler commented on LUCENE-1906: --- I will now check if the change of the

[jira] Updated: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1906: -- Attachment: LUCENE-1906.patch Problem with CharStream and Tokenizers with custom

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753666#action_12753666 ] Yonik Seeley commented on LUCENE-1906: -- +1, this looks like the best fix. Problem

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753679#action_12753679 ] Michael McCandless commented on LUCENE-1906: +1, good catch Uwe! Problem

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753687#action_12753687 ] Mark Miller commented on LUCENE-1906: - Ready Uwe? Problem with CharStream and

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753689#action_12753689 ] Uwe Schindler commented on LUCENE-1906: --- bq. I will now check if the change of the

[jira] Updated: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1906: -- Attachment: backwards-break.patch Here is the patch for backwards-branch, that fails. It

[jira] Updated: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1906: -- Attachment: (was: backwards-break.patch) Problem with CharStream and Tokenizers with

[jira] Updated: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1906: -- Attachment: backwards-break.patch Sorry, wrong patch; this one is correct. The other one was a

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753709#action_12753709 ] Uwe Schindler commented on LUCENE-1906: --- One possibility to prevent this break would

[jira] Issue Comment Edited: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753709#action_12753709 ] Uwe Schindler edited comment on LUCENE-1906 at 9/10/09 10:15 AM:

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753713#action_12753713 ] Mark Miller commented on LUCENE-1906: - What about using an introspection cache again?

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753715#action_12753715 ] Robert Muir commented on LUCENE-1906: - bq. (only some old Tokenizers not calling

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753718#action_12753718 ] Uwe Schindler commented on LUCENE-1906: --- bq. What about using an introspection cache

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753720#action_12753720 ] Michael McCandless commented on LUCENE-1906: bq. We also have a

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753722#action_12753722 ] Robert Muir commented on LUCENE-1906: - bq. Correct, this is always the problem with

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753724#action_12753724 ] Michael McCandless commented on LUCENE-1906: bq. Correct, this is always the

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753725#action_12753725 ] Mark Miller commented on LUCENE-1906: - bq. A cache for what? I do not understand The

[jira] Updated: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1906: -- Attachment: LUCENE-1906.patch Here is the patch for core. Contrib is unchanged. In principle

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753733#action_12753733 ] Michael McCandless commented on LUCENE-1906: I'm still nervous about inserting

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753735#action_12753735 ] Uwe Schindler commented on LUCENE-1906: --- instanceof is one of the operators directly
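The patch itself is not shown in this archive, so the following is only a hedged sketch of how an `instanceof` check can work together with a single `reset(Reader)` method, relying on the earlier remark that CharStream extends Reader. The names `DemoCharStream`, `DemoTokenizer`, and the fixed `shift` value are hypothetical and stand in for whatever the real classes do.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class CorrectOffsetDemo {
    // Hypothetical stream type: a Reader that can also map offsets back to the original text.
    static class DemoCharStream extends Reader {
        private final Reader in;
        private final int shift; // assumed constant correction, purely for illustration
        DemoCharStream(Reader in, int shift) { this.in = in; this.shift = shift; }
        int correctOffset(int currentOff) { return currentOff + shift; }
        @Override public int read(char[] buf, int off, int len) throws IOException {
            return in.read(buf, off, len);
        }
        @Override public void close() throws IOException { in.close(); }
    }

    // Hypothetical tokenizer with a single reset(Reader); subclasses have only one method to override.
    static class DemoTokenizer {
        protected Reader input;
        public void reset(Reader input) { this.input = input; }

        // The stream-specific behavior is recovered at runtime with instanceof.
        protected int correctOffset(int currentOff) {
            return (input instanceof DemoCharStream)
                ? ((DemoCharStream) input).correctOffset(currentOff)
                : currentOff;
        }
    }

    // Offset as seen through a plain Reader: returned unchanged.
    static int correctedPlain(int off) {
        DemoTokenizer tokenizer = new DemoTokenizer();
        tokenizer.reset(new StringReader("abc"));
        return tokenizer.correctOffset(off);
    }

    // Offset as seen through the hypothetical stream: the stream's correction is applied.
    static int correctedShifted(int off, int shift) {
        DemoTokenizer tokenizer = new DemoTokenizer();
        tokenizer.reset(new DemoCharStream(new StringReader("abc"), shift));
        return tokenizer.correctOffset(off);
    }

    public static void main(String[] args) {
        System.out.println(correctedPlain(5));
        System.out.println(correctedShifted(5, 3));
    }
}
```

The trade-off discussed in the thread appears to be the per-call cost of the runtime type check versus keeping a single overridable `reset` method; this sketch does not measure that cost.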

[jira] Updated: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1906: Attachment: LUCENE-1906_contrib.patch contrib changes. Problem with CharStream and Tokenizers

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753738#action_12753738 ] Mark Miller commented on LUCENE-1906: - bq. I think breaking back compat here is OK?

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753742#action_12753742 ] Yonik Seeley commented on LUCENE-1906: -- bq. instanceof is one of the operators

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753746#action_12753746 ] Mark Miller commented on LUCENE-1906: - bq. Hmmm, I had missed that 2.9 required a

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753750#action_12753750 ] Uwe Schindler commented on LUCENE-1906: --- bq. Yes, it's relatively fast, but it's

[jira] Issue Comment Edited: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753750#action_12753750 ] Uwe Schindler edited comment on LUCENE-1906 at 9/10/09 11:24 AM:

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753751#action_12753751 ] Mark Miller commented on LUCENE-1906: - bq. In my opinion, e.g. external language

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753762#action_12753762 ] Michael McCandless commented on LUCENE-1906: bq. A recompile is only needed is

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753766#action_12753766 ] Uwe Schindler commented on LUCENE-1906: --- bq. Maybe for 3.0 we can declare that this

[jira] Updated: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1906: -- Attachment: LUCENE-1906-bw.patch LUCENE-1906.patch Here are the updated patches

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753830#action_12753830 ] Robert Muir commented on LUCENE-1906: - Uwe, I like your patch. What was that

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753834#action_12753834 ] Uwe Schindler commented on LUCENE-1906: --- It was never used and seems to be a relic

Re: Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Jason Rutherglen
I've been seeing strange behavior, perhaps related to this: sometimes, fairly randomly, a query is parsed and analyzed using Solr analyzers to only its first clause, and other times the exact same query is parsed and analyzed to the full correct query with all clauses. It's so baffling I haven't

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753875#action_12753875 ] Mark Miller commented on LUCENE-1906: - I say we go with it - 'instance of' will have

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753877#action_12753877 ] Michael McCandless commented on LUCENE-1906: Patch looks good Uwe! Problem

[jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method

2009-09-10 Thread Koji Sekiguchi (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753906#action_12753906 ] Koji Sekiguchi commented on LUCENE-1906: +1, patch looks good, thanks Uwe!