I do not know how this could affect Solr, but it could be the case.
Currently most Tokenizers do not use CharStreams at all. After LUCENE-1906
is committed, I think some additional work will also be needed in Solr's
custom Tokenizers (the correctOffset method changed).
-
Uwe Schindler
When reviewing the new CharStream code added to Tokenizers, I found a
serious problem with backwards compatibility and with other Tokenizers that
do not override reset(CharStream).
The problem is that, for example, CharTokenizer only overrides reset(Reader):
public void reset(Reader input) throws
I tested the attached patch; all tests still compile and work as expected
(as CharStream extends Reader).
I think I should open an issue?
Uwe
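To illustrate the overload problem described above, here is a minimal, self-contained sketch. It does not use Lucene's classes: DemoCharStream, BaseTokenizer and StatefulTokenizer are hypothetical stand-ins, assumed only to mirror the situation in the message (a base class with two reset overloads, and a subclass that overrides just reset(Reader)).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ResetOverloadDemo {

    // Hypothetical stand-in for a CharStream: simply a Reader subclass.
    static class DemoCharStream extends Reader {
        private final Reader in;
        DemoCharStream(Reader in) { this.in = in; }
        @Override public int read(char[] buf, int off, int len) throws IOException {
            return in.read(buf, off, len);
        }
        @Override public void close() throws IOException { in.close(); }
    }

    // Hypothetical base class offering two reset overloads.
    static class BaseTokenizer {
        protected Reader input;
        public void reset(Reader input) throws IOException { this.input = input; }
        public void reset(DemoCharStream input) throws IOException { this.input = input; }
    }

    // Hypothetical subclass that overrides only reset(Reader) to clear its state.
    static class StatefulTokenizer extends BaseTokenizer {
        int offset = 0;
        void consume(int n) { offset += n; }
        @Override public void reset(Reader input) throws IOException {
            super.reset(input);
            offset = 0; // state is cleared only on this code path
        }
    }

    // The argument's static type is DemoCharStream, so the compiler picks
    // the base-class overload and the subclass state is never cleared.
    public static int offsetAfterResetWithCharStream() throws IOException {
        StatefulTokenizer t = new StatefulTokenizer();
        t.consume(5);
        t.reset(new DemoCharStream(new StringReader("next")));
        return t.offset;
    }

    // The same object passed with static type Reader reaches the override.
    public static int offsetAfterResetWithReader() throws IOException {
        StatefulTokenizer t = new StatefulTokenizer();
        t.consume(5);
        Reader r = new DemoCharStream(new StringReader("next"));
        t.reset(r);
        return t.offset;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(offsetAfterResetWithCharStream()); // prints 5 (stale state)
        System.out.println(offsetAfterResetWithReader());     // prints 0 (state cleared)
    }
}
```

Because overloads are chosen at compile time from the static argument type, the first call silently bypasses the subclass override. Since the stream type extends Reader, a single reset(Reader) entry point would accept both kinds of input, which is consistent with the remark that the tests still pass "as CharStream extends Reader".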
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
Yeah, let's open an issue and mark it blocker - I'll hold RC4 for it (was
just about to push it when I caught this email).
Uwe Schindler wrote:
I tested the attached patch; all tests still compile and work as expected
(as CharStream extends Reader).
I think I should open an issue?
Uwe
Problem with CharStream and Tokenizers with custom reset(Reader) method
---
Key: LUCENE-1906
URL: https://issues.apache.org/jira/browse/LUCENE-1906
Project: Lucene - Java
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-1906:
--
Description:
When reviewing the new CharStream code added to Tokenizers, I found a
serious
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753664#action_12753664
]
Uwe Schindler commented on LUCENE-1906:
---
I will now check if the change of the
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-1906:
--
Attachment: LUCENE-1906.patch
Problem with CharStream and Tokenizers with custom
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753666#action_12753666
]
Yonik Seeley commented on LUCENE-1906:
--
+1, this looks like the best fix.
Problem
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753679#action_12753679
]
Michael McCandless commented on LUCENE-1906:
+1, good catch Uwe!
Problem
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753687#action_12753687
]
Mark Miller commented on LUCENE-1906:
-
Ready Uwe?
Problem with CharStream and
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753689#action_12753689
]
Uwe Schindler commented on LUCENE-1906:
---
bq. I will now check if the change of the
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-1906:
--
Attachment: backwards-break.patch
Here is the patch for the backwards branch, which fails. It
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-1906:
--
Attachment: (was: backwards-break.patch)
Problem with CharStream and Tokenizers with
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-1906:
--
Attachment: backwards-break.patch
Sorry, wrong patch; this one is correct. The other one was a
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753709#action_12753709
]
Uwe Schindler commented on LUCENE-1906:
---
One possibility to prevent this break would
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753709#action_12753709
]
Uwe Schindler edited comment on LUCENE-1906 at 9/10/09 10:15 AM:
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753713#action_12753713
]
Mark Miller commented on LUCENE-1906:
-
What about using an introspection cache again?
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753715#action_12753715
]
Robert Muir commented on LUCENE-1906:
-
bq. (only some old Tokenizers not calling
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753718#action_12753718
]
Uwe Schindler commented on LUCENE-1906:
---
bq. What about using an introspection cache
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753720#action_12753720
]
Michael McCandless commented on LUCENE-1906:
bq. We also have a
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753722#action_12753722
]
Robert Muir commented on LUCENE-1906:
-
bq. Correct, this is always the problem with
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753724#action_12753724
]
Michael McCandless commented on LUCENE-1906:
bq. Correct, this is always the
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753725#action_12753725
]
Mark Miller commented on LUCENE-1906:
-
bq. A cache for what? I do not understand The
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-1906:
--
Attachment: LUCENE-1906.patch
Here is the patch for core. Contrib is unchanged.
In principle
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753733#action_12753733
]
Michael McCandless commented on LUCENE-1906:
I'm still nervous about inserting
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753735#action_12753735
]
Uwe Schindler commented on LUCENE-1906:
---
instanceof is one of the operators directly
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-1906:
Attachment: LUCENE-1906_contrib.patch
contrib changes.
Problem with CharStream and Tokenizers
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753738#action_12753738
]
Mark Miller commented on LUCENE-1906:
-
bq. I think breaking back compat here is OK?
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753742#action_12753742
]
Yonik Seeley commented on LUCENE-1906:
--
bq. instanceof is one of the operators
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753746#action_12753746
]
Mark Miller commented on LUCENE-1906:
-
bq. Hmmm, I had missed that 2.9 required a
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753750#action_12753750
]
Uwe Schindler commented on LUCENE-1906:
---
bq. Yes, it's relatively fast, but it's
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753750#action_12753750
]
Uwe Schindler edited comment on LUCENE-1906 at 9/10/09 11:24 AM:
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753751#action_12753751
]
Mark Miller commented on LUCENE-1906:
-
bq. In my opinion, e.g. external language
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753762#action_12753762
]
Michael McCandless commented on LUCENE-1906:
bq. A recompile is only needed is
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753766#action_12753766
]
Uwe Schindler commented on LUCENE-1906:
---
bq. Maybe for 3.0 we can declare that this
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-1906:
--
Attachment: LUCENE-1906-bw.patch
LUCENE-1906.patch
Here are the updated patches
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753830#action_12753830
]
Robert Muir commented on LUCENE-1906:
-
Uwe, I like your patch.
what was that
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753834#action_12753834
]
Uwe Schindler commented on LUCENE-1906:
---
It was never used and seems to be a relic
I've been seeing strange behavior that is perhaps related to this:
sometimes, fairly randomly, a query parsed and analyzed using Solr
analyzers is reduced to only its first clause, and other times the exact
same query is parsed and analyzed to the full, correct query with all
clauses. It's so baffling I haven't
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753875#action_12753875
]
Mark Miller commented on LUCENE-1906:
-
I say we go with it - 'instanceof' will have
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753877#action_12753877
]
Michael McCandless commented on LUCENE-1906:
Patch looks good Uwe!
Problem
[
https://issues.apache.org/jira/browse/LUCENE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753906#action_12753906
]
Koji Sekiguchi commented on LUCENE-1906:
+1, patch looks good, thanks Uwe!