[ https://issues.apache.org/jira/browse/SOLR-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated SOLR-1423:
---------------------------------

    Attachment: SOLR-1423.patch

The new patch is Uwe's patch with the split()/group() methods replaced.
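
For illustration, the group-mode loop roughly becomes the following; pattern, group, termAtt and offsetAtt are assumed fields here, so treat this as a minimal sketch rather than the exact patch code:

{code:java}
// Sketch: Matcher-based matching instead of String.split().
// Offsets go through correctOffset() so a CharFilter in front is honored.
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
  termAtt.setTermBuffer(matcher.group(group));
  offsetAtt.setOffset(correctOffset(matcher.start(group)),
                      correctOffset(matcher.end(group)));
  // ... emit the token here
}
{code}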

bq. Why does the PatternTokenizer not have the methods newToken and so on in its own class?
Yeah, I realized it immediately after posting the patch, but I was about to go out.

And thank you for adapting it to the new TokenStream API.

bq. I searched for setOffset() in Solr source code and found one additional occurrence of it without offset correcting in FieldType.java. This patch fixes this.
Good catch, Uwe! I overlooked it.
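
For reference, the risky pattern is a Tokenizer that writes raw reader positions into the OffsetAttribute. A schematic single-token Tokenizer (illustrative only, not the actual FieldType code; readFully() is a hypothetical helper) would route both offsets through correctOffset():

{code:java}
// Schematic: emit the whole field value as one token.
// Without correctOffset(), a CharFilter in front would report wrong offsets.
String value = readFully(input);   // hypothetical helper that drains the Reader
termAtt.setTermBuffer(value);
offsetAtt.setOffset(correctOffset(0), correctOffset(value.length()));
{code}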

I think emitting empty tokens is a bug, and they should be omitted in this patch.
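
The guard could be as simple as this (hypothetical placement inside the matching loop sketched above):

{code:java}
String match = matcher.group(group);
if (match.length() == 0) {
  continue;   // don't emit zero-length tokens
}
{code}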

bq. A second thing: Lucene has a new BaseTokenStreamTest class for checking tokens without Token instances (which would no longer work when Lucene 3.0 switches to Attributes only). Maybe you should update these tests and use assertAnalyzesTo from the new base class instead.
Very nice! Can you open a separate ticket?
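
For example, a converted test could look roughly like this (assuming the test class extends Lucene's BaseTokenStreamTestCase; the token texts and offsets are made up):

{code:java}
assertAnalyzesTo(analyzer, "aaa bbb ccc",
    new String[] { "aaa", "bbb", "ccc" },
    new int[] { 0, 4, 8 },     // start offsets
    new int[] { 3, 7, 11 });   // end offsets
{code}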

> Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & 
> others
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-1423
>                 URL: https://issues.apache.org/jira/browse/SOLR-1423
>             Project: Solr
>          Issue Type: Task
>          Components: Analysis
>    Affects Versions: 1.4
>            Reporter: Uwe Schindler
>            Assignee: Koji Sekiguchi
>             Fix For: 1.4
>
>         Attachments: SOLR-1423-FieldType.patch, SOLR-1423.patch, 
> SOLR-1423.patch, SOLR-1423.patch
>
>
> Because of some backwards compatibility problems (LUCENE-1906) we changed the 
> CharStream/CharFilter API a little bit. Tokenizer now only has an input field 
> of type java.io.Reader (as it was before the CharStream code). To correct 
> offsets, it is now necessary to call the Tokenizer.correctOffset(int) method, 
> which delegates to the CharStream (if the input is a subclass of CharStream) 
> and otherwise returns the uncorrected offset. Normally it is enough to change 
> all occurrences of input.correctOffset() to this.correctOffset() in 
> Tokenizers. It should also be checked whether custom Tokenizers in Solr 
> correct their offsets.
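
To make the required change concrete, the typical migration in a custom Tokenizer is a one-liner (a sketch; offsetAtt, start and end are assumed local names):

{code:java}
// before (input was a CharStream):
//   offsetAtt.setOffset(input.correctOffset(start), input.correctOffset(end));
// after (input is a plain java.io.Reader again):
offsetAtt.setOffset(correctOffset(start), correctOffset(end));
{code}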

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
