[ 
https://issues.apache.org/jira/browse/SOLR-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549879
 ] 

Koji Sekiguchi commented on SOLR-415:
-------------------------------------

This is for debugging. Here is one use case from my own work:

We use a morphological tokenizer to tokenize Japanese text. For the tokenizer 
to analyze the text correctly, we have to apply "character-level normalization" 
before tokenization.

I'll try to explain it using English words.

Suppose the text to be analyzed includes "colour", and your morphological 
tokenizer uses an American English dictionary. You have to normalize "colour" 
to "color" so that the tokenizer can look it up in the dictionary.

To implement this, I've developed MappingReader, which reads mapping.txt and 
normalizes (Japanese) characters before the tokenizer runs:

MappingReader -> Japanese Tokenizer -> Filters...

In this case, if MappingReader normalizes "ou" to "o", it causes trouble in the 
highlighter, because the token offsets no longer line up with the original 
text. (I used LoggingFilter to find this problem.)

To solve this problem, MappingReader has a correctPosition(int pos) method that 
tells the tokenizer the original position.
(If this would be useful for European languages, for umlauts and the like, I'd 
be glad to open another JIRA issue.)
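
The offset-correction idea can be illustrated with a minimal, self-contained 
sketch. This is not the MappingReader from the patch: the class name 
OffsetCorrectionSketch, the in-memory Map standing in for mapping.txt, and the 
internal bookkeeping are all assumptions made for illustration. Only the 
method name correctPosition(int pos) comes from the comment above. The sketch 
also assumes a replacement is never longer than the text it replaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of character normalization with offset correction. */
final class OffsetCorrectionSketch {
    private final String normalized;
    // Each entry is {offset in normalized text, total chars removed up to there}.
    private final List<int[]> corrections = new ArrayList<>();

    OffsetCorrectionSketch(String original, Map<String, String> mapping) {
        StringBuilder out = new StringBuilder();
        int removed = 0;
        int i = 0;
        while (i < original.length()) {
            // Pick the longest mapping key that matches at position i.
            String matched = null;
            for (String key : mapping.keySet()) {
                if (original.startsWith(key, i)
                        && (matched == null || key.length() > matched.length())) {
                    matched = key;
                }
            }
            if (matched == null) {
                out.append(original.charAt(i));
                i++;
                continue;
            }
            String replacement = mapping.get(matched);
            out.append(replacement);
            i += matched.length();
            int diff = matched.length() - replacement.length();
            if (diff != 0) {
                // Offsets at or after this point are shifted by 'removed'.
                removed += diff;
                corrections.add(new int[] {out.length(), removed});
            }
        }
        normalized = out.toString();
    }

    String normalized() {
        return normalized;
    }

    /** Maps an offset in the normalized text back to the original text. */
    int correctPosition(int pos) {
        int shift = 0;
        for (int[] c : corrections) {
            if (c[0] > pos) {
                break;
            }
            shift = c[1];
        }
        return pos + shift;
    }

    public static void main(String[] args) {
        String original = "a colour b";
        OffsetCorrectionSketch s =
                new OffsetCorrectionSketch(original, Map.of("ou", "o"));
        // The tokenizer sees "a color b" and reports "color" at offsets 2..7.
        int start = s.correctPosition(2);
        int end = s.correctPosition(7);
        System.out.println(original.substring(start, end)); // prints: colour
    }
}
```

Without the correction, a highlighter using offsets 2..7 on the original text 
would mark "colou" instead of "colour".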

Also, in SOLR-319, I used LoggingFilter to inspect the SynonymFilter output.

I'll try to incorporate your suggestion into my patch soon.

Thank you.

> LoggingFilter for debug
> -----------------------
>
>                 Key: SOLR-415
>                 URL: https://issues.apache.org/jira/browse/SOLR-415
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Koji Sekiguchi
>            Priority: Trivial
>         Attachments: SOLR-415.patch, SOLR-415.patch, SOLR-415.patch
>
>
> logging version of analysis.jsp
