[
https://issues.apache.org/jira/browse/SOLR-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549879
]
Koji Sekiguchi commented on SOLR-415:
-------------------------------------
This is for debugging. Here is one use case from my own work.
We use a morphological tokenizer to tokenize Japanese text. For the tokenizer
to analyze the text, we need "character-level normalization" before
tokenization.
I'll try to explain it using English words.
Suppose the text to be analyzed includes "colour", and your morphological
tokenizer uses an American English dictionary to tokenize the text. You have
to normalize "colour" to "color" so that the tokenizer can look it up in the
dictionary.
To implement this, I've developed MappingReader, which reads mapping.txt and
normalizes (Japanese) characters before the tokenizer runs:
MappingReader -> Japanese Tokenizer -> Filters...
In this case, if MappingReader normalizes "ou" to "o", it causes trouble in
the highlighter, because token offsets then refer to the normalized text
rather than the original. (I used LoggingFilter to find this problem.)
To solve this problem, MappingReader has a correctPosition(int pos) method
that tells the tokenizer the original position.
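The offset correction described above can be sketched roughly as follows. This
is not the code from the patch: the class MappingSketch, its constructor, and
normalizedText() are hypothetical names made up for illustration, and only the
correctPosition(int pos) name comes from the comment. The sketch works on a
String rather than a Reader and assumes each replacement is no longer than the
text it replaces.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the MappingReader from the patch. */
class MappingSketch {
    private final StringBuilder normalized = new StringBuilder();
    // Each entry: {offset in normalized text, cumulative (original - normalized) shift}.
    private final List<int[]> corrections = new ArrayList<>();

    MappingSketch(String input, Map<String, String> mapping) {
        int i = 0;
        int diff = 0; // how far the original offset is ahead of the normalized one
        while (i < input.length()) {
            // Pick the longest mapping key that matches at position i, if any.
            String matched = null;
            for (String key : mapping.keySet()) {
                if (input.startsWith(key, i)
                        && (matched == null || key.length() > matched.length())) {
                    matched = key;
                }
            }
            if (matched == null) {
                normalized.append(input.charAt(i++));
                continue;
            }
            String replacement = mapping.get(matched);
            normalized.append(replacement);
            i += matched.length();
            diff += matched.length() - replacement.length();
            // Offsets at or after the end of this replacement shift by diff.
            corrections.add(new int[] { normalized.length(), diff });
        }
    }

    String normalizedText() {
        return normalized.toString();
    }

    /** Maps an offset in the normalized text back to an offset in the original text. */
    int correctPosition(int pos) {
        int diff = 0;
        for (int[] c : corrections) {
            if (c[0] <= pos) {
                diff = c[1];
            } else {
                break;
            }
        }
        return pos + diff;
    }

    public static void main(String[] args) {
        String original = "my colour";
        MappingSketch m = new MappingSketch(original, Collections.singletonMap("ou", "o"));
        // The tokenizer sees "my color" and reports the token "color" at offsets [3, 8).
        int start = m.correctPosition(3);
        int end = m.correctPosition(8);
        // Corrected offsets select the original word for highlighting.
        System.out.println(original.substring(start, end));
    }
}
```

With the corrected offsets, a highlighter working on the original text would
mark "colour" instead of cutting the word one character short.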
(If this would be useful for European languages, for umlauts and so on, I'd
be glad to open another JIRA issue.)
Also, in SOLR-319 I used LoggingFilter to inspect the SynonymFilter output.
I'll try to incorporate your suggestion into my patch soon.
Thank you.
> LoggingFilter for debug
> -----------------------
>
> Key: SOLR-415
> URL: https://issues.apache.org/jira/browse/SOLR-415
> Project: Solr
> Issue Type: Improvement
> Reporter: Koji Sekiguchi
> Priority: Trivial
> Attachments: SOLR-415.patch, SOLR-415.patch, SOLR-415.patch
>
>
> logging version of analysis.jsp