Itamar Syn-Hershko created LUCENE-6103:
------------------------------------------

             Summary: StandardTokenizer doesn't tokenize word:word
                 Key: LUCENE-6103
                 URL: https://issues.apache.org/jira/browse/LUCENE-6103
             Project: Lucene - Core
          Issue Type: Bug
          Components: modules/analysis
    Affects Versions: 4.9
            Reporter: Itamar Syn-Hershko


StandardTokenizer (and, as a result, most default analyzers) does not split 
word:word at the colon; it keeps the whole string as a single token. This is 
easy to see with Elasticsearch's analyze API:

localhost:9200/_analyze?tokenizer=standard&text=word%20word:word

If this is the intended behavior, what is the rationale? I can't see the logic 
behind it.

If not, I'll be happy to join in the effort of fixing this.
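A minimal Java sketch of where the single token likely comes from. It assumes StandardTokenizer follows the Unicode UAX #29 word-break rules. Under those rules, U+003A COLON has the MidLetter property, and rules WB6/WB7 forbid a break between two letters separated by a MidLetter character. The class `ColonWordBreakSketch` and its `tokenize` method are hypothetical. They handle only BMP letters and a small subset of mid-word characters, and they are not Lucene's actual tokenizer grammar.

```java
import java.util.ArrayList;
import java.util.List;

public class ColonWordBreakSketch {

    // Hypothetical subset of UAX #29 mid-word characters: COLON and MIDDLE DOT
    // (MidLetter), plus '.' and '\'' (MidNumLet / Single_Quote). Not exhaustive.
    private static boolean isMidLetter(char c) {
        return c == ':' || c == '\u00B7' || c == '.' || c == '\'';
    }

    // Splits text into letter runs, but (like WB6/WB7) does not break when a
    // mid-word character sits between two letters. Digits, spaces, and all
    // other characters simply end a token and are dropped.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            if (!Character.isLetter(text.charAt(i))) {
                i++; // skip non-letters between tokens
                continue;
            }
            int start = i++;
            while (i < n) {
                char c = text.charAt(i);
                if (Character.isLetter(c)) {
                    i++;
                } else if (isMidLetter(c) && i + 1 < n
                        && Character.isLetter(text.charAt(i + 1))) {
                    i += 2; // Letter x Mid x Letter: keep the token together
                } else {
                    break; // any other character ends the token
                }
            }
            tokens.add(text.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("word word:word"));
    }
}
```

Running `main` prints `[word, word:word]`, which matches the behavior described in this report. A trailing colon (for example `word: next`) still breaks, because the colon is not followed by a letter.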



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
