[jira] [Commented] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-19 Thread Jun Ohtani (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140338#comment-17140338 ] Jun Ohtani commented on LUCENE-9390: Hi [~h.kazuaki], thanks for your comment! I

[jira] [Commented] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-17 Thread Kazuaki Hiraga (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138948#comment-17138948 ] Kazuaki Hiraga commented on LUCENE-9390: Hello, I might have just remembered wh

[jira] [Commented] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-13 Thread Jun Ohtani (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134827#comment-17134827 ] Jun Ohtani commented on LUCENE-9390: I've made a pull request.  https://github.com/

[jira] [Commented] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-12 Thread Jun Ohtani (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134428#comment-17134428 ] Jun Ohtani commented on LUCENE-9390: I also checked *UniDic* around punctuation char

[jira] [Commented] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-12 Thread Jun Ohtani (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134249#comment-17134249 ] Jun Ohtani commented on LUCENE-9390: I counted 3 types of words in ipadic csv files.

[jira] [Commented] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-07 Thread Jun Ohtani (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127839#comment-17127839 ] Jun Ohtani commented on LUCENE-9390: Not exactly related, but we discussed around di

[jira] [Commented] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-04 Thread Michael McCandless (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126237#comment-17126237 ] Michael McCandless commented on LUCENE-9390: Not sure if it is related, but

[jira] [Commented] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-04 Thread Jim Ferenczi (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126141#comment-17126141 ] Jim Ferenczi commented on LUCENE-9390: > I usually set the "discardPunctuation" fl

[jira] [Commented] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-03 Thread Jun Ohtani (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125547#comment-17125547 ] Jun Ohtani commented on LUCENE-9390: IMO, we remove the flag and the kuromoji output

[jira] [Commented] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-03 Thread Tomoko Uchida (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125542#comment-17125542 ] Tomoko Uchida commented on LUCENE-9390: Personally, I usually set the "discardPun