Explore streaming Viterbi search in Kuromoji --------------------------------------------
Key: LUCENE-3767 URL: https://issues.apache.org/jira/browse/LUCENE-3767 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.6, 4.0 I've been playing with the idea of changing the Kuromoji viterbi search to be 2 passes (intersect, backtrace) instead of 4 passes (break into sentences, intersect, score, backtrace)... this is very much a work in progress, so I'm just getting my current state up. It's got tons of nocommits, doesn't properly handle the user dict nor extended modes yet, etc. One thing I'm playing with is to add a double backtrace for the long compound tokens, ie, instead of penalizing these tokens so that shorter tokens are picked, leave the scores unchanged but on backtrace take that penalty and use it as a threshold for a 2nd best segmentation... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org