[ https://issues.apache.org/jira/browse/SOLR-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mike Klaas updated SOLR-293:
----------------------------

    Fix Version/s:     (was: 1.3)

> Add "minPartLength" to WordDelimiterFilter
> ------------------------------------------
>
>                 Key: SOLR-293
>                 URL: https://issues.apache.org/jira/browse/SOLR-293
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.3
>            Reporter: Mike Klaas
>            Assignee: Mike Klaas
>            Priority: Minor
>
> WDF is handy but over-tokenizes when faced with short word parts:
>   A9
>   R2D2
>   mp3
> This creates one- or two-character tokens which are extremely slow to query
> because their document frequency is so high (this is contributing to a
> significant portion of our slowest queries).
> This patch adds a "minPartLength" option that disables generation of parts
> below a certain length. It is recommended to use it with catenateAll, so as
> not to lose tokens.
> I'll add factory options and tests if we decide to include this (and are
> happy with the parameter name).
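To make the trade-off in the description concrete, here is a small, hypothetical Java sketch of the idea. It is not Solr's WordDelimiterFilter or the attached patch: the class and method names (MinPartLengthSketch, splitParts, tokenize) are invented for illustration, it splits only on non-alphanumeric characters and letter/digit transitions (the real filter has more rules and options, such as case-change splitting), and it assumes catenateAll concatenates every part, including the ones dropped for being too short.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified sketch of the "minPartLength" idea.
 * NOT Solr's WordDelimiterFilter: it only splits on non-alphanumeric
 * delimiters and letter/digit transitions (no case-change splitting).
 */
class MinPartLengthSketch {

    /** Split a token into letter-only and digit-only runs. */
    static List<String> splitParts(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int prevType = 0; // 0 = delimiter/none, 1 = letter, 2 = digit
        for (char c : token.toCharArray()) {
            int type = Character.isLetter(c) ? 1 : Character.isDigit(c) ? 2 : 0;
            // A delimiter or a letter/digit transition ends the current part.
            if (type == 0 || (prevType != 0 && type != prevType)) {
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            }
            if (type != 0) {
                current.append(c);
            }
            prevType = type;
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    /**
     * Emit only parts of at least minPartLength characters; if catenateAll
     * is set, also emit the concatenation of ALL parts (short ones included)
     * so the word as a whole is not lost.
     */
    static List<String> tokenize(String token, int minPartLength, boolean catenateAll) {
        List<String> parts = splitParts(token);
        List<String> out = new ArrayList<>();
        if (parts.size() <= 1) {
            // Assumption: a token that does not split passes through unchanged.
            out.addAll(parts);
            return out;
        }
        for (String part : parts) {
            if (part.length() >= minPartLength) {
                out.add(part);
            }
        }
        if (catenateAll) {
            out.add(String.join("", parts));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"A9", "R2D2", "mp3"}) {
            System.out.println(t + ": no limit -> " + tokenize(t, 1, false)
                + ", minPartLength=2 + catenateAll -> " + tokenize(t, 2, true));
        }
    }
}
```

With no length limit the sketch emits the single-character parts ("A", "9", "R", "2", ...) that the issue identifies as slow to query. With minPartLength=2 and catenateAll, "A9" and "R2D2" yield only the catenated token, and "mp3" yields "mp" plus "mp3", so the word stays searchable without indexing the high-frequency fragments.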