[ https://issues.apache.org/jira/browse/SOLR-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mike Klaas updated SOLR-293:
----------------------------
Fix Version/s: (was: 1.3)
> Add "minPartLength" to WordDelimiterFilter
> ------------------------------------------
>
> Key: SOLR-293
> URL: https://issues.apache.org/jira/browse/SOLR-293
> Project: Solr
> Issue Type: New Feature
> Components: update
> Affects Versions: 1.3
> Reporter: Mike Klaas
> Assignee: Mike Klaas
> Priority: Minor
>
> WDF is handy but over-tokenizes when faced with short word parts:
> A9
> R2D2
> mp3
> This creates one- or two-character tokens, which are extremely slow to query
> because their document frequency is so high (this contributes to a significant
> portion of our slowest queries).
> This patch adds a "minPartLength" option that disables generation of parts
> below a certain length. It is recommended to use it with catenateAll so that
> tokens are not lost.
> I'll add factory options and tests if we decide to include this (and are
> happy with the parameter name).
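For illustration only, here is a minimal, self-contained Java sketch of the behavior the issue describes. It is not the SOLR-293 patch and does not use the WordDelimiterFilter API: the class and methods (MinPartLengthSketch, splitParts, filter) are hypothetical, and splitting is simplified to letter/digit transitions plus non-alphanumeric delimiters. Parts shorter than minPartLength are dropped, and with catenateAll the joined token is still emitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "minPartLength" idea from SOLR-293; this is not
// Lucene/Solr code and does not use the WordDelimiterFilter API.
final class MinPartLengthSketch {

    // Character classes used for splitting: 0 = delimiter, 1 = letter, 2 = digit.
    private static int type(char c) {
        if (Character.isDigit(c)) return 2;
        if (Character.isLetter(c)) return 1;
        return 0;
    }

    // Split a token into word parts at letter/digit transitions and at
    // non-alphanumeric delimiters, e.g. "R2D2" -> [R, 2, D, 2].
    static List<String> splitParts(String token) {
        List<String> parts = new ArrayList<>();
        int start = -1; // start index of the current part, or -1 if none
        for (int i = 0; i <= token.length(); i++) {
            // Treat the end of the token like a delimiter to flush the last part.
            int t = i < token.length() ? type(token.charAt(i)) : 0;
            if (start >= 0 && t != type(token.charAt(start))) {
                parts.add(token.substring(start, i));
                start = -1;
            }
            if (start < 0 && t != 0) {
                start = i;
            }
        }
        return parts;
    }

    // Keep only parts of at least minPartLength characters. With catenateAll,
    // also emit all parts joined together, so a token like "R2D2" survives
    // even when every one of its parts is too short.
    static List<String> filter(String token, int minPartLength, boolean catenateAll) {
        List<String> parts = splitParts(token);
        if (parts.size() <= 1) {
            return parts; // nothing was split, so there are no parts to suppress
        }
        List<String> out = new ArrayList<>();
        for (String part : parts) {
            if (part.length() >= minPartLength) {
                out.add(part);
            }
        }
        if (catenateAll) {
            out.add(String.join("", parts));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : new String[] {"A9", "R2D2", "mp3"}) {
            System.out.println(token + ": parts=" + splitParts(token)
                    + " minPartLength=2+catenateAll=" + filter(token, 2, true));
        }
    }
}
```

With minPartLength=2 and catenateAll, A9 yields only "A9", R2D2 yields only "R2D2", and mp3 yields "mp" and "mp3". Without catenateAll, R2D2 would produce no tokens at all, which is why the two options are meant to be used together.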
--
This message is automatically generated by JIRA.