[ https://issues.apache.org/jira/browse/SOLR-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-89.
--------------------------

    Resolution: Fixed

patch committed with a few small javadoc tweaks and a bit of whitespace added to one of the example docs to illustrate PatternReplaceFilter's effects.

> new TokenFilters for whitespace trimming and pattern replacing
> --------------------------------------------------------------
>
>                 Key: SOLR-89
>                 URL: https://issues.apache.org/jira/browse/SOLR-89
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Hoss Man
>         Assigned To: Hoss Man
>         Attachments: pattern-and-trim-filters.patch
>
>
> (note: lumping these in a single issue since i did them both at the same time)
> More than one person has asked me recently about how they can configure strings which:
>   a) sort case insensitively
>   b) ignore leading (and trailing, although it's not as big of an issue) whitespace
>   c) ignore certain characters anywhere in the string (ie: strip punctuation)
> The first can be solved already using the KeywordTokenizer in conjunction with the LowerCaseFilter. I've written a TrimFilter and PatternReplaceFilter to address the latter two. (Strictly speaking, TrimFilter isn't needed since you can make a pattern that matches leading or trailing whitespace, but for people who are only interested in the whitespace issue, i'm sure String.trim() is more efficient than a regex)
> An example of how they can be used...
>     <!-- This is an example of using the KeywordTokenizer along
>          with various TokenFilterFactories to produce a sortable field
>          that does not include some properties of the source text
>       -->
>     <fieldtype name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>       <analyzer>
>         <!-- KeywordTokenizer does no actual tokenizing, so the entire
>              input string is preserved as a single token
>           -->
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <!-- The LowerCase TokenFilter does what you expect, which can be
>              useful when you want your sorting to be case insensitive
>           -->
>         <filter class="solr.LowerCaseFilterFactory" />
>         <!-- The TrimFilter removes any leading or trailing whitespace -->
>         <filter class="solr.TrimFilterFactory" />
>         <!-- The PatternReplaceFilter gives you the flexibility to use
>              Java regular expressions to replace any sequence of characters
>              matching a pattern with an arbitrary replacement string,
>              which may include back references to portions of the original
>              string matched by the pattern.
>
>              See the Java Regular Expression documentation for more
>              information on pattern and replacement string syntax.
>
>              http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/package-summary.html
>           -->
>         <filter class="solr.PatternReplaceFilterFactory"
>                 pattern="([^a-z])" replacement="" replace="all"
>         />
>       </analyzer>
>     </fieldtype>
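
As a rough illustration of what the alphaOnlySort analyzer chain above does to a single field value, here is a minimal Java sketch. It uses only String methods and java.util.regex rather than the actual Solr filter classes, so it only approximates the behaviour of the real filters; the class name AlphaOnlySortKey and the method sortKey are hypothetical names made up for this example.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: mimics the alphaOnlySort chain
// on one input string treated as a single (keyword) token.
final class AlphaOnlySortKey {

    // Same pattern as in the PatternReplaceFilterFactory example config.
    private static final Pattern NON_ALPHA = Pattern.compile("([^a-z])");

    static String sortKey(String input) {
        // Step 1: lowercase, standing in for LowerCaseFilter.
        String lowered = input.toLowerCase(Locale.ROOT);
        // Step 2: strip leading/trailing whitespace, standing in for TrimFilter.
        String trimmed = lowered.trim();
        // Step 3: delete every match of the pattern (replace="all", replacement="").
        return NON_ALPHA.matcher(trimmed).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(sortKey("  Hello, World! ")); // prints "helloworld"
    }
}
```

Note that with this particular pattern everything outside a-z is removed, including digits, interior whitespace and non-ASCII letters, so the trim step is redundant here; it matters when the pattern is narrower or omitted.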