[ https://issues.apache.org/jira/browse/SOLR-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492132 ]
Ryan McKinley commented on SOLR-211:
------------------------------------

>
> I don't know if your new PatternTokenizerFactory could replace either of
> these, though. For the first case, I still want the white space tokenization
> after I've stripped off all the junk I don't want. And for the second, I need
> to be able to do the remapping.
>

If you're really good with regular expressions, perhaps it could all be combined... I'm not ;)

In my real use case, I use the general PatternTokenizerFactory to split the input into a bunch of tokens, and then a custom (ugly!) TokenFilter transforms the stream with other one-off transformations similar to what you describe.

> regex split() Tokenizer
> -----------------------
>
>                 Key: SOLR-211
>                 URL: https://issues.apache.org/jira/browse/SOLR-211
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Ryan McKinley
>         Assigned To: Ryan McKinley
>         Attachments: SOLR-211-RegexSplitTokenizer.patch, SOLR-211-RegexSplitTokenizer.patch, SOLR-211-RegexSplitTokenizer.patch
>
>
> A TokenizerFactory that makes tokens from:
> string.split( regex );
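
For readers unfamiliar with the idea, here is a rough sketch of the split-based tokenization the issue description refers to. It uses plain java.util.regex, not the Solr/Lucene classes from the attached patch; the class name RegexSplitSketch, the method tokenize, and the choice to drop empty tokens are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: this is not the code from the SOLR-211 patch.
class RegexSplitSketch {

    // Split the input on every match of the regex and return the pieces as tokens.
    // Dropping empty pieces is an assumption of this sketch (a delimiter at the
    // very start of the input would otherwise produce an empty first token).
    static List<String> tokenize(String input, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> tokens = new ArrayList<>();
        for (String piece : pattern.split(input)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Split on a comma or semicolon followed by optional whitespace.
        System.out.println(tokenize("solr, lucene;  regex", "[,;]\\s*"));
        // prints [solr, lucene, regex]
    }
}
```

Any further one-off transformations, such as the remapping mentioned in the quoted comment, would then run as a separate step over the resulting tokens, which is roughly the role the custom TokenFilter plays in the setup described above.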