[jira] Commented: (SOLR-211) regex split() Tokenizer

Ryan McKinley (JIRA) Thu, 26 Apr 2007 15:23:37 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492132
 ]


Ryan McKinley commented on SOLR-211:
------------------------------------

> 
> I don't know if your new PatternTokenizerFactory could replace either of 
> these, though. For the first case, I still want the white space tokenization 
> after I've stripped off all the junk I don't want. And for the second, I need 
> to be able to do the remapping.
> 

If you're really good with regular expressions, perhaps it could all be 
combined... I'm not ;)  

In my real use case, I use the general PatternTokenizerFactory to split the 
input into a bunch of tokens, then a custom (ugly!) TokenFilter applies 
other one-off transformations to the stream, similar to what you describe.  
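
To make the two-stage idea concrete, here is a rough sketch in plain Java. 
It is not the code from the patch and it does not use the Solr or Lucene 
APIs: the class name, the method names, the delimiter pattern, and the 
remapping table are all made up for illustration. Stage one splits on a 
regex the way string.split( regex ) would; stage two stands in for a 
TokenFilter by remapping individual tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of SOLR-211.
public class SplitThenTransform {

    // Stage 1: split the input on a regex, like string.split(regex).
    static List<String> tokenize(String input, Pattern delimiter) {
        List<String> tokens = new ArrayList<>();
        for (String piece : delimiter.split(input)) {
            // split() can yield a leading empty string when the input
            // starts with a delimiter, so skip empty pieces.
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Stage 2: a one-off transformation over the token stream.
    // Tokens found in the (assumed) remapping table are replaced;
    // everything else passes through unchanged.
    static List<String> remap(List<String> tokens, Map<String, String> remapping) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(remapping.getOrDefault(token, token));
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern delimiter = Pattern.compile("[\\s,;]+");        // assumed delimiter
        Map<String, String> remapping = Map.of("colour", "color"); // assumed mapping

        List<String> tokens = tokenize("red, colour;  blue", delimiter);
        System.out.println(remap(tokens, remapping)); // prints [red, color, blue]
    }
}
```

Keeping the split and the remapping as separate steps is what lets a 
general pattern tokenizer stay generic while the one-off rules live in 
their own filter.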



> regex split() Tokenizer
> -----------------------
>
>                 Key: SOLR-211
>                 URL: https://issues.apache.org/jira/browse/SOLR-211
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Ryan McKinley
>         Assigned To: Ryan McKinley
>         Attachments: SOLR-211-RegexSplitTokenizer.patch, 
> SOLR-211-RegexSplitTokenizer.patch, SOLR-211-RegexSplitTokenizer.patch
>
>
> A TokenizerFactory that makes tokens from:
>   string.split( regex );

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
