: After 1/2 hour of regex hacking... I think I'll stick with a two step
: process: split then trim ;)

But regex hacking is FUN!! I'm 99% certain this does what you want...

<tokenizer class="solr.PatternTokenizerFactory" pattern="((\A\s*)|\s*?(\s+-\s+|--|,|\(|\))|\s+)\s*\z?" ..

...if it doesn't, send me an example string that it fails on and tell me what the desired output is.

Incidentally, PatternTokenizerFactory seems to have the annoying limitation of assuming there is a token prior to each match -- even if the match explicitly matches at the start of the string (so it creates a 0 width token) ... that seems like a bug, right?

-Hoss
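
For anyone who wants to poke at the pattern outside Solr, here is a rough sketch using plain java.util.regex. It is not Solr's code: the class name PatternSplitSketch and the methods rawTokens and tokens are made up for illustration, and rawTokens only imitates the behaviour described above (emit whatever text precedes each match, even when that text is empty). The sample strings are assumptions too, not cases from the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Solr.
public class PatternSplitSketch {
    // The delimiter pattern from the message, with backslashes escaped for a Java string.
    static final Pattern DELIMS = Pattern.compile(
        "((\\A\\s*)|\\s*?(\\s+-\\s+|--|,|\\(|\\))|\\s+)\\s*\\z?");

    // Emits the text before every delimiter match, even when that text is empty.
    // The \A branch always matches at offset 0, so the first piece is always "".
    static List<String> rawTokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = DELIMS.matcher(input);
        int last = 0;
        while (m.find()) {
            out.add(input.substring(last, m.start()));
            last = m.end();
        }
        // Keep whatever follows the final delimiter, if anything.
        if (last < input.length()) {
            out.add(input.substring(last));
        }
        return out;
    }

    // Same as rawTokens, but with the zero-width pieces filtered out.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        for (String t : rawTokens(input)) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}
```

With this sketch, rawTokens("foo, bar") returns ["", "foo", "bar"], which shows the zero-width leading piece, while tokens("  foo -- bar") returns ["foo", "bar"]. Dropping empty pieces after the fact is one possible workaround, assuming an empty token is never wanted.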