On 11-May-07, at 5:02 PM, Ryan McKinley wrote:

Chris Hostetter wrote:
: My real use case is adding the trim filter to the pattern tokenizer.
: the 'correct' answer in my case is to update the offsets.
hmmm... wouldn't the "correct" thing to do in that case be to change your pattern so it strips the whitespace when tokenizing? that way the offsets of your tokens will be accurate from the beginning.

probably....  I'm just not very good at regex ;)

  pattern="--|,|\s-\s|\(|\)"

this will split on "--", ",", " - ", "(", and ")". I can't figure out how to build the pattern so it will trim each thing on the way out.

Try:

\s*[(),-]+\s*

Note that this will also split on runs of repeated delimiters such as "((" and ",,". If more precision is required, change the + to {1,2} or special-case the '--'.
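
For what it's worth, here is a rough sketch comparing the two patterns. It uses Python's re module rather than Solr's java.util.regex, on the assumption that these particular patterns behave the same way in both for plain ASCII input. The split_tokens helper, the sample strings, and the STRICT pattern are made up for illustration; STRICT is only one possible reading of "special-case the '--'" and is not anything Solr provides. One thing the sketch brings out: because "-" is in the character class, the suggested pattern also splits on a bare hyphen inside a word.

```python
import re

# Pattern from the original question: splits, but leaves whitespace on the tokens.
ORIGINAL = r"--|,|\s-\s|\(|\)"

# Suggested pattern: swallows whitespace around any run of ( ) , or -
SUGGESTED = r"\s*[(),-]+\s*"

# Hypothetical stricter variant (an assumption, not from the thread):
# a single "-" only counts as a delimiter when it has whitespace on both sides.
STRICT = r"(?:\s*(?:--|[(),]|(?<=\s)-(?=\s)))+\s*"


def split_tokens(pattern, text):
    """Split text on pattern and drop the empty pieces between adjacent delimiters."""
    return [piece for piece in re.split(pattern, text) if piece]


sample = "foo -- bar, baz (qux) - end"

print(split_tokens(ORIGINAL, sample))   # ['foo ', ' bar', ' baz ', 'qux', 'end']
print(split_tokens(SUGGESTED, sample))  # ['foo', 'bar', 'baz', 'qux', 'end']
print(split_tokens(STRICT, sample))     # ['foo', 'bar', 'baz', 'qux', 'end']

# The character class treats any "-" as a delimiter, so hyphenated words split too.
print(split_tokens(SUGGESTED, "well-known, term"))  # ['well', 'known', 'term']
print(split_tokens(STRICT, "well-known, term"))     # ['well-known', 'term']
```

Since the delimiter match itself consumes the surrounding whitespace, the tokens come out already trimmed, so no separate trim step has to adjust offsets afterwards.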

-Mike

