Re: [jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

Mike Klaas Fri, 11 May 2007 17:34:33 -0700


On 11-May-07, at 5:02 PM, Ryan McKinley wrote:

Chris Hostetter wrote:
: My real use case is adding the trim filter to the patterntokenizer.
: the 'correct' answer in my case is to update the offsets.
hmmm... wouldn't the "correct" thing to do in that case be to change your
pattern so it strips the whitespace when tokenizing? that way the offsets
of your tokens will be accurate from the beginning.
probably....  I'm just not very good at regex ;)

  pattern="--|,|\s-\s|\(|\)"
this will split on "--", ",", " - ", "(", and ")". I can't figure out how to build the pattern so it will trim each thing on the way out.


Try:

\s*[(),-]+\s*

Note that this will also split on multiple (('s and ,,'s. If more precision is required, change the + to {1,2} or special-case the '--'.
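
For illustration only, here is a minimal sketch of how the suggested pattern behaves when used as a split delimiter with plain java.util.regex. The class name TrimmingSplitDemo and the sample strings are made up for this example, and it exercises only the regex itself, not Solr's pattern tokenizer or its offset handling.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Solr or Lucene.
class TrimmingSplitDemo {
    // The suggested delimiter: optional whitespace, one or more of ( ) , -,
    // then optional whitespace. The surrounding \s* consumes the padding, so
    // the tokens come out already trimmed.
    static final Pattern DELIM = Pattern.compile("\\s*[(),-]+\\s*");

    static String[] split(String input) {
        // Pattern.split drops trailing empty strings.
        return DELIM.split(input);
    }

    public static void main(String[] args) {
        // prints [foo, bar, baz, qux]
        System.out.println(Arrays.toString(split("foo -- bar, baz (qux)")));
    }
}
```

One difference from the original pattern: the character class matches any hyphen, not only one surrounded by whitespace, so an input such as "well-known" is split into "well" and "known".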


-Mike
