Is there a filter available that will remove large tokens from the
token stream? Ideally something configurable to a character limit? I
have a noisy data set that has some large tokens (in this case more
than 50 characters) that I'd like to just strip. They're unlikely to
ever match a user query, and since a large number of them are not
distinct they'll just take up space.
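To sketch what I have in mind, assuming the Lucene TokenFilter API
(MaxLengthFilter is just a placeholder name I made up; if Lucene
already ships something like this, e.g. a LengthFilter, even better):

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    /** Drops tokens longer than a configurable character limit. */
    public final class MaxLengthFilter extends TokenFilter {
        private final CharTermAttribute termAtt =
            addAttribute(CharTermAttribute.class);
        private final int maxLength;

        public MaxLengthFilter(TokenStream input, int maxLength) {
            super(input);
            this.maxLength = maxLength;
        }

        @Override
        public boolean incrementToken() throws IOException {
            // Pass short tokens through; silently skip anything over
            // the limit. (This sketch doesn't adjust position
            // increments for the dropped tokens.)
            while (input.incrementToken()) {
                if (termAtt.length() <= maxLength) {
                    return true;
                }
            }
            return false;
        }
    }

I'd then wire it into my analyzer right after the tokenizer,
e.g. new MaxLengthFilter(tokenizer, 50).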
--Paul
