Is there a filter available that will remove large tokens from the
token stream? Ideally something configurable to a character limit? I
have a noisy data set that has some large tokens (in this case more
than 50 characters) that I'd like to just strip. They're unlikely to
ever match a user query, and since a large number of them are not
distinct they'll just take up space.
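To sketch what I have in mind, assuming the Lucene TokenFilter API
(MaxLengthFilter is just a placeholder name I made up; if Lucene
already ships something like this, e.g. a LengthFilter, even better):

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    /** Drops tokens longer than a configurable character limit. */
    public final class MaxLengthFilter extends TokenFilter {
        private final CharTermAttribute termAtt =
            addAttribute(CharTermAttribute.class);
        private final int maxLength;

        public MaxLengthFilter(TokenStream input, int maxLength) {
            super(input);
            this.maxLength = maxLength;
        }

        @Override
        public boolean incrementToken() throws IOException {
            // Pass short tokens through; silently skip anything over
            // the limit. (This sketch doesn't adjust position
            // increments for the dropped tokens.)
            while (input.incrementToken()) {
                if (termAtt.length() <= maxLength) {
                    return true;
                }
            }
            return false;
        }
    }

I'd then wire it into my analyzer right after the tokenizer,
e.g. new MaxLengthFilter(tokenizer, 50).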
--Paul
