On Thu, Jul 26, 2012 at 12:16 PM, Johannes Neubarth wrote:
> For stopwords that are at the end of the tokenStream (e.g. "them"), the
> positionIncrement is not updated - after leaving the while-loop,
> skippedTokens is 0. My workaround is to append a unique number to every
> input text, so that e
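The position-increment bookkeeping described in the quoted text can be illustrated with a small standalone simulation. This is plain Java and does not use Lucene's API. The class `PositionSketch`, its `filter` method and the `Kept`/`Result` records are hypothetical names chosen for illustration. The sketch assumes a stop filter that adds the number of removed tokens to the next kept token's position increment. It shows that stopwords after the last kept token have no "next token" to attach to, so a loop that only applies skips forward loses them unless the leftover count is reported separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, Lucene-free simulation of stopword position increments.
public class PositionSketch {

    // A kept token and its absolute position in the original (unfiltered) stream.
    record Kept(String term, int position) {}

    // Kept tokens plus the number of stopwords skipped after the last kept token.
    record Result(List<Kept> kept, int trailingSkipped) {}

    static Result filter(List<String> tokens, Set<String> stopwords) {
        List<Kept> kept = new ArrayList<>();
        int skippedTokens = 0; // stopwords seen since the last kept token
        int position = -1;     // absolute position of the last kept token
        for (String t : tokens) {
            if (stopwords.contains(t)) {
                skippedTokens++;
                continue;
            }
            // Increment = 1 for this token + 1 for every removed token before it.
            int positionIncrement = 1 + skippedTokens;
            position += positionIncrement;
            kept.add(new Kept(t, position));
            skippedTokens = 0;
        }
        // Stopwords at the end of the stream are still counted here; they are
        // never applied to a following token because there is none.
        return new Result(kept, skippedTokens);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("i", "saw", "the", "cat", "with", "them");
        Set<String> stop = Set.of("i", "the", "with", "them");
        Result r = filter(tokens, stop);
        System.out.println(r.kept());
        System.out.println("trailing skipped: " + r.trailingSkipped());
    }
}
```

For the sample input, "saw" and "cat" keep their original positions 1 and 3, while the two trailing stopwords ("with", "them") appear only in `trailingSkipped`. Absolute positions of this kind are one possible key for aligning the outputs of two pipelines, assuming both pipelines start from the same tokenization.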
Hello,
I want to align the output of two different analysis pipelines, but I
don't know how.
We are using Lucene for text analysis. First, every input text is
tokenized and normalized using StandardTokenizer, StandardFilter and LowerCaseFilter.
This yields a list of tokens (list1). Second, the same input text is