
Re: Aligning text analyses, with and without stopwords

2012-07-26, Robert Muir

On Thu, Jul 26, 2012 at 12:16 PM, Johannes Neubarth wrote:
> For stopwords that are at the end of the tokenStream (e.g. "them"), the
> positionIncrement is not updated; after leaving the while-loop,
> skippedTokens is 0. My workaround is to append a unique number to every
> input text, so that e
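The problem in the quoted message can be reproduced outside Lucene. Below is a minimal plain-Java simulation, not the Lucene TokenStream API. `TrailingStopwords`, `Token`, `Filtered` and `filter` are hypothetical names, and the stop filter is modeled as a simple loop with a `skippedTokens` counter like the one mentioned above. Each kept token carries `1 + skippedTokens` as its position increment. A run of stopwords in the middle is therefore recorded on the next kept token, but a run at the end has no token to carry it, so this sketch returns that count separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical simulation (not Lucene's API) of a stop filter that folds
 * skipped stopwords into the position increment of the next kept token.
 */
public class TrailingStopwords {

    /** A kept token; positionIncrement 1 means it directly follows the previous token. */
    record Token(String term, int positionIncrement) {}

    /** Kept tokens plus the number of stopwords skipped after the last kept token. */
    record Filtered(List<Token> tokens, int trailingSkipped) {}

    static Filtered filter(List<String> terms, Set<String> stopwords) {
        List<Token> kept = new ArrayList<>();
        int skippedTokens = 0; // stopwords seen since the last kept token
        for (String term : terms) {
            if (stopwords.contains(term)) {
                skippedTokens++;           // remember the hole, emit nothing
            } else {
                kept.add(new Token(term, 1 + skippedTokens));
                skippedTokens = 0;         // the hole is now carried by this token
            }
        }
        // Stopwords at the very end have no following token to carry them,
        // so they are reported separately instead of being silently lost.
        return new Filtered(kept, skippedTokens);
    }

    public static void main(String[] args) {
        Filtered f = filter(List.of("we", "saw", "them"), Set.of("we", "them"));
        System.out.println(f.tokens() + " trailingSkipped=" + f.trailingSkipped());
    }
}
```

Here only `saw` survives, with a position increment of 2, and one trailing stopword is reported. Returning the trailing count as a separate value is just one design choice for the sketch; in Lucene itself this information would presumably have to be reported when the stream ends rather than on a token, which is worth checking against the version in use.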

Aligning text analyses, with and without stopwords

2012-07-26, Johannes Neubarth

Hello,

I want to align the output of two different analysis pipelines, but I don't know how. We are using Lucene for text analysis. First, every input text is normalized using StandardTokenizer, StandardFilter and LowerCaseFilter. This yields a list of tokens (list1). Second, the same input text is
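One way to relate the two token lists is through accumulated position increments. The following is a plain-Java sketch under assumptions that the truncated message does not state: the second pipeline produces the same tokens as list1 except for removed stopwords, and each removal is added to the position increment of the next kept token. `PositionAlignment`, `Token` and `align` are hypothetical names, not Lucene classes. Starting from position -1 and adding each increment gives every list2 token its index in list1.

```java
import java.util.List;

/**
 * Hypothetical sketch: align the output of a stopword-removing pipeline (list2)
 * with the full normalized token list (list1) by accumulating position increments.
 * Assumes list2 differs from list1 only by removed tokens, each removal being
 * recorded in the next kept token's position increment.
 */
public class PositionAlignment {

    /** A token from the second pipeline: its text and position increment. */
    record Token(String term, int positionIncrement) {}

    /** Returns, for each token in list2, the index of the corresponding token in list1. */
    static int[] align(List<String> list1, List<Token> list2) {
        int[] indices = new int[list2.size()];
        int position = -1; // start just before the first token
        for (int i = 0; i < list2.size(); i++) {
            position += list2.get(i).positionIncrement();
            if (position >= list1.size()) {
                throw new IllegalArgumentException(
                        "position " + position + " is past the end of list1");
            }
            indices[i] = position;
        }
        return indices;
    }

    public static void main(String[] args) {
        List<String> list1 = List.of("the", "cat", "saw", "them");
        // "the" and "them" were removed as stopwords; "cat" carries the skipped "the".
        List<Token> list2 = List.of(new Token("cat", 2), new Token("saw", 1));
        int[] idx = align(list1, list2);
        for (int i = 0; i < idx.length; i++) {
            System.out.println(list2.get(i).term() + " -> list1[" + idx[i] + "] = " + list1.get(idx[i]));
        }
    }
}
```

With `the` and `them` removed, `cat` (increment 2) maps to `list1[1]` and `saw` (increment 1) maps to `list1[2]`. Trailing stopwords do not affect this mapping; they only matter if you also need to know where list2 ends relative to list1.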