Re: EdgeNgramTokenFilter and positions

Jack Krupansky Wed, 05 Sep 2012 13:47:50 -0700

I don't see a Jira for it, but I do see the bad behavior in both Solr 3.6 and 4.0-BETA in Solr admin analysis.

Interestingly, the screenshot for LUCENE-3642 does in fact show the (improperly) incremented positions for successive ngrams.


See:
https://issues.apache.org/jira/browse/LUCENE-3642

I'm surprised that nobody noticed the bogus positions back then.

Technically, this is a Lucene issue.

-- Jack Krupansky

-----Original Message-----
From: Walter Underwood
Sent: Wednesday, September 05, 2012 1:51 PM
To: solr-user@lucene.apache.org
Subject: EdgeNgramTokenFilter and positions

In the analysis page, the n-grams produced by EdgeNgramTokenFilter are at sequential positions. This seems wrong, because an n-gram is associated with a source token at a specific position. It also really messes up phrase matches.


With the source text "fleen", these positions and tokens are generated:

1,fl
2,fle
3,flee
4,fleen
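
To make the difference concrete, here is a rough sketch in plain Python. It is not Lucene code, only a simulation of how positions get assigned. The function names are made up for illustration, and the gram sizes (minimum 2, maximum 5) are an assumption inferred from the listing above. The first function reproduces the sequential positions shown in the analysis page; the second shows what I would expect instead, with every n-gram stacked at the position of its source token.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Front-edge n-grams of a single token, shortest first."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def positions_sequential(tokens, min_gram=2, max_gram=5):
    """Observed behavior: every n-gram advances the position by one."""
    out, pos = [], 0
    for token in tokens:
        for gram in edge_ngrams(token, min_gram, max_gram):
            pos += 1
            out.append((pos, gram))
    return out


def positions_stacked(tokens, min_gram=2, max_gram=5):
    """Expected behavior: all n-grams share the source token's position."""
    out = []
    for pos, token in enumerate(tokens, start=1):
        for gram in edge_ngrams(token, min_gram, max_gram):
            out.append((pos, gram))
    return out


print(positions_sequential(["fleen"]))
# [(1, 'fl'), (2, 'fle'), (3, 'flee'), (4, 'fleen')]
print(positions_stacked(["fleen"]))
# [(1, 'fl'), (1, 'fle'), (1, 'flee'), (1, 'fleen')]
```

With stacked positions, a two-word phrase still has its second word's n-grams at position 2, so a phrase query can line them up. With sequential positions, the second word's n-grams start wherever the first word's n-grams ended.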

Is this a known bug? Fixed? I'm running 3.3.

wunder
--
Walter Underwood
Search Guy
wun...@chegg.com
