I don't see a Jira issue for it, but I do see the bad behavior in both Solr 3.6
and 4.0-BETA on the Solr admin analysis page.
Interestingly, the screenshot for LUCENE-3642 does in fact show the
(improperly) incremented positions for successive n-grams.
See:
https://issues.apache.org/jira/browse/LUCENE-3642
I'm surprised that nobody noticed the bogus positions back then.
Technically, this is a Lucene issue.
-- Jack Krupansky
-----Original Message-----
From: Walter Underwood
Sent: Wednesday, September 05, 2012 1:51 PM
To: solr-user@lucene.apache.org
Subject: EdgeNgramTokenFilter and positions
In the analysis page, the n-grams produced by EdgeNgramTokenFilter are at
sequential positions. This seems wrong, because an n-gram is associated with
a source token at a specific position. It also really messes up phrase
matches.
With the source text "fleen", these positions and tokens are generated:
1,fl
2,fle
3,flee
4,fleen
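
To make the difference concrete, here is a minimal Python sketch that models
the two position schemes. It is not Lucene code and does not call any Lucene
API. The function names are hypothetical, and the gram sizes (min 2, max 5)
are assumptions chosen only because they reproduce the output above.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Front-edge n-grams of a single token, shortest first."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def positions_observed(tokens, min_gram, max_gram):
    """Model of the reported behavior: every gram advances the position by 1."""
    out = []
    pos = 0
    for token in tokens:
        for gram in edge_ngrams(token, min_gram, max_gram):
            pos += 1
            out.append((pos, gram))
    return out


def positions_expected(tokens, min_gram, max_gram):
    """Model of the expected behavior: grams keep the source token's position."""
    out = []
    for pos, token in enumerate(tokens, start=1):
        for gram in edge_ngrams(token, min_gram, max_gram):
            out.append((pos, gram))
    return out


def adjacent(positions, first, second):
    """True if `second` occurs exactly one position after `first`."""
    first_pos = {p for p, g in positions if g == first}
    second_pos = {p for p, g in positions if g == second}
    return any(p + 1 in second_pos for p in first_pos)


if __name__ == "__main__":
    # Assumed gram sizes: min_gram=2, max_gram=5 (illustrative only).
    for pos, gram in positions_observed(["fleen"], 2, 5):
        print(f"{pos},{gram}")
```

With a second source token the effect on phrase matching shows up: for the
tokens "fleen bar", the grams "fl" and "ba" are adjacent under the expected
scheme (positions 1 and 2) but not under the observed one (positions 1 and 5).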
Is this a known bug? Fixed? I'm running 3.3.
wunder
--
Walter Underwood
Search Guy
wun...@chegg.com