I don't see a Jira issue for it, but I do see the bad behavior in both Solr 3.6 and 4.0-BETA on the Solr admin analysis page.

Interestingly, the screenshot for LUCENE-3642 does in fact show the (improperly) incremented positions for successive n-grams.

See:
https://issues.apache.org/jira/browse/LUCENE-3642

I'm surprised that nobody noticed the bogus positions back then.

Technically, this is a Lucene issue.

-- Jack Krupansky

-----Original Message----- From: Walter Underwood
Sent: Wednesday, September 05, 2012 1:51 PM
To: solr-user@lucene.apache.org
Subject: EdgeNgramTokenFilter and positions

On the analysis page, the n-grams produced by EdgeNGramTokenFilter are at sequential positions. This seems wrong: each n-gram is derived from a single source token at a specific position, so all the n-grams of that token should share its position. It also really messes up phrase matches.

With the source text "fleen", these positions and tokens are generated:

1,fl
2,fle
3,flee
4,fleen
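To make the difference concrete, here is a minimal sketch in plain Java. It is NOT the Lucene TokenStream API; the class `EdgeNGramPositions`, the method `edgeNGrams`, and the `stacked` flag are hypothetical names used only for illustration. It assumes front edge n-grams with minGram=2 and maxGram=5 over whitespace-separated words, and compares the sequential positions reported above with "stacked" positions, where every gram keeps its source word's position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of edge n-gram position assignment.
// Plain Java only; this does not use or mimic Lucene's actual classes.
class EdgeNGramPositions {

    /** A gram and the 1-based position it is emitted at. */
    record Gram(int position, String text) {
        @Override
        public String toString() {
            return position + "," + text;
        }
    }

    /**
     * Emits front edge n-grams of length minGram..maxGram for each
     * whitespace-separated word.
     *
     * @param stacked if true, all grams of a word share one position
     *                (the position advances once per word); if false,
     *                every gram advances the position by 1, matching the
     *                behavior reported in this thread
     */
    static List<Gram> edgeNGrams(String text, int minGram, int maxGram, boolean stacked) {
        List<Gram> out = new ArrayList<>();
        int position = 0;
        for (String word : text.trim().split("\\s+")) {
            boolean firstGramOfWord = true;
            int upper = Math.min(maxGram, word.length());
            // Simplification: a word shorter than minGram emits no grams
            // and does not advance the position.
            for (int len = minGram; len <= upper; len++) {
                if (!stacked || firstGramOfWord) {
                    position++;
                }
                firstGramOfWord = false;
                out.add(new Gram(position, word.substring(0, len)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("fleen fox", 2, 5, false));
        // [1,fl, 2,fle, 3,flee, 4,fleen, 5,fo, 6,fox]
        System.out.println(edgeNGrams("fleen fox", 2, 5, true));
        // [1,fl, 1,fle, 1,flee, 1,fleen, 2,fo, 2,fox]
    }
}
```

In the stacked version the whole-word grams "fleen" (position 1) and "fox" (position 2) stay adjacent, so a phrase over those two words can still line up; in the sequential version they land at positions 4 and 6, which is one way the phrase-match breakage shows up.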

Is this a known bug? Fixed? I'm running 3.3.

wunder
--
Walter Underwood
Search Guy
wun...@chegg.com