Your analyzer needs to set positionIncrement correctly: it sounds like it's broken.
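To illustrate why the increment matters: as far as I know, DefaultSimilarity discounts overlap tokens (those with a position increment of 0) when it computes the field length, since discountOverlaps is on by default. So a stem emitted at the same position as the original word should not change the fieldNorm. Below is a rough, self-contained sketch of that arithmetic. It is a simplified model, not Lucene code: the Token class, analyze(), countedLength() and lengthNorm() are hypothetical names made up for the example, and it ignores boosts and the lossy encoding of norms.

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapNormSketch {
    /** Hypothetical stand-in for an emitted token: its text and its position increment. */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Emits every original word with increment 1, followed by its stem when the
     * stem differs. stemIncrement is the increment given to the stem token:
     * 0 stacks it on the original word, 1 (the suspected bug) gives it its own position.
     */
    static List<Token> analyze(String[] words, String[] stems, int stemIncrement) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(new Token(words[i], 1));
            if (!stems[i].equals(words[i])) {
                out.add(new Token(stems[i], stemIncrement));
            }
        }
        return out;
    }

    /** Field length with overlap tokens (increment 0) left out of the count. */
    static int countedLength(List<Token> tokens) {
        int length = 0;
        for (Token t : tokens) {
            if (t.positionIncrement > 0) {
                length++;
            }
        }
        return length;
    }

    /** Simplified length norm: 1 / sqrt(number of counted terms). */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        String[] words = {"running", "dogs", "barked", "loudly"};
        String[] stems = {"run", "dog", "bark", "loudly"};

        List<Token> stacked = analyze(words, stems, 0); // stems share the word's position
        List<Token> broken = analyze(words, stems, 1);  // stems take positions of their own

        System.out.println("stacked: length=" + countedLength(stacked)
                + " norm=" + lengthNorm(countedLength(stacked)));
        System.out.println("broken:  length=" + countedLength(broken)
                + " norm=" + lengthNorm(countedLength(broken)));
    }
}
```

In this model the stacked variant counts only the four original words, while the broken variant counts all seven tokens and ends up with a smaller norm, which matches the symptom you describe. In a real TokenFilter the equivalent step would be calling setPositionIncrement(0) on the PositionIncrementAttribute before emitting the stem.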
On Thu, Dec 5, 2013 at 1:53 PM, Isaac Hebsh <isaac.he...@gmail.com> wrote:
> Hi,
> we implemented a morphologic analyzer, which stems words at index time.
> For some reasons, we index both the original word and the stem (at the same
> position, of course).
> The stemming is done for a specific language, so other languages are not
> stemmed at all.
>
> Because of that, two documents with the same number of terms may have
> different termVector sizes. A document which contains many words that are
> stemmed will have a double-sized termVector. This behaviour affects the
> relevance score in a BAD way: the fieldNorm of these documents reduces
> their score. This is NOT the wanted behaviour in our case.
>
> We are looking for a way to "mark" the stemmed words (at index time, of
> course) so they won't affect the fieldNorm. Does such a way exist?
>
> Do you have another idea?