Hi Isaac,

Did you consider omitting norms completely for that field (omitNorms="true")?
Also, are you using solr.RemoveDuplicatesTokenFilterFactory?



On Thursday, December 5, 2013 8:55 PM, Isaac Hebsh <isaac.he...@gmail.com> 
wrote:
 
Hi,
we implemented a morphological analyzer, which stems words at index time.
For various reasons, we index both the original word and the stem (at the same
position, of course).
Stemming is done for one specific language only, so other languages are not
stemmed at all.

Because of that, two documents with the same number of words may have
different termVector sizes. A document that contains many stemmed words
will have a termVector up to twice as large. This behaviour affects the
relevance score in a BAD way: the fieldNorm of these documents reduces
their score. This is NOT the wanted behaviour in our case.
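
To make the size of the effect concrete, here is a minimal sketch. It assumes the classic Lucene-style length norm of 1/sqrt(number of tokens) with no index-time boost; the 100-word documents are hypothetical, and the real fieldNorm is additionally stored with reduced precision, so actual values may differ slightly.

```python
import math

def length_norm(num_tokens: int) -> float:
    """Classic 1/sqrt(length) norm; assumes no index-time boost."""
    return 1.0 / math.sqrt(num_tokens)

# Two hypothetical documents, each 100 words long.
unstemmed = length_norm(100)  # language that is not stemmed: 100 tokens
stemmed = length_norm(200)    # every word also emits a stem: 200 tokens

# Ratio of the two norms: how much the stemmed document is penalized.
penalty = stemmed / unstemmed
print(unstemmed, stemmed, penalty)
```

Under these assumptions the fully stemmed document gets a norm of roughly 0.71 times that of the unstemmed one, although both contain the same number of words.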

We are looking for a way to "mark" the stemmed words (at index time, of
course) so they won't affect the fieldNorm. Does such a way exist?
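
One way to picture such a "mark": a stem indexed at the same position as its original word has a position increment of 0, so the length calculation could simply skip those tokens. The sketch below is only an illustration of that idea, not Lucene code; the function names and token lists are hypothetical. Lucene's default similarity has, as far as I know, a discountOverlaps setting that works along these lines, but whether it is active in a given setup would need checking.

```python
import math

# Each token is (term, position_increment). A stem stacked on its
# original word carries a position increment of 0.
def field_length(tokens, discount_overlaps=True):
    """Count tokens, optionally ignoring those stacked on a previous one."""
    if discount_overlaps:
        return sum(1 for _, pos_inc in tokens if pos_inc > 0)
    return len(tokens)

def length_norm(tokens, discount_overlaps=True):
    """Classic 1/sqrt(length) norm over the (possibly discounted) length."""
    return 1.0 / math.sqrt(field_length(tokens, discount_overlaps))

# Hypothetical two-word fields: one stemmed, one in an unstemmed language.
stemmed_doc = [("running", 1), ("run", 0), ("dogs", 1), ("dog", 0)]
plain_doc = [("foo", 1), ("bar", 1)]

print(length_norm(stemmed_doc), length_norm(plain_doc))
print(length_norm(stemmed_doc, discount_overlaps=False))
```

With overlaps discounted, both two-word fields get the same norm; without discounting, the stemmed field is treated as four tokens long.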

Do you have another idea?
