Yes, each input value is analyzed separately. Solr passes each input value
to Lucene and then Lucene analyzes each.
You could use LimitTokenPositionFilterFactory, which uses the absolute token
position: each successive analyzed value would have an incremented
position, plus the positionIncrementGap (typically 100 for text).
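As a rough sketch only, a field type using that filter might look like the
following in schema.xml. The type name "text_top", the tokenizer and
lowercase filter, and the value 50 are assumptions for illustration, not
taken from the schema under discussion.

```xml
<!-- Hypothetical field type: the name, tokenizer choice and the limit of 50
     are placeholders, not part of the original schema. -->
<fieldType name="text_top" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Discard tokens whose position is greater than maxTokenPosition. -->
    <filter class="solr.LimitTokenPositionFilterFactory" maxTokenPosition="50"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The copyField destination, indexed but not stored, as in the question. -->
<field name="top" type="text_top" indexed="true" stored="false" multiValued="true"/>
```

If the limit does apply to absolute positions, the gap of 100 between values
counts toward maxTokenPosition, so the chosen limit would need to allow for
it. It is worth checking on a test index that the limit is applied across
values as described rather than per value.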
-- Jack Krupansky
-----Original Message-----
From: Daniel Collins
Sent: Tuesday, July 16, 2013 8:46 AM
To: solr-user@lucene.apache.org
Subject: Are analysers applied to each value in a multi-valued field
separately?
I'm guessing the answer is yes, but here's the background.
We index 2 separate fields, headline and body text for a document, and then
we want to identify the "top" of the story, which is the headline + N words
of the body (we want to weight that in scoring).
So to do that:
<copyField src="headline" dest="top"/>
<copyField src="body" dest="top"/>
And the "top" field has a LimitTokenCountFilterFactory appended to it to do
the limiting.
<filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="N"/>
I realised that top needs to be multi-valued, which got me thinking: is
that N tokens PER VALUE of top or N tokens in total within the top field...
The field is indexed but not stored, so it's hard to determine exactly
which is being done.
Logically, I presume each value in the field is independent (and Solr then
just matches searches against each one), so that would suggest N is per
value?
Cheers, Daniel