Yes, each input value is analyzed separately. Solr passes each input value to Lucene and then Lucene analyzes each.
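
As a rough illustration of what "analyzed separately" means for a token-count limit, here is a minimal Python sketch. It is not Solr or Lucene code: `limit_token_count` and `analyze_values` are hypothetical helpers, whitespace splitting stands in for the real analyzer, and the sample strings are made up. It assumes only that each value of the multi-valued field gets its own analysis pass, so the limit resets for every value.

```python
def limit_token_count(tokens, max_token_count):
    """Keep at most max_token_count tokens from a single analyzed value."""
    return tokens[:max_token_count]


def analyze_values(values, max_token_count):
    """Analyze each value independently; the limit starts over for every value."""
    # str.split() is a stand-in for a real tokenizer (assumption for illustration).
    return [limit_token_count(value.split(), max_token_count) for value in values]


# Two values, as produced by two copyField directives into one field.
values = [
    "Markets rally on rate cut",                         # stands in for the headline
    "Stocks rose sharply across Europe and Asia today",  # stands in for the body
]

per_value = analyze_values(values, max_token_count=3)
total_kept = sum(len(tokens) for tokens in per_value)

print(per_value)
print(total_kept)
```

Under this model, a limit of 3 keeps up to 3 tokens from each value, so the field as a whole can hold more than 3 tokens.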

You could use LimitTokenPositionFilterFactory, which uses the absolute token position: each successive analyzed value would have an incremented position, plus the positionIncrementGap (typically 100 for text fields).
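
The position arithmetic described above can be sketched in Python. This is a simplified model, not Lucene code: `assign_positions` is a hypothetical helper, every token is assumed to have a position increment of 1, and the gap is assumed to be added once between consecutive values. It shows only how positions would be numbered across values, not how any particular filter consumes them.

```python
def assign_positions(values, position_increment_gap=100):
    """Return (token, position) pairs across all values of a multi-valued field.

    Simplified model: each token advances the position by 1, and the gap is
    added once before every value after the first.
    """
    positioned = []
    position = -1  # so that the first token lands on position 0
    for index, value in enumerate(values):
        if index > 0:
            position += position_increment_gap
        for token in value.split():  # whitespace split stands in for the analyzer
            position += 1
            positioned.append((token, position))
    return positioned


positions = assign_positions(["a b c", "d e"], position_increment_gap=100)
print(positions)
```

In this model the first value occupies positions 0 to 2 and the second value starts at 103, so a gap of 100 leaves a wide stretch of unused positions between values.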

-- Jack Krupansky

-----Original Message-----
From: Daniel Collins
Sent: Tuesday, July 16, 2013 8:46 AM
To: solr-user@lucene.apache.org
Subject: Are analysers applied to each value in a multi-valued field separately?

I'm guessing the answer is yes, but here's the background.

We index 2 separate fields, headline and body text for a document, and then
we want to identify the "top" of the story which is the headline + N words
of the body (we want to weight that in scoring).

So to do that:

<copyField src="headline" dest="top"/>
<copyField src="body" dest="top"/>

And the "top" field has a LimitTokenCountFilterFactory appended to it to do
the limiting.

<filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="N"/>

I realised that top needs to be multi-valued, which got me thinking: is
that N tokens PER VALUE of top or N tokens in total within the top field...
The field is indexed but not stored, so it's hard to determine exactly
which is being done.

Logically, I presume each value in the field is independent (and Solr then
just matches searches against each one), so that would suggest N is per
value?

Cheers, Daniel
