Thanks Jack. There seems to be a never-ending set of FilterFactories; I keep hearing about new ones all the time :)
Ok, I get it: our existing code gives us the first N tokens of each value, and using LimitTokenPositionFilterFactory with the same number would give us the first N of the combined set of tokens. That's good to know.

On 16 July 2013 14:15, Jack Krupansky <j...@basetechnology.com> wrote:

> Yes, each input value is analyzed separately. Solr passes each input value
> to Lucene and then Lucene analyzes each.
>
> You could use LimitTokenPositionFilterFactory which uses the absolute
> token position - each successive analyzed value would have an incremented
> position, plus the positionIncrementGap (typically 100 for text.)
>
> -- Jack Krupansky
>
> -----Original Message----- From: Daniel Collins
> Sent: Tuesday, July 16, 2013 8:46 AM
> To: solr-user@lucene.apache.org
> Subject: Are analysers applied to each value in a multi-valued field
> separately?
>
>
> I'm guessing the answer is yes, but here's the background.
>
> We index 2 separate fields, headline and body text for a document, and then
> we want to identify the "top" of the story which is the headline + N words
> of the body (we want to weight that in scoring).
>
> So to do that:
>
> <copyField src="headline" dest="top"/>
> <copyField src="body" dest="top"/>
>
> And the "top" field has a LimitTokenCountFilterFactory appended to it to do
> the limiting.
>
> <filter class="solr.LimitTokenCountFilterFactory"
> maxTokenCount="N"/>
>
> I realised that top needs to be multi-valued, which got me thinking: is
> that N tokens PER VALUE of top or N tokens in total within the top field...
> The field is indexed but not stored, so it's hard to determine exactly
> which is being done.
>
> Logically, I presume each value in the field is independent (and Solr then
> just matches searches against each one), so that would suggest N is per
> value?
>
> Cheers, Daniel
>
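
For reference, a minimal sketch of what the "top" field type might look like with the position-based filter swapped in for the count-based one. The field type name text_top and the choice of StandardTokenizerFactory are assumptions for illustration only; N is a placeholder that has to be replaced with an integer. This is untested, and how maxTokenPosition interacts with positionIncrementGap across the copied values should be checked against the Solr version in use.

```xml
<!-- Hypothetical field type for the "top" field; name and tokenizer are assumed -->
<fieldType name="text_top" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Existing approach from this thread: keep at most N tokens per analyzed value -->
    <!-- <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="N"/> -->
    <!-- Suggested alternative: drop tokens whose position exceeds N -->
    <filter class="solr.LimitTokenPositionFilterFactory" maxTokenPosition="N"/>
  </analyzer>
</fieldType>

<!-- multiValued because two copyField sources feed it; indexed but not stored, as in the original post -->
<field name="top" type="text_top" indexed="true" stored="false" multiValued="true"/>
```

Since the field is not stored, the analysis screen in the Solr admin UI is one way to see which tokens survive for a sample value.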