Self-correction: we'd need to set LimitTokenPositionFilterFactory to "PI + N" (PI being the positionIncrementGap) to give the results above, because of the increment gap between values.
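
To make the "PI + N" arithmetic concrete, here is a rough Python sketch. It is only a simplified model of the behaviour as described in this thread, not Lucene's code: it assumes positions start at 1, that each later value is shifted by the gap, and that the limit is applied to those absolute positions. The function names and sample strings are made up for illustration, and the real behaviour is worth confirming against a test index.

```python
def assign_positions(values, gap):
    """Model token positions for a multi-valued field.

    Assumption (from the thread, not from Lucene source): positions
    start at 1, and every value after the first is shifted by `gap`.
    """
    positioned = []
    position = 0
    for index, value in enumerate(values):
        if index > 0:
            position += gap  # modelled positionIncrementGap between values
        for token in value.split():
            position += 1
            positioned.append((token, position))
    return positioned


def limit_by_position(positioned, max_position):
    """Keep tokens whose modelled absolute position is <= max_position."""
    return [token for token, position in positioned if position <= max_position]


GAP = 100  # "typically 100 for text", per the thread
N = 5      # hypothetical number of combined tokens wanted

headline = "solr filters explained"
body = "each input value is analyzed separately by lucene"

tokens = assign_positions([headline, body], GAP)

# A limit of N alone never reaches the body, which starts after the gap.
kept_without_gap = limit_by_position(tokens, N)

# A limit of PI + N keeps the first N tokens of the combined values.
kept_with_gap = limit_by_position(tokens, GAP + N)

print(kept_without_gap)
print(kept_with_gap)
```

In this model the body's first token sits at position 104, so a limit of N = 5 keeps only the three headline tokens, while a limit of 105 keeps the headline plus the first two body tokens.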

On 16 July 2013 17:16, Daniel Collins <danwcoll...@gmail.com> wrote:

> Thanks Jack.
>
> There seem to be a never-ending set of FilterFactories, I keep hearing
> about new ones all the time :)
>
> OK, I get it, so our existing code is the first N tokens of each value,
> and using LimitTokenPositionFilterFactory with the same number would
> give us the first N of the combined set of tokens, that's good to know.
>
> On 16 July 2013 14:15, Jack Krupansky <j...@basetechnology.com> wrote:
>
>> Yes, each input value is analyzed separately. Solr passes each input
>> value to Lucene and then Lucene analyzes each.
>>
>> You could use LimitTokenPositionFilterFactory, which uses the absolute
>> token position - each successive analyzed value would have an incremented
>> position, plus the positionIncrementGap (typically 100 for text).
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: Daniel Collins
>> Sent: Tuesday, July 16, 2013 8:46 AM
>> To: solr-user@lucene.apache.org
>> Subject: Are analysers applied to each value in a multi-valued field
>> separately?
>>
>> I'm guessing the answer is yes, but here's the background.
>>
>> We index 2 separate fields, headline and body text for a document, and
>> then we want to identify the "top" of the story, which is the headline +
>> N words of the body (we want to weight that in scoring).
>>
>> So to do that:
>>
>> <copyField src="headline" dest="top"/>
>> <copyField src="body" dest="top"/>
>>
>> And the "top" field has a LimitTokenCountFilterFactory appended to it to
>> do the limiting.
>>
>> <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="N"/>
>>
>> I realised that top needs to be multi-valued, which got me thinking: is
>> that N tokens PER VALUE of top or N tokens in total within the top
>> field? The field is indexed but not stored, so it's hard to determine
>> exactly which is being done.
>>
>> Logically, I presume each value in the field is independent (and Solr then
>> just matches searches against each one), so that would suggest N is per
>> value?
>>
>> Cheers, Daniel