Re: Are analysers applied to each value in a multi-valued field separately?

Daniel Collins Tue, 16 Jul 2013 09:19:52 -0700

Self-correction, we'd need to set LimitTokenPositionFilterFactory to "PI
+ N" to give the results above because of the increment gap between values.
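
As a rough sketch of the configuration discussed in this thread (not a tested setup), the "top" field type might look like the following. The type name text_top, the tokenizer choice, and the values PI = 100 and N = 50 are assumptions for illustration only. It also assumes, as described in the quoted reply below, that token positions are counted across the values of the field; since the field is not stored, that is worth confirming with a test query against a small index before relying on it.

```xml
<!-- Hypothetical field type: the name, tokenizer and numbers are illustrative -->
<fieldType name="text_top" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Following the "PI + N" rule of thumb with PI = 100 and N = 50 -->
    <filter class="solr.LimitTokenPositionFilterFactory" maxTokenPosition="150"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Indexed but not stored, multi-valued, as described in the thread -->
<field name="top" type="text_top" indexed="true" stored="false" multiValued="true"/>
```

The limit is applied only at index time here, so query analysis is left untruncated.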



On 16 July 2013 17:16, Daniel Collins <danwcoll...@gmail.com> wrote:

> Thanks Jack.
>
> There seems to be a never-ending set of FilterFactories; I keep hearing
> about new ones all the time :)
>
> OK, I get it: our existing code takes the first N tokens of each value,
> and using LimitTokenPositionFilterFactory with the same number would
> give us the first N of the combined set of tokens. That's good to know.
>
>
>
> On 16 July 2013 14:15, Jack Krupansky <j...@basetechnology.com> wrote:
>
>> Yes, each input value is analyzed separately. Solr passes each input
>> value to Lucene and then Lucene analyzes each.
>>
>> You could use LimitTokenPositionFilterFactory which uses the absolute
>> token position - each successive analyzed value would have an incremented
>> position, plus the positionIncrementGap (typically 100 for text.)
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Daniel Collins
>> Sent: Tuesday, July 16, 2013 8:46 AM
>> To: solr-user@lucene.apache.org
>> Subject: Are analysers applied to each value in a multi-valued field
>> separately?
>>
>>
>> I'm guessing the answer is yes, but here's the background.
>>
>> We index 2 separate fields, headline and body text for a document, and then
>> we want to identify the "top" of the story, which is the headline + N words
>> of the body (we want to weight that in scoring).
>>
>> So to do that:
>>
>> <copyField src="headline" dest="top"/>
>> <copyField src="body" dest="top"/>
>>
>> And the "top" field has a LimitTokenCountFilterFactory appended to it to do
>> the limiting.
>>
>>        <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="N"/>
>>
>> I realised that top needs to be multi-valued, which got me thinking: is
>> that N tokens PER VALUE of top or N tokens in total within the top field?
>> The field is indexed but not stored, so it's hard to determine exactly
>> which is being done.
>>
>> Logically, I presume each value in the field is independent (and Solr then
>> just matches searches against each one), so that would suggest N is per
>> value?
>>
>> Cheers, Daniel
>>
>
>
