Actually, I appear to be wrong on the position limit filter - it appears to
be relative to the string being analyzed and not the full sequence of values
analyzed for the field.
Given this field and type:
<fieldType name="text_limit_position4" class="solr.TextField"
positionIncrementGap="10">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LimitTokenPositionFilterFactory"
maxTokenPosition="23"/>
</analyzer>
</fieldType>
<field name="text_limit4" type="text_limit_position4"
indexed="true" stored="true" multiValued="true" />
And this document:
curl "http://localhost:8983/solr/update?commit=true" \
-H 'Content-type:application/json' -d '
[{"id": "doc-1",
"title": "Hello World",
"text_limit4": ["a1 a2 a3 a4", "b1 b2 b3 b4", "c1 c2 c3 c4",
"d1 d2 d3 d4", "e1 e2 e3 e4", "f1 f2 f3 f4"]}]'
The hope was that the indexed sequence of terms would stop at c4, but the
full values are indexed. These queries succeed:
curl "http://localhost:8983/solr/select/?q=text_limit4:d1"
curl "http://localhost:8983/solr/select/?q=text_limit4:f4"
And this query fails:
curl "http://localhost:8983/solr/select/?q=text_limit4:%22a4+f1%22~65"
While this query succeeds:
curl "http://localhost:8983/solr/select/?q=text_limit4:%22a4+f1%22~66"
Indicating that the position gaps of 10 are there between each value, but
the token position limit filter doesn't trigger.
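For what it's worth, the slop arithmetic above can be reproduced with a small Python sketch. This is a simplified model of how positions get assigned across the values, not Lucene's actual code. The names index_positions and min_slop are made up for illustration, and the sketch assumes positions start at 0 and that the limit is checked per value, as the query results suggest.

```python
def index_positions(values, gap, max_token_position=None):
    """Simplified model of position assignment across the values of a
    multi-valued field (an illustration, not Lucene's actual code)."""
    positions = {}
    last = -1  # the first indexed token then lands at position 0
    for i, value in enumerate(values):
        if i > 0:
            last += gap  # positionIncrementGap between successive values
        for rel, token in enumerate(value.split(), start=1):
            # Assumption drawn from the observed behaviour: the limit is
            # checked against the position within this value only.
            if max_token_position is not None and rel > max_token_position:
                break
            last += 1
            positions[token] = last
    return positions


def min_slop(positions, first, second):
    """Smallest slop for the in-order phrase "first second" to match."""
    return positions[second] - positions[first] - 1


values = ["a1 a2 a3 a4", "b1 b2 b3 b4", "c1 c2 c3 c4",
          "d1 d2 d3 d4", "e1 e2 e3 e4", "f1 f2 f3 f4"]
positions = index_positions(values, gap=10, max_token_position=23)
print(positions["a4"], positions["f1"], min_slop(positions, "a4", "f1"))
```

Under those assumptions a4 sits at position 3 and f1 at position 70, so the phrase needs a slop of 66, which lines up with ~65 failing and ~66 succeeding. No value has more than 4 tokens, so a per-value limit of 23 never kicks in.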
This document:
curl "http://localhost:8983/solr/update?commit=true" \
-H 'Content-type:application/json' -d '
[{"id": "doc-1",
"title": "Hello World",
"text_limit4": "a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15 a16 a17 a18 a19 a20 a21 a22 a23 a24 a25 a26"}]'
Fails on this query:
curl "http://localhost:8983/solr/select/?q=text_limit4:a24"
But succeeds on this query:
curl "http://localhost:8983/solr/select/?q=text_limit4:a23"
Indicating that the token position limit filter does work, but only for the
relative position, making it not much more useful than the token count limit
filter.
Oh well.
-- Jack Krupansky
-----Original Message-----
From: Daniel Collins
Sent: Tuesday, July 16, 2013 12:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Are analysers applied to each value in a multi-valued field
separately?
Self-correction, we'd need to set LimitTokenPositionFilterFactory to "PI
+ N" to give the results above because of the increment gap between values.
On 16 July 2013 17:16, Daniel Collins <danwcoll...@gmail.com> wrote:
Thanks Jack.
There seems to be a never-ending set of FilterFactories; I keep hearing
about new ones all the time :)
Ok, I get it, so our existing code is the first N tokens of each value,
and using LimitTokenPositionFilterFactory with the same number would
give us the first N of the combined set of tokens, that's good to know.
On 16 July 2013 14:15, Jack Krupansky <j...@basetechnology.com> wrote:
Yes, each input value is analyzed separately. Solr passes each input
value to Lucene and then Lucene analyzes each.
You could use LimitTokenPositionFilterFactory which uses the absolute
token position - each successive analyzed value would have an incremented
position, plus the positionIncrementGap (typically 100 for text.)
-- Jack Krupansky
-----Original Message-----
From: Daniel Collins
Sent: Tuesday, July 16, 2013 8:46 AM
To: solr-user@lucene.apache.org
Subject: Are analysers applied to each value in a multi-valued field
separately?
I'm guessing the answer is yes, but here's the background.
We index 2 separate fields, headline and body text for a document, and
then we want to identify the "top" of the story which is the headline +
N words of the body (we want to weight that in scoring).
So to do that:
<copyField src="headline" dest="top"/>
<copyField src="body" dest="top"/>
And the "top" field has a LimitTokenCountFilterFactory appended to it to
do the limiting.
<filter class="solr.LimitTokenCountFilterFactory"
maxTokenCount="N"/>
I realised that top needs to be multi-valued, which got me thinking: is
that N tokens PER VALUE of top or N tokens in total within the top
field...
The field is indexed but not stored, so it's hard to determine exactly
which is being done.
Logically, I presume each value in the field is independent (and Solr
then just matches searches against each one), so that would suggest N is
per value?
Cheers, Daniel