Re: Influencing scores on values in multiValue fields

Jonathan Rochkind Wed, 03 Nov 2010 08:09:09 -0700

Be careful of multi-term queries and String types. By multi-term here, I mean multi-term according to the 'pre-tokenization' that the dismax and standard parsers do -- basically on whitespace. If you have a string with whitespace stored as a single (non-tokenized) value in a Solr String type field, and you have a q that is that identical string (with whitespace, but NOT enclosed in phrase quotes) -- it still won't match, because of the pre-tokenization on whitespace that the query parsers do.

It WILL still match if you put the q in double quotes for a phrase. And it WILL still match for a dismax pf phrase boost. But it will not match a dismax qf field, or a standard query parser fielded q search.
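
To make the matching behaviour concrete, here is a toy model in Python. It is not Solr code: shlex.split just stands in for the whitespace pre-tokenization (it also happens to keep double-quoted phrases together), and the function names are made up for illustration.

```python
import shlex

def pre_tokenize(q):
    """Toy stand-in for the parsers' pre-tokenization: split on whitespace,
    but keep a double-quoted phrase together as one chunk."""
    return shlex.split(q)

def matches_string_field(q, indexed_values):
    """A String field indexes each value as one untokenized term, so a
    query chunk only matches if it equals a whole indexed value."""
    return any(chunk in indexed_values for chunk in pre_tokenize(q))

indexed = {"bar lifters", "truck tires"}

# Unquoted: the chunks are "bar" and "lifters"; neither is an indexed term.
print(matches_string_field('bar lifters', indexed))
# Quoted: one chunk, "bar lifters", which is an indexed term.
print(matches_string_field('"bar lifters"', indexed))
```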

This means this approach to solving the problem won't always do what you'd like. I haven't figured out a better one though. With dismax, if you include it both as a boosted field in qf (which will match on single-term queries, but not on queries with whitespace) AND as a boosted field in pf (which will match on queries with whitespace, but won't be used at all for queries without whitespace, as dismax doesn't even bring the pf into play unless the pre-tokenization comes up with more than one term) -- it seems to mostly do what you'd want. An alternate strategy might be trying to use it as a dismax bq query, since you can tell bq to use an alternate query parser (for example !field or !raw) that won't do the pre-tokenization.
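
A sketch of what those two request variants might look like, built as plain query-string parameters in Python. The field names (tags_text for the tokenized field, tags_exact for the string copy), the boost values and the /select path are assumptions for illustration only; adjust them to your own schema and handler.

```python
from urllib.parse import urlencode

user_query = "bar lifters"

# Hypothetical field names: tags_text is the tokenized 'text' field,
# tags_exact is the untokenized 'string' copy of the same values.
# The boost values (^10, ^20) are arbitrary placeholders.

# Variant 1: string field in both qf (single-term queries) and
# pf (queries with whitespace).
qf_pf_params = {
    "defType": "dismax",
    "q": user_query,
    "qf": "tags_text tags_exact^10",
    "pf": "tags_exact^20",
}

# Variant 2: a bq that hands the whole query string to the field
# query parser, which skips the whitespace pre-tokenization.
bq_params = {
    "defType": "dismax",
    "q": user_query,
    "qf": "tags_text",
    "bq": "{!field f=tags_exact}" + user_query,
}

for params in (qf_pf_params, bq_params):
    print("/select?" + urlencode(params))
```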


Imran wrote:

Thanks Mike for your suggestion. It did take me down the correct route. I
basically created another multiValue field of type 'string' and boosted
that. To keep length normalisation from affecting the partial matches, I set
omitNorms on the 'text' type multiValue field. The results look as expected
so far with this configuration.
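
For illustration, a rough Python model of why this configuration gives the requested order. It is only a sketch: the text field contributes a sqrt(term frequency) score with no length norm, the string field adds a flat boost when a whole value equals the query, and EXACT_BOOST is an arbitrary made-up number. idf and the other factors in Solr's real scoring are ignored.

```python
import math

docs = {
    "Doc1": ["bar lifters"],
    "Doc2": ["truck tires", "back drops", "bar lifters"],
    "Doc3": ["iron bar lifters"],
    "Doc4": ["brass bar lifters", "iron bar lifters", "tire something",
             "truck something", "oil gas"],
}
query = "bar lifters"
EXACT_BOOST = 5.0  # hypothetical boost on the string field

def text_score(values, q):
    """Tokenized field with omitNorms: sum of sqrt(tf) per query term."""
    tokens = " ".join(values).split()
    return sum(math.sqrt(tokens.count(term)) for term in q.split())

def exact_score(values, q):
    """String field: matches only when a whole value equals the query."""
    return EXACT_BOOST if q in values else 0.0

scores = {name: text_score(v, query) + exact_score(v, query)
          for name, v in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # Doc1 and Doc2 tie in this simplified model
```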

Cheers
-- Imran

On Fri, Oct 29, 2010 at 1:09 PM, Michael Sokolov <soko...@ifactory.com> wrote:

How about creating another field for doing exact matches (a string);
searching both and boosting the string match?

-Mike

-----Original Message-----
From: Imran [mailto:imranboho...@gmail.com]
Sent: Friday, October 29, 2010 6:25 AM
To: solr-user@lucene.apache.org
Subject: Influencing scores on values in multiValue fields

Hi All

We've got an index in which we have a multiValued field per document.

Assume the multiValue field values in each document to be:

Doc1:
bar lifters

Doc2:
truck tires
back drops
bar lifters

Doc 3:
iron bar lifters

Doc 4:
brass bar lifters
iron bar lifters
tire something
truck something
oil gas

Now when we search for 'bar lifters' the expectation (based on the
requirements) is that we get results in the order of Doc1,
Doc2, Doc4 and Doc3:

Doc 1 - since there's an exact match (and only one) for the
search terms
Doc 2 - since there's an exact match amongst the values
Doc 4 - since there's a partial match on the values, but the
number of matches is greater than for Doc 3
Doc 3 - since there's a partial match

However, the results come out as Doc1, Doc3, Doc2, Doc4.
Looking at the explanation of the result, it appears Doc 2 is
losing to Doc3 and Doc 4 is losing to Doc3 based on length
normalisation.

We think we can see the reason for that - the field length in
doc2 is greater than in doc3, and in doc4 it is greater than in doc3.
However, is there any mechanism by which I can force doc2 to beat
doc3 and doc4 to beat doc3 with this structure?

We did look at using omitNorms=true, but that messes up the
scores for all docs. The result comes out as Doc4, Doc1,
Doc2, Doc3 (where Doc1, Doc2 and
Doc3 get the same score).
This is because the fieldNorm is not taken into account anymore (as
expected), leaving the termFrequency as the only contributing
factor. So trying to avoid length normalisation through
omitNorms is not helping.
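
Both orderings can be reproduced with a simplified Python model of the classic Lucene scoring factors, assuming tf = sqrt(term frequency) and a length norm of 1/sqrt(number of terms in the field). This is a sketch only: idf, coord, queryNorm and the lossy one-byte encoding of norms are all left out, so ties here need not be ties in a real index.

```python
import math

docs = {
    "Doc1": ["bar lifters"],
    "Doc2": ["truck tires", "back drops", "bar lifters"],
    "Doc3": ["iron bar lifters"],
    "Doc4": ["brass bar lifters", "iron bar lifters", "tire something",
             "truck something", "oil gas"],
}
query = "bar lifters"

def score(values, q, use_norms):
    """Sum of sqrt(tf) over query terms, optionally scaled by the length norm."""
    tokens = " ".join(values).split()  # all values share one field
    tf_part = sum(math.sqrt(tokens.count(term)) for term in q.split())
    norm = 1.0 / math.sqrt(len(tokens)) if use_norms else 1.0
    return tf_part * norm

with_norms = {name: score(v, query, True) for name, v in docs.items()}
without_norms = {name: score(v, query, False) for name, v in docs.items()}

for label, scores in (("norms", with_norms), ("omitNorms", without_norms)):
    print(label, {name: round(s, 3) for name, s in scores.items()})
```

With norms, the shorter field in Doc3 outweighs the exact value in Doc2 and the extra matches in Doc4; without norms, only Doc4's higher term frequency separates the documents.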

Is there any way we can influence an exact match of a
value in a multiValue field to add on to the overall score
whilst keeping the length normalisation?

Hope that makes sense.

Cheers
-- Imran
