Hi Daniel,

How about trying something like this (you'll have to play with the boosts to 
tune it): search all the fields with all the terms using edismax and the 
minimum-should-match (mm) parameter, but require all terms to match in the 
allMetadatas field.
https://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29
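
As a rough sketch of that request (not a tuned configuration), the parameters 
could be assembled like this in Python. The field names come from Daniel's 
message; the boosts, the mm value of 50%, the two-field setup, and the 
build_edismax_params helper are assumptions made up for illustration. The 
filter query is one possible way to require every term in allMetadatas.

```python
from urllib.parse import urlencode


def build_edismax_params(terms, metadata_fields, mm="50%"):
    """Assemble Solr request parameters for an edismax search.

    The boosts and the mm default are placeholders to be tuned.
    """
    # Search the content field and every metadata field with all the terms.
    qf = ["content"] + [f"{field}^2" for field in metadata_fields]
    return {
        "defType": "edismax",
        "q": " ".join(terms),
        "qf": " ".join(qf),
        "mm": mm,
        # Filter query: every term must occur in allMetadatas.
        "fq": "allMetadatas:(" + " AND ".join(terms) + ")",
    }


params = build_edismax_params(["term1", "term2"], ["metadata1", "metadata2"])
query_string = urlencode(params)
```

The resulting query_string would then be appended to the search handler URL 
of your own Solr core.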

The Lucene query syntax below should give you the general idea. Note that a 
document only gets the higher boost if all terms are in one of the individual 
metadata fields.

metadata1:(term1 AND ... AND termN)^2
metadata2:(term1 AND ... AND termN)^2
.....
metadataN:(term1 AND ... AND termN)^2
allMetadatas:(term1 AND ... AND termN)^0.5
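
If you generate the query from code, a minimal Python sketch could look like 
the following. The build_boost_query helper and its parameter names are 
hypothetical, the boosts are the untuned values from above, and the terms are 
assumed to be plain words that need no escaping.

```python
def build_boost_query(terms, metadata_fields, field_boost=2, all_boost=0.5):
    """Build one clause per field, in the shape of the query above."""
    all_terms = "(" + " AND ".join(terms) + ")"
    clauses = [f"{field}:{all_terms}^{field_boost}" for field in metadata_fields]
    clauses.append(f"allMetadatas:{all_terms}^{all_boost}")
    # Clauses are joined with spaces, so they are optional (OR) only when
    # the parser's default operator is OR; that is an assumption here.
    return " ".join(clauses)


query = build_boost_query(["term1", "term2"], ["metadata1", "metadata2"])
print(query)
```

With the default operator set to OR, each clause is optional and only adds to 
the score when all terms occur in that field.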

That should do approximately what you want,
Robi

-----Original Message-----
From: Daniel Shane [mailto:sha...@lexum.com] 
Sent: Tuesday, January 21, 2014 8:42 AM
To: solr-user@lucene.apache.org
Subject: Interesting search question! How to match documents based on the least 
number of fields that match all query terms?

I have an interesting Solr/Lucene question, and it's quite possible that some 
new features in Solr might make this much easier than what I am about to try. If 
anyone has a clever idea on how to do this search, please let me know!

Basically, let's say that I have an index in which each document has a 
content field and several metadata fields.

Document Fields:

content
metadata1
metadata2
.....
metadataN
allMetadatas (all the terms indexed in metadata1...N are concatenated in this 
field) 

Assuming that I am searching for documents that contain a certain number of 
terms (term1 to termN) in their metadata fields, I would like to build a search 
query that will return documents that satisfy these requirements:

a) All search terms must be present somewhere in the metadata fields. This is 
quite easy: we can simply search the allMetadatas field.

b) Now for the hard part: we prefer documents in which the search terms are 
found in the *least number of different fields*. So if one document contains all 
the search terms spread across 10 different fields, but another document 
contains all the search terms in only 8 fields, we would like the latter to 
sort first.
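
To make requirement (b) concrete, here is a small Python sketch of the desired 
ordering, computed outside Solr on plain dictionaries. It takes one reading of 
the requirement: count the distinct metadata fields in which any search term 
occurs, and sort ascending. The document layout, the whitespace tokenization, 
and both helper functions are illustrative assumptions, not anything Solr 
provides.

```python
def matching_field_count(doc, terms):
    """Count the metadata fields of doc that contain at least one term."""
    wanted = {t.lower() for t in terms}
    count = 0
    for field, text in doc.items():
        if not field.startswith("metadata"):
            continue  # skip content and any other non-metadata field
        if wanted & set(text.lower().split()):
            count += 1
    return count


def covers_all_terms(doc, terms):
    """Requirement (a): every term occurs in some metadata field."""
    found = set()
    for field, text in doc.items():
        if field.startswith("metadata"):
            found |= set(text.lower().split())
    return {t.lower() for t in terms} <= found


docs = [
    {"id": "A", "metadata1": "red", "metadata2": "blue", "metadata3": "green"},
    {"id": "B", "metadata1": "red blue green", "metadata2": "other"},
]
terms = ["red", "blue", "green"]
hits = [d for d in docs if covers_all_terms(d, terms)]
ranked = sorted(hits, key=lambda d: matching_field_count(d, terms))
print([d["id"] for d in ranked])
```

Document B sorts ahead of document A here because its terms sit in one field 
instead of three.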

My first idea was to index the terms in allMetadatas using payloads. Each 
indexed term would also carry the specific metadataN field from which it 
originates. Then I could write a scorer that scores based on these payloads.

However, if there is a way to do this without payloads I'm all ears!

-- 
Daniel Shane
Lexum (www.lexum.com)
sha...@lexum.com
