On Oct 13, 2008, at 9:34 PM, abhishek007 wrote:



Svein Parnas-2 wrote:


One way to boost an exact match on a single occurrence of a multivalued field
is to add a special start-of-field token and an end-of-field
token to the data, e.g.:

<document>
 <field name="professor">John Dane</field>
 <field name="course">softok Algorithms eoftok</field>
 <field name="course">softok Theory eoftok</field>
 <field name="course">softok Computability, Complexity and Algorithms eoftok</field>
</document>

Then, in your query, you can boost hits on the complete phrase
"softok queryword eoftok" by doing something like

queryword OR "softok queryword eoftok"^10
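A minimal Java sketch of the two halves of this trick: wrapping each value of the multivalued field in the sentinel tokens before it is sent for indexing, and building the boosted single-word query. The class and method names (`FieldSentinels`, `wrap`, `exactMatchBoostQuery`) are hypothetical; only the tokens `softok`/`eoftok` and the query shape come from the example above. It assumes the field's analyzer keeps the sentinel tokens as ordinary terms.

```java
import java.util.List;

/** Hypothetical helper illustrating the start/end-of-field sentinel trick. */
final class FieldSentinels {
    // Sentinel tokens taken from the example documents above.
    static final String START = "softok";
    static final String END = "eoftok";

    /** Wraps one occurrence of a multivalued field in the sentinel tokens before indexing. */
    static String wrap(String fieldValue) {
        return START + " " + fieldValue.trim() + " " + END;
    }

    /** Builds: word OR "softok word eoftok"^boost (single-word queries only). */
    static String exactMatchBoostQuery(String queryWord, int boost) {
        return queryWord + " OR \"" + wrap(queryWord) + "\"^" + boost;
    }

    public static void main(String[] args) {
        List<String> courses = List.of("Algorithms", "Theory",
                "Computability, Complexity and Algorithms");
        // Each wrapped value is indexed as a separate occurrence of the "course" field.
        courses.stream().map(FieldSentinels::wrap).forEach(System.out::println);
        // Prints: Algorithms OR "softok Algorithms eoftok"^10
        System.out.println(exactMatchBoostQuery("Algorithms", 10));
    }
}
```

Only an occurrence whose entire content is "Algorithms" matches the exact phrase, so a document with that occurrence gets the ^10 boost, while one that mentions Algorithms only inside a longer course title is still found by the plain term.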



I see what you are saying, but what if the query string itself contains multiple terms, for example "Algorithms, Theory"? With this I would end up with "softok Algorithms, Theory eoftok", which would not match the indexed data.

I was just trying to point you in a direction, not give a complete solution. For multiword queries, the solution will depend on the query syntax you are going to support and how you want the ranking to be performed. For instance, if a simple two-word query should be interpreted as "both words required; boost short field occurrences over long ones, but rank hits where both words occur in the same field occurrence first", the query could be rewritten to

+"softok wordA eoftok"~<x> +"softok wordB eoftok"~<x> "wordA wordB"~<x>^50

where <x> is roughly the number of tokens in the longest occurrence of the field in the index, but less than the field's positionIncrementGap, so that a sloppy phrase cannot match across two separate occurrences.
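A hedged Java sketch of that rewrite, generalized from two words to any number of words (the generalization is an assumption, not something stated above). Names are hypothetical, the tokenization is a naive split on whitespace and commas rather than the field's real analyzer, and query-syntax characters are not escaped. Because Lucene scores a sloppy phrase match higher the closer its terms are, `"softok word eoftok"~x` favours short field occurrences.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical rewriter for plain multiword queries using the sentinel tokens. */
final class MultiWordRewrite {
    /** Naive tokenization on whitespace and commas (assumption: no escaping, no analyzer). */
    static List<String> words(String userQuery) {
        return Arrays.stream(userQuery.trim().split("[\\s,]+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    /**
     * Every word is required inside some field occurrence (+"softok w eoftok"~slop);
     * hits with all words in one occurrence get an extra boost ("w1 w2 ..."~slop^boost).
     * slop should be about the longest occurrence in tokens but below positionIncrementGap.
     */
    static String rewrite(String userQuery, int slop, int sameOccurrenceBoost) {
        List<String> ws = words(userQuery);
        StringBuilder q = new StringBuilder();
        for (String w : ws) {
            q.append("+\"softok ").append(w).append(" eoftok\"~").append(slop).append(' ');
        }
        if (ws.size() > 1) {
            q.append('"').append(String.join(" ", ws)).append("\"~").append(slop)
             .append('^').append(sameOccurrenceBoost);
        }
        return q.toString().trim();
    }

    public static void main(String[] args) {
        // Prints: +"softok Algorithms eoftok"~10 +"softok Theory eoftok"~10 "Algorithms Theory"~10^50
        System.out.println(rewrite("Algorithms, Theory", 10, 50));
    }
}
```

The slop of 10 and boost of 50 in the usage line are placeholders; <x> has to be chosen from the actual data and the configured gap.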

The query parsing might get a bit messy if you are going to support advanced syntax. If the syntax you are going to support is about the same as DisMax, it might be worth modifying DisMaxRequestHandler. Another way to go would be to use DisMax as is, find all query terms not prefixed with -, and add "softok word eoftok"~<x> for each of them to the bq parameter.
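The second option could look roughly like the Java sketch below: collect the terms not prefixed with `-`, turn each into a sloppy sentinel phrase, and send them as a single `bq` value next to the unchanged `q`. The class and method names are hypothetical, the request handler is assumed to be configured for DisMax already, quoted phrases are naively split into separate words, and the space-separated clauses are assumed to be combined with the default OR operator.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: DisMax used as is, sentinel phrases added through bq. */
final class DisMaxBoost {
    /** Whitespace-split terms, skipping tokens prefixed with '-' (excluded by the user). */
    static List<String> positiveTerms(String userQuery) {
        List<String> terms = new ArrayList<>();
        for (String tok : userQuery.trim().split("\\s+")) {
            if (tok.isEmpty() || tok.startsWith("-")) continue;
            // Drop a leading '+' and any quote characters (naive: phrases are not kept together).
            String t = tok.replaceFirst("^\\+", "").replace("\"", "");
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    /** One sloppy sentinel phrase per positive term, space-separated in one bq value. */
    static String boostQuery(String userQuery, int slop) {
        List<String> clauses = new ArrayList<>();
        for (String t : positiveTerms(userQuery)) {
            clauses.add("\"softok " + t + " eoftok\"~" + slop);
        }
        return String.join(" ", clauses);
    }

    /** URL-encoded q and bq parameters for a request to a DisMax-configured handler. */
    static String requestParams(String userQuery, int slop) {
        return "q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8)
                + "&bq=" + URLEncoder.encode(boostQuery(userQuery, slop), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String q = "algorithms theory -complexity";
        // Prints: "softok algorithms eoftok"~10 "softok theory eoftok"~10
        System.out.println(boostQuery(q, 10));
        System.out.println(requestParams(q, 10));
    }
}
```

Leaving `q` untouched keeps DisMax's own parsing, field weighting and required/excluded handling; bq only adds score to documents that also match a short field occurrence.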

Svein
