On 11/12/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:
> - Somewhat related: let's say I index "Polymyxin B". If I stopword single letters, would a phrase search ("Polymyxin B") still find the right documents? (I don't think so, but still.) If not, I'll have to index single letters. How do I then prevent the same problem as in the first question, i.e., a search on Polymyxin B returning documents that contain Polymyxin and B, but not close to one another?
The general problem seems to be that you can't tell what should be in a phrase search and what shouldn't. You could try throwing everything into a sloppy phrase query, so that, in general, scores at least go up when terms are closer together. You could also try an exact phrase query first and, if it doesn't return enough results, follow it up with another strategy (like the one you describe below).
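Here is a rough sketch of the "exact phrase first, then fall back" idea. It assumes Lucene/Solr query-parser syntax, where `"a b"~N` is a sloppy phrase that tolerates some positional distance (the slop, here 5) between the terms. The candidate queries are tried from strictest to loosest. The class `PhraseFallback`, its methods, the slop value, the hit threshold and the `hitCount` callback (a stand-in for actually running the query against the index) are all hypothetical:

```java
import java.util.List;
import java.util.function.ToIntFunction;

public class PhraseFallback {
    // Exact phrase in query-parser syntax, e.g. "polymyxin b".
    static String exactPhrase(String text) {
        return "\"" + text + "\"";
    }

    // Sloppy phrase: the terms may be up to `slop` position moves apart.
    static String sloppyPhrase(String text, int slop) {
        return "\"" + text + "\"~" + slop;
    }

    // Returns the first candidate whose hit count reaches minHits; if none does,
    // returns the last (loosest) candidate. Candidates must be non-empty.
    // hitCount is a hypothetical stand-in for executing the query.
    static String chooseQuery(List<String> candidates, int minHits, ToIntFunction<String> hitCount) {
        for (String q : candidates) {
            if (hitCount.applyAsInt(q) >= minHits) {
                return q;
            }
        }
        return candidates.get(candidates.size() - 1);
    }

    public static void main(String[] args) {
        String text = "polymyxin b";
        List<String> candidates = List.of(
                exactPhrase(text),       // "polymyxin b"
                sloppyPhrase(text, 5),   // "polymyxin b"~5
                "polymyxin AND b");      // plain conjunction, any distance
        // Made-up hit counts, for illustration only.
        ToIntFunction<String> fakeCounts =
                q -> q.endsWith("~5") ? 12 : (q.startsWith("\"") ? 3 : 40);
        System.out.println(chooseQuery(candidates, 10, fakeCounts)); // "polymyxin b"~5
    }
}
```

The ordering determines the precision/recall trade-off. The loosest query is only used when nothing stricter returns enough results.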
> My thought is to parse the user query and rewrite it to do phrase searches on adjacent terms that contain single letters or numbers. If a user searches for HIV 1 hepatitis, I'd rewrite it as ("HIV 1" AND hepatitis) OR ("1 hepatitis" AND hiv). Is this a sensible solution?
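The proposed rewrite can be sketched as a plain string transformation. Each single letter or number is paired into a phrase with its left or right neighbor, the remaining terms are ANDed, and all alternatives are ORed. Treating "short" terms as exactly one letter or an all-digit token is an assumption, and the `ShortTermRewriter` class and its methods are hypothetical. Case is preserved here, on the assumption that the query analyzer lowercases terms later:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class ShortTermRewriter {
    // Assumption: a "short" term is a single letter or a run of digits.
    private static final Pattern SHORT = Pattern.compile("(?:\\p{L}|\\d+)");

    static boolean isShort(String token) {
        return SHORT.matcher(token).matches();
    }

    // Pairs every short term with each adjacent term as a phrase, ANDs the
    // remaining terms, and ORs the alternatives (duplicates removed).
    static String rewrite(String userQuery) {
        String[] tokens = userQuery.trim().split("\\s+");
        Set<String> alternatives = new LinkedHashSet<>();
        for (int i = 0; i < tokens.length; i++) {
            if (!isShort(tokens[i])) continue;
            // Phrase with the left neighbor (i-1, i), then with the right neighbor (i, i+1).
            for (int start : new int[] {i - 1, i}) {
                int end = start + 1;
                if (start < 0 || end >= tokens.length) continue;
                List<String> clause = new ArrayList<>();
                clause.add("\"" + tokens[start] + " " + tokens[end] + "\"");
                for (int j = 0; j < tokens.length; j++) {
                    if (j != start && j != end) clause.add(tokens[j]);
                }
                alternatives.add("(" + String.join(" AND ", clause) + ")");
            }
        }
        // No short terms: just require all terms.
        if (alternatives.isEmpty()) {
            return String.join(" AND ", tokens);
        }
        return String.join(" OR ", alternatives);
    }

    public static void main(String[] args) {
        System.out.println(rewrite("HIV 1 hepatitis"));
        // ("HIV 1" AND hepatitis) OR ("1 hepatitis" AND HIV)
    }
}
```

The number of OR'ed clauses grows with the number of short terms, so long queries with many single letters or numbers may need a cap.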
That might work. Whatever general strategy you end up trying, you can probably boost relevancy by injecting some domain-specific knowledge with something like the SynonymFilter.

-Yonik