On 11/12/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:
- Somewhat related : Let's say I index "Polymyxin B". If I stopword
single letters, would a phrase search ("Polymyxin B") still find the
right documents (I don't think so, but still)? If not, I'll have to
index single letters; how do I then prevent the same problem as in the
first question (i.e., a search on Polymyxin B yielding documents that
contain Polymyxin and B, but not close to one another)?

The general problem seems to be that you can't tell what should be in a
phrase search and what shouldn't.

You could try throwing everything in a sloppy phrase query, so at
least scores will go up when terms are closer together (in general).
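
As a rough illustration of the sloppy phrase idea, here is a minimal
sketch that builds a proximity query string in Lucene query parser
syntax ("..."~N). The helper name sloppy_phrase and the default slop of
10 are assumptions made for this example, not anything Solr provides or
recommends.

```python
def sloppy_phrase(terms, slop=10):
    """Build a Lucene-syntax proximity query, e.g. "polymyxin b"~10.

    `slop` is an illustrative default; tune it for your own data.
    """
    # Escape backslashes and double quotes so the phrase stays well-formed.
    cleaned = [t.replace("\\", "\\\\").replace('"', '\\"') for t in terms]
    return '"{}"~{}'.format(" ".join(cleaned), slop)


print(sloppy_phrase(["Polymyxin", "B"], slop=3))
```

A larger slop lets documents match when the terms are further apart,
while closer occurrences should generally still score higher.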

You could also try an exact phrase query, and if you don't get enough
results, follow it up with another strategy (like what you have
below).
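
The two-step approach could look roughly like the sketch below. The
search argument is a hypothetical callable (query string in, list of
hits out) standing in for whatever client you use to talk to Solr, and
min_hits and slop are made-up thresholds for illustration.

```python
def search_with_fallback(terms, search, min_hits=10, slop=10):
    """Try an exact phrase first; fall back to a sloppy phrase if needed.

    `search` is a hypothetical callable: it takes a query string and
    returns a list of hits. It is not part of any Solr client API.
    """
    phrase = " ".join(terms)
    hits = search('"{}"'.format(phrase))
    if len(hits) >= min_hits:
        return hits
    # Too few exact matches: relax to a proximity (sloppy) phrase query.
    return search('"{}"~{}'.format(phrase, slop))
```

The second step could just as well be any other strategy; the sloppy
phrase is only one possible fallback.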

My thought is to parse the user query and rephrase it to do phrase
searches on nearby terms containing single letters / numbers. If a user
searches for HIV 1 hepatitis, I'd rewrite it as ("HIV 1" AND hepatitis) OR
("1 hepatitis" AND hiv). Is it a sensible solution?
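
The rewrite described above can be sketched as follows. This is only one
possible reading of the idea: the rewrite function, the SHORT pattern
(a single letter or a run of digits), and the whitespace tokenization
are all assumptions for illustration, and the output is a plain query
string rather than anything tied to a particular Solr API.

```python
import re

# Assumption: a "short" term is a single letter or a run of digits.
SHORT = re.compile(r"^(?:[A-Za-z]|\d+)$")


def rewrite(query):
    """Pair each short term with each neighbour as a phrase; AND the rest."""
    terms = query.split()
    alternatives = []
    for i, term in enumerate(terms):
        if not SHORT.match(term):
            continue
        for j in (i - 1, i + 1):  # left neighbour, then right neighbour
            if 0 <= j < len(terms):
                lo, hi = min(i, j), max(i, j)
                phrase = '"{} {}"'.format(terms[lo], terms[hi])
                rest = terms[:lo] + terms[hi + 1:]
                alt = "(" + " AND ".join([phrase] + rest) + ")"
                if alt not in alternatives:  # skip duplicate pairings
                    alternatives.append(alt)
    if not alternatives:
        # No short terms: just require every term.
        return " AND ".join(terms)
    return " OR ".join(alternatives)


print(rewrite("HIV 1 hepatitis"))
# -> ("HIV 1" AND hepatitis) OR ("1 hepatitis" AND HIV)
```

Queries with several short terms produce one alternative per pairing, so
the rewritten query can grow quickly for long inputs.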

That might work.
Whatever general strategy you end up trying, you can probably boost
relevancy by injecting some domain-specific knowledge with something
like the SynonymFilter.

-Yonik
