Re: edismax using bigrams instead of phrases?

Bill Dueber Fri, 04 Dec 2009 09:04:37 -0800

I see that edismax already defines pf (bigrams) and pf3 (trigrams) -- what
would folks think about just calling them pf / pf1 (aliases for each
other?), pf2, and pf3? The pf would then behave exactly as it does in
dismax.
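
As a rough sketch of the naming I have in mind (plain Python, not Solr
code): pf / pf1 would boost on the whole query as one phrase, pf2 on each
bigram, and pf3 on each trigram. The function names and the whitespace
tokenization are assumptions for illustration only; the real parser would
run each field's analyzer.

```python
def shingles(terms, n):
    """Return every run of n consecutive terms as a quoted phrase."""
    return ['"' + " ".join(terms[i:i + n]) + '"'
            for i in range(len(terms) - n + 1)]


def phrase_boosts(query, param):
    """Hypothetical mapping from the proposed parameter names to phrase clauses."""
    terms = query.split()  # assumption: whitespace tokens, not real field analysis
    if param in ("pf", "pf1"):
        # Whole query as a single phrase, which is what dismax does with pf.
        return ['"' + " ".join(terms) + '"'] if terms else []
    if param == "pf2":
        return shingles(terms, 2)  # word bigrams
    if param == "pf3":
        return shingles(terms, 3)  # word trigrams
    raise ValueError("unknown phrase parameter: " + param)


for name in ("pf", "pf2", "pf3"):
    print(name, phrase_boosts("war and peace novel", name))
```

With that split, a document matching only part of the query could still
pick up a pf2 or pf3 boost, while pf keeps the exact-phrase boost.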


And it sounds like the solution to my single-token fields is to just move
them into the query itself.
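
For reference, here is how I understand the single-token pruning Yonik
describes below, next to the fix he suggests. This is only a sketch of the
rule as stated in this thread, not the actual edismax source; the function
name, the check_qf flag, and the "title" field in qf are made up for the
example.

```python
def keep_phrase_clause(field, token_count, qf_fields, check_qf=False):
    """Decide whether a pf boost clause on `field` survives pruning.

    token_count: number of tokens the field's analyzer emits for the phrase.
    check_qf=False: behavior as described, single-token clauses are dropped.
    check_qf=True: suggested fix, drop them only if the field is also in qf.
    """
    if token_count > 1:
        return True  # a real multi-term phrase is always kept
    if check_qf:
        # A single-token clause is only redundant if qf already searches the field.
        return field not in qf_fields
    return False


qf = {"title"}  # assumption: title_a is listed in pf but not in qf
print(keep_phrase_clause("title_a", 1, qf))                 # described behavior
print(keep_phrase_clause("title_a", 1, qf, check_qf=True))  # suggested fix
```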

Thanks!

On Fri, Dec 4, 2009 at 11:58 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> On Fri, Dec 4, 2009 at 11:26 AM, Bill Dueber <b...@dueber.com> wrote:
> > I've started trying edismax, and have noticed that my relevancy ranking is
> > messed up with edismax because, according to the debug output, it's using
> > bigrams instead of phrases and inexplicably ignoring a couple of the pf
> > fields. While the hit count isn't changing, this kills my ability to boost
> > exact title matches (or, I would guess, exact-anything-else matches, too).
>
> It's a feature in general - the problem with putting all the terms in
> a single phrase query is that you get no boosting at all unless all of
> the terms appear.
>
> But since it may be useful as an option, perhaps we should add the
> single-phrase option to extended dismax as well.
>
> > edismax is also completely ignoring the title_a and title_ab fields, which
> > are defined as "exactmatcher" as follows.
>
> I believe this is because extended dismax only adds phrases for
> boosting... hence if a field type outputs a single token, it's
> considered redundant with the main query.  This is an optimization to
> speed up queries (especially single-word queries).
> Perhaps one way to fix this would be to check if the pf is in the qf
> list before removing single term phrases?
>
> -Yonik
> http://www.lucidimagination.com
>



-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library
