On 5/17/2011 7:07 PM, Markus Jelsma wrote:
This is probably due to the contents (HTML bodies) of the documents I've
queried. It's not so strange for this type of document to return fewer
documents when two negated operands are specified. In my case (I tested it) a
conjunction returned the same documents as a disjunction did.
Again, I haven't done extensive testing on this subject.
I think we're disagreeing on what the proper behavior is. By my
understanding of the proper behavior under boolean logic, the contents
of your documents don't matter: that result is logically impossible.
Perhaps I am wrong to expect that Lucene's pseudo-boolean operators will
behave like actual boolean logic?
Under boolean logic, assuming "-one" means the same thing as "NOT one"
(both mean "all documents that do not have 'one'"), the following holds:
-one OR -two === (NOT one) OR (NOT two) === NOT ( one AND two )
And it is logically impossible for that query to return FEWER results
than "-one" alone does, or than "-two" alone does, in ANY corpus. It
can return the same number, or it can return more. You can never get
fewer documents by adding an "OR" union on, right? That's a set union,
and the union of set A with some other (possibly empty) set B can never
have fewer members than set A alone!
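To make the set argument concrete, here is a minimal Python sketch. The
toy corpus and the helper name docs_without are made up for illustration;
it models documents as plain sets of terms and says nothing about how
Lucene actually evaluates the query.

```python
# Hypothetical toy corpus: document id -> set of terms in that document.
CORPUS = {
    1: {"one"},
    2: {"two"},
    3: {"one", "two"},
    4: {"three"},
}


def docs_without(term, corpus=CORPUS):
    """Ids of all documents that do NOT contain `term` (boolean "NOT term")."""
    return {doc_id for doc_id, terms in corpus.items() if term not in terms}


not_one = docs_without("one")      # NOT one
not_two = docs_without("two")      # NOT two
union = not_one | not_two          # (NOT one) OR (NOT two)

# De Morgan: the union equals NOT (one AND two).
not_both = {d for d, terms in CORPUS.items() if not {"one", "two"} <= terms}
assert union == not_both

# A union can never be smaller than either of its operands.
assert len(union) >= len(not_one)
assert len(union) >= len(not_two)
```

Whatever corpus you substitute, the last two assertions cannot fail,
because every member of not_one and not_two is also a member of union.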
In fact, after playing around more and comparing hit counts, it looks
like the Solr 1.4.1 Lucene query parser treats:
"-one OR -two"
the same as
NOT (one OR two)
Which is not/should not be the same query at all.
The first is "all documents that don't have 'one' COMBINED WITH all
documents that don't have 'two'". The second is "all documents that
have NEITHER 'one' NOR 'two'". Those are two different things, or
ought to be.
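A small Python sketch of the difference, again over a made-up toy corpus
(the names has, either_missing and neither are hypothetical, and this is
plain set arithmetic, not Solr's implementation):

```python
# Hypothetical toy corpus: document id -> set of terms in that document.
corpus = {
    1: {"one"},
    2: {"two"},
    3: {"one", "two"},
    4: {"three"},
}
all_docs = set(corpus)


def has(term):
    """Ids of all documents that contain `term`."""
    return {doc_id for doc_id, terms in corpus.items() if term in terms}


# "-one OR -two" read as boolean algebra: lacks 'one', or lacks 'two'.
either_missing = (all_docs - has("one")) | (all_docs - has("two"))

# "NOT (one OR two)": has neither 'one' nor 'two'.
neither = all_docs - (has("one") | has("two"))

print(sorted(either_missing))  # [1, 2, 4]
print(sorted(neither))         # [4]
```

The second result is always a subset of the first, so if the parser
really evaluates the first query as the second, hit counts lower than
"-one" alone are exactly what you would see.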
Or am I wrong to think that? That is certainly the way boolean algebra
works: if "-one" is a boolean negation, the same as "NOT one", then "-one
OR -two" definitely ought _not_ to be the same query as "NOT (one OR
two)". But maybe I should not be expecting predictable boolean algebra
here? If that's the case, then I'm not sure what behavior I should be
expecting, or what the expected predictable behavior of these operators is!
If we want to make things even more confusing, I can supply some other
patterns involving an explicit "NOT" that also don't work how I expect,
or in any predictable way I can figure out.