On 5/17/2011 7:07 PM, Markus Jelsma wrote:

This is probably due to the contents (HTML bodies) of the documents I've
queried. It's not so strange for this type of document to return fewer
documents when two negated operands are specified. In my case (I tested it) a
conjunction returned the same documents as a disjunction did.

Again, I haven't done extensive testing on this subject.

I think we're disagreeing on what the proper behavior is. By my understanding of boolean logic, the contents of your documents don't matter: the result I'm seeing is logically impossible. Perhaps I am wrong to expect that lucene's pseudo-boolean operators will behave like actual boolean logic?

Under boolean logic -- assuming "-one" means the same thing as "NOT one", i.e. both mean "all documents that do not have 'one'" -- the following equivalence holds, right?

-one OR -two === (NOT one) OR (NOT two) ===  NOT (  one AND two )

And it is logically impossible for that query to return FEWER results than "-one" alone does, or than "-two" alone does, in ANY corpus. It can return the same number, or it can return more. You can never get fewer documents by adding an "OR" union on, right? That's a set union: the union of set A with some other (possibly empty) set B can never have fewer members than set A alone!
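
To make that concrete, here is a small Python sketch using plain sets. The corpus and the helper name having() are made up for illustration; this is only set algebra, not a model of what the lucene query parser actually does.

```python
# Hypothetical toy corpus: document id -> set of terms in that document.
corpus = {
    1: {"one"},
    2: {"two"},
    3: {"one", "two"},
    4: {"three"},
}
all_docs = set(corpus)


def having(term):
    """Return the ids of all documents that contain the given term."""
    return {doc_id for doc_id, terms in corpus.items() if term in terms}


# "-one" and "-two" read as pure boolean negation (complement of the match set).
not_one = all_docs - having("one")
not_two = all_docs - having("two")

# "-one OR -two" read as a set union of the two complements.
union = not_one | not_two

# De Morgan: NOT (one AND two) should be the same set.
not_both = all_docs - (having("one") & having("two"))

print(sorted(not_one), sorted(not_two), sorted(union), sorted(not_both))
```

Whatever corpus you substitute, the union can only be as large as or larger than each of its operands, which is the property I expected the query to have.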

In fact, playing around more and comparing hit counts, it looks like the Solr 1.4.1 lucene query parser treats:

"-one OR -two"
the same as
NOT (one OR two)

Which is not/should not be the same query at all.

The first is "all documents that don't have 'one' COMBINED WITH all documents that don't have 'two'". The second is "all documents that have NEITHER 'one' NOR 'two'". Those are two different things, or ought to be.
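
As a rough illustration of how far apart those two readings are, here is a minimal Python sketch over a made-up four-document corpus. The data and the helper having() are hypothetical, and this shows only what boolean algebra predicts, not what Solr returns.

```python
# Hypothetical toy corpus: document id -> set of terms in that document.
corpus = {
    1: {"one"},
    2: {"two"},
    3: {"one", "two"},
    4: {"three"},
}
all_docs = set(corpus)


def having(term):
    """Return the ids of all documents that contain the given term."""
    return {doc_id for doc_id, terms in corpus.items() if term in terms}


# Reading 1: (NOT one) OR (NOT two), documents missing at least one of the terms.
either_negated = (all_docs - having("one")) | (all_docs - having("two"))

# Reading 2: NOT (one OR two), documents containing neither term.
neither = all_docs - (having("one") | having("two"))

print(sorted(either_negated))  # documents 1, 2 and 4
print(sorted(neither))         # document 4 only
```

The second set is always a subset of the first, so if the parser really evaluates "-one OR -two" as the second reading, that would explain the smaller hit counts.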

Or am I wrong to think that? That is certainly the way boolean algebra works: if "-one" is a boolean negation, the same as "NOT one", then "-one OR -two" definitely ought _not_ to be the same query as "NOT (one OR two)". But maybe I should not be expecting predictable boolean algebra here? If that's the case, then I'm not sure what behavior I should be expecting, or what the expected predictable behavior of these operators is!

If we want to make things even more confusing, I can supply some other patterns involving an explicit "NOT" that also don't work how I expect or according to any predictable way I can figure out.
