> > I think you can get what you want by doing the first stage retrieval,
> > and then in the second stage, add required constraint(s) to the query
> > for the matching docid(s), and change the AND operators in the
> > original query to OR.  Coordination will cause the best snippet(s) to
> > rise to the top, no?
> 
> Right, right.
> So if the original query is: foo AND bar, I'd run it against the main
> index, get top N hits, say N=10.
> Then I'd create another query: +(foo OR bar) +articleID:(ORed list of top
> N article IDs from main results)
> And then I'd use that to get enough "sentence docs" to have at least 1 of
> them for each hit from the main index.
> 
> Hm, I wonder what happens when instead of simple foo AND bar you have a
> more complex query with more elaborate grouping and such...

:) I was hoping that you could limit the query language to exclude grouping...
If not, you could walk the boolean query, trim all clauses that are PROHIBITED,
then flatten all of the remaining terms into a single OR'd query?
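
A rough sketch of that walk, in case it helps. The Term/Clause/Bool classes
below are hypothetical stand-ins for a parsed boolean query, not Lucene's
own classes, and the articleID field name is just the one from your example:

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical stand-ins for a parsed boolean query; NOT Lucene classes.
MUST, SHOULD, PROHIBITED = "MUST", "SHOULD", "PROHIBITED"

@dataclass
class Term:
    text: str

@dataclass
class Clause:
    occur: str
    query: Union["Term", "Bool"]

@dataclass
class Bool:
    clauses: List[Clause] = field(default_factory=list)

def flatten_terms(query):
    """Collect terms in first-seen order, skipping PROHIBITED subtrees."""
    if isinstance(query, Term):
        return [query.text]
    terms = []
    for clause in query.clauses:
        if clause.occur == PROHIBITED:
            continue  # trim prohibited clauses, including anything nested in them
        for text in flatten_terms(clause.query):
            if text not in terms:
                terms.append(text)
    return terms

def second_stage_query(query, article_ids, id_field="articleID"):
    """Build '+(t1 OR t2 ...) +articleID:(id1 OR id2 ...)' as a query string."""
    terms = " OR ".join(flatten_terms(query))
    ids = " OR ".join(str(i) for i in article_ids)
    return f"+({terms}) +{id_field}:({ids})"

# foo AND (bar OR baz) AND NOT qux
original = Bool([
    Clause(MUST, Term("foo")),
    Clause(MUST, Bool([Clause(SHOULD, Term("bar")),
                       Clause(SHOULD, Term("baz"))])),
    Clause(PROHIBITED, Term("qux")),
])

print(second_stage_query(original, [17, 42, 99]))
# prints: +(foo OR bar OR baz) +articleID:(17 OR 42 OR 99)
```

Grouping disappears entirely after the flatten, so it shouldn't matter how
elaborate the original query was.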

> > Hmm, you'll want to run the second stage once for each hit from the
> > first stage, though, unless you can afford to collect *all* hits and pull
> > out each first-stage hit's sentences from the intermixed second-stage
> > results...
> 
> Wouldn't the above get me all sentences I need for top N hits from the
> main result in a single shot, assuming I use high enough rows=NNN to
> minimize the possibility of not getting even 1 sentence for any one of
> those top N hits?

Yes, but in the worst case you have to retrieve *all* second-stage hits to
get at least one for each of the first-stage hits, since nothing stops every
sentence for one article from ranking below the sentences for all the others.
So if you're okay with NNN = numDocs, then no problem.
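
To make the worst case concrete, here is a small sketch of the single-shot
approach. The hit list is made-up data, and best_sentence_per_article is a
hypothetical helper, not anything in Lucene or Solr; it just scans the ranked
second-stage hits and reports which first-stage articles got no sentence:

```python
def best_sentence_per_article(ranked_hits, wanted_ids):
    """ranked_hits: (article_id, sentence) pairs, best score first.

    Returns (best, missing, scanned): the top-ranked sentence per article,
    the article IDs that got no sentence, and how many hits were examined.
    """
    best = {}
    remaining = set(wanted_ids)
    scanned = 0
    for article_id, sentence in ranked_hits:
        if not remaining:
            break  # every first-stage hit already has a sentence
        scanned += 1
        if article_id in remaining:
            best[article_id] = sentence
            remaining.discard(article_id)
    return best, remaining, scanned

# Made-up second-stage results: article 3's only sentence ranks last.
hits = [(1, "s1a"), (1, "s1b"), (2, "s2a"),
        (1, "s1c"), (2, "s2b"), (3, "s3a")]

# With rows=4, article 3 gets nothing and needs a follow-up query.
best, missing, scanned = best_sentence_per_article(hits[:4], [1, 2, 3])
print(missing)   # prints: {3}

# Only by reading every hit do we cover all three articles.
best, missing, scanned = best_sentence_per_article(hits, [1, 2, 3])
print(scanned == len(hits), missing)   # prints: True set()
```

If a non-empty "missing" set is rare in practice, a moderate rows value plus
one follow-up query restricted to the missing IDs may be cheaper than either
extreme.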

Steve
