You can also have a sliding re-ranking horizon. That is how we did it in Ultraseek.
http://observer.wunderwood.org/2007/04/04/progressive-reranking/

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/

On Aug 5, 2014, at 9:38 AM, Joel Bernstein <joels...@gmail.com> wrote:

> I updated the docs for now. But I agree this paging issue needs to be
> handled transparently. Feel free to create a JIRA issue for this, or I can
> create one when I have time to start looking into it.
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Tue, Aug 5, 2014 at 12:04 PM, Adair Kovac <adairko...@gmail.com> wrote:
>
>> Thanks, great explanation! Yeah, if it keeps the current behavior, added
>> documentation would be great.
>>
>> Are there any other features that expect parameters to change as one
>> pages? If not, I'm concerned that it might be hard to support for clients
>> that assume only the index params will change. It also makes things harder
>> if we want to add re-ranking on a strict small set of results on the
>> first page, because then we'd have to stitch together two result sets. We
>> don't currently want to do that, though.
>>
>> For what it's worth, my colleague (who linked me to the feature) and I
>> both assumed it would get all the results and return the ones past the
>> re-ranking point as-is. Is that possible?
>>
>> Thanks,
>>
>> Adair
>>
>>
>> On Tue, Aug 5, 2014 at 5:53 AM, Joel Bernstein <joels...@gmail.com> wrote:
>>
>>> The comment in the code reads slightly differently:
>>>
>>> // This ensures that reRankDocs >= docs needed to satisfy the result set.
>>> reRankDocs = Math.max(start+rows, reRankDocs);
>>>
>>> I think you're right, though, that this is confusing. The way the
>>> ReRankingQParserPlugin works is that it grabs the top X documents
>>> (reRankDocs) and re-ranks them. If the top X (reRankDocs) isn't large
>>> enough to satisfy the page, then the result won't have enough documents.
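To make the mechanism Joel describes concrete, here is a minimal Java sketch under stated assumptions. It models the quoted clamp `reRankDocs = Math.max(start+rows, reRankDocs)` and assumes that a re-ranked document's score is mainScore + weight * reRankScore. The class `ReRankWindowSketch`, the `Doc` record, and the methods `effectiveReRankDocs`, `page`, and `ids` are hypothetical names for illustration; this is not the ReRankingQParserPlugin source. With reRankDocs = 2 and rows = 2, requesting the second page widens the window to 4, so d3 (which matches the re-rank query) jumps to the top of a page the user has already passed, while d1 shows up on both pages: the skip/duplicate effect Adair raises in the Aug 4 message.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the clamped re-ranking window discussed in this thread.
// Not Solr source; all names here are illustrative.
public class ReRankWindowSketch {

    // A document with its first-pass score and its score from the re-rank query
    // (0 when it does not match the re-rank query).
    record Doc(String id, double mainScore, double reRankScore) {}

    // Mirrors the quoted line: reRankDocs = Math.max(start+rows, reRankDocs);
    static int effectiveReRankDocs(int start, int rows, int reRankDocs) {
        return Math.max(start + rows, reRankDocs);
    }

    // Takes the top window by first-pass score, re-sorts that window by the
    // assumed combined score, then slices out the requested page.
    static List<Doc> page(List<Doc> docs, int start, int rows, int reRankDocs, double weight) {
        List<Doc> ranked = new ArrayList<>(docs);
        ranked.sort(Comparator.comparingDouble(Doc::mainScore).reversed());
        int window = Math.min(effectiveReRankDocs(start, rows, reRankDocs), ranked.size());
        List<Doc> top = new ArrayList<>(ranked.subList(0, window));
        // Assumption: combined score = mainScore + weight * reRankScore.
        top.sort(Comparator.comparingDouble((Doc d) -> d.mainScore() + weight * d.reRankScore()).reversed());
        int from = Math.min(start, top.size());
        int to = Math.min(start + rows, top.size());
        return top.subList(from, to);
    }

    static List<String> ids(List<Doc> page) {
        return page.stream().map(Doc::id).toList();
    }

    public static void main(String[] args) {
        // Five docs in first-pass order; only d3 matches the re-rank query.
        List<Doc> docs = List.of(
                new Doc("d0", 5, 0), new Doc("d1", 4, 0), new Doc("d2", 3, 0),
                new Doc("d3", 2, 10), new Doc("d4", 1, 0));
        // reRankDocs = 2, rows = 2. Page 1 re-ranks only d0 and d1.
        System.out.println(ids(page(docs, 0, 2, 2, 1.0))); // [d0, d1]
        // Page 2 widens the window to 4: d3 jumps ahead (skipped), d1 repeats.
        System.out.println(ids(page(docs, 2, 2, 2, 1.0))); // [d1, d2]
    }
}
```

In this model, dropping the re-rank parameter on page 2, as Joel suggests, means ranking by mainScore alone, which returns d2 and d3 there and keeps the two pages disjoint.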
>>>
>>> The intended use of this was actually to stop using query re-ranking when
>>> you paged past the re-ranked results. So if you re-rank the top 200
>>> documents, you would drop the re-ranking parameter when you page to
>>> documents 201-220.
>>>
>>> So the line:
>>>
>>> reRankDocs = Math.max(start+rows, reRankDocs);
>>>
>>> saves you from an unexpected shortfall in documents if you do page beyond
>>> reRankDocs. At the very least, the expected use should be documented, and
>>> if we can figure out better behavior here, that would be great.
>>>
>>> Joel Bernstein
>>> Search Engineer at Heliosearch
>>>
>>>
>>> On Mon, Aug 4, 2014 at 7:56 PM, Adair Kovac <adairko...@gmail.com> wrote:
>>>
>>>> Looking at this line in the code:
>>>>
>>>> // This ensures that reRankDocs <= docs needed to satisfy the result set.
>>>> reRankDocs = Math.max(start+rows, reRankDocs);
>>>>
>>>> This looks like it would cause skips and duplicates while paging through
>>>> the results: if you page past the reRankDocs parameter and it keeps
>>>> finding documents that match the re-ranking query, they'll get boosted
>>>> onto earlier pages (so you skip them), pushing down items you already
>>>> saw (so you see them again as duplicates).
>>>>
>>>> It's obviously intentional behavior, but I can't find any documentation
>>>> of why, if you request fewer documents to be re-ranked than you're
>>>> asking to view, it goes ahead and ignores the number you asked for. What
>>>> if I only want the top 10 out of 50 rows to be re-ranked? Wouldn't it be
>>>> better to let the client choose whether to increase reRankDocs or leave
>>>> it the same?
>>>>
>>>> If no one replies and I have time, I might check out 4.9 and see if I
>>>> can confirm or disprove the bug, but I figured I'd bring it up now in
>>>> case I don't end up having time. It would be good to document the reason
>>>> for this behavior if it turns out it's necessary.
>>>>
>>>> Thanks. I'm excited about this feature, btw.
>>>>
>>>> --Adair
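For comparison, here is a minimal Java sketch of the behavior Adair and a colleague expected: re-rank only a fixed top reRankDocs and return everything past that point in its original order. It assumes a re-ranked score of mainScore + weight * reRankScore; `FixedWindowReRankSketch`, its `Doc` record, and the methods `fullOrder`, `page`, and `ids` are hypothetical names, not a Solr API. Because the full ordering no longer depends on start or rows, consecutive pages can neither overlap nor skip documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a fixed re-ranking window with a pass-through tail.
// Not Solr source; all names here are illustrative.
public class FixedWindowReRankSketch {

    // First-pass score plus the score from the re-rank query (0 if no match).
    record Doc(String id, double mainScore, double reRankScore) {}

    // Full result order. It never looks at start or rows, so every page is a
    // slice of the same list and pages cannot overlap.
    static List<Doc> fullOrder(List<Doc> docs, int reRankDocs, double weight) {
        List<Doc> ranked = new ArrayList<>(docs);
        ranked.sort(Comparator.comparingDouble(Doc::mainScore).reversed());
        int window = Math.min(reRankDocs, ranked.size());
        List<Doc> result = new ArrayList<>(ranked.subList(0, window));
        // Assumption: combined score = mainScore + weight * reRankScore.
        result.sort(Comparator.comparingDouble((Doc d) -> d.mainScore() + weight * d.reRankScore()).reversed());
        // Documents past the re-ranking point are returned as-is.
        result.addAll(ranked.subList(window, ranked.size()));
        return result;
    }

    static List<Doc> page(List<Doc> docs, int start, int rows, int reRankDocs, double weight) {
        List<Doc> all = fullOrder(docs, reRankDocs, weight);
        int from = Math.min(start, all.size());
        int to = Math.min(start + rows, all.size());
        return all.subList(from, to);
    }

    static List<String> ids(List<Doc> page) {
        return page.stream().map(Doc::id).toList();
    }

    public static void main(String[] args) {
        // Five docs in first-pass order; only d3 matches the re-rank query.
        List<Doc> docs = List.of(
                new Doc("d0", 5, 0), new Doc("d1", 4, 0), new Doc("d2", 3, 0),
                new Doc("d3", 2, 10), new Doc("d4", 1, 0));
        // reRankDocs = 2, rows = 2: three disjoint pages covering all five docs.
        System.out.println(ids(page(docs, 0, 2, 2, 1.0))); // [d0, d1]
        System.out.println(ids(page(docs, 2, 2, 2, 1.0))); // [d2, d3]
        System.out.println(ids(page(docs, 4, 2, 2, 1.0))); // [d4]
    }
}
```

The trade-off is that a matching document just outside the window, like d3 here, never receives its re-rank boost, which is exactly what "re-rank only the top N" asks for.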