I updated the docs for now, but I agree this paging issue needs to be handled transparently. Feel free to create a JIRA issue for this, or I can create one when I have time to start looking into it.
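Until paging is handled transparently, the intended use described in the quoted thread below can be enforced on the client side. The following is a minimal sketch: it keeps the `rq` parameter only while the requested page lies entirely inside the first `reRankDocs` results, and drops it once you page past them. The `rq={!rerank reRankQuery=$rqq reRankDocs=... reRankWeight=...}` form follows Solr's rerank query parser syntax. The class name `ReRankPaging`, the method `buildParams`, and the sample query strings are hypothetical. The sketch assumes `reRankDocs` is a multiple of `rows`. Otherwise one page straddles the boundary and is served without re-ranking.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical client-side helper that builds Solr request params for one page. */
public class ReRankPaging {

    static Map<String, String> buildParams(String q, String reRankQuery,
                                           int start, int rows,
                                           int reRankDocs, double reRankWeight) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("start", Integer.toString(start));
        params.put("rows", Integer.toString(rows));
        // The page fits inside the re-ranked window, so keep re-ranking on.
        if (start + rows <= reRankDocs) {
            params.put("rq", "{!rerank reRankQuery=$rqq reRankDocs=" + reRankDocs
                    + " reRankWeight=" + reRankWeight + "}");
            params.put("rqq", reRankQuery);
        }
        // Otherwise rq is dropped entirely, so the server never has to raise
        // reRankDocs to start+rows for this page.
        return params;
    }

    public static void main(String[] args) {
        // Re-rank the top 200 documents, 20 rows per page (sample values).
        for (int start = 180; start <= 200; start += 20) {
            System.out.println(buildParams("ipod", "(hi hello hey hiya)", start, 20, 200, 3.0));
        }
    }
}
```

With these values, the page starting at 180 carries `rq`, and the page starting at 200 (documents 201-220) is sent without it.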
Joel Bernstein
Search Engineer at Heliosearch

On Tue, Aug 5, 2014 at 12:04 PM, Adair Kovac <adairko...@gmail.com> wrote:

> Thanks, great explanation! Yeah, if it keeps the current behavior, added
> documentation would be great.
>
> Are there any other features that expect parameters to change as one
> pages? If not, I'm concerned that it might be hard to support for clients
> that assume only the index params will change. It also makes things harder
> if we want to re-rank only a strictly limited set of results on the first
> page, because then we'd have to stitch together two result sets. We don't
> currently want to do that, though.
>
> For what it's worth, my colleague (who pointed me to the feature) and I
> both assumed it would fetch all the results, re-rank the top ones, and
> return the ones past the re-ranking point as-is. Is that possible?
>
> Thanks,
>
> Adair
>
> On Tue, Aug 5, 2014 at 5:53 AM, Joel Bernstein <joels...@gmail.com> wrote:
>
>> The comment in the code reads slightly differently:
>>
>> // This ensures that reRankDocs >= docs needed to satisfy the result set.
>> reRankDocs = Math.max(start+rows, reRankDocs);
>>
>> I think you're right, though, that this is confusing. The way the
>> ReRankQParserPlugin works is that it grabs the top X documents
>> (reRankDocs) and re-ranks them. If the top X (reRankDocs) isn't large
>> enough to satisfy the page, then the result won't have enough documents.
>>
>> The intended use of this was actually to stop using query re-ranking when
>> you page past the re-ranked results. So if you re-rank the top 200
>> documents, you would drop the re-ranking parameter when you page to
>> documents 201-220.
>>
>> So the line:
>>
>> reRankDocs = Math.max(start+rows, reRankDocs);
>>
>> saves you from an unexpected shortfall in documents if you do page beyond
>> reRankDocs. At the very least the expected use should be documented, and
>> if we can figure out better behavior here, that would be great.
>>
>> Joel Bernstein
>> Search Engineer at Heliosearch
>>
>> On Mon, Aug 4, 2014 at 7:56 PM, Adair Kovac <adairko...@gmail.com> wrote:
>>
>>> Looking at this line in the code:
>>>
>>> // This ensures that reRankDocs <= docs needed to satisfy the result set.
>>> reRankDocs = Math.max(start+rows, reRankDocs);
>>>
>>> This looks like it would cause skips and duplicates while paging through
>>> the results: if you exceed the reRankDocs parameter and keep finding
>>> documents that match the re-ranking query, they'll get boosted earlier
>>> (skipped), pushing down items you already saw (causing duplicates).
>>>
>>> It's obviously intentional behavior, but I can't find any documentation
>>> explaining why, if you request fewer documents to be re-ranked than
>>> you're asking to view, it ignores the number you asked for. What if I
>>> only want the top 10 out of 50 rows to be re-ranked? Wouldn't it be
>>> better to let the client choose whether to increase reRankDocs or leave
>>> it the same?
>>>
>>> If no one replies and I have time, I might check out 4.9 and see if I
>>> can confirm or disprove the bug, but I figured I'd bring it up now in
>>> case I don't end up having time. It would be good to document the reason
>>> for this behavior if it turns out to be necessary.
>>>
>>> Thanks. I'm excited about this feature, btw.
>>>
>>> --Adair
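The skip/duplicate concern and the "return the rest as-is" alternative can be checked with a toy model. This is a simplified sketch, not Solr's code. `clampedPage` mimics re-ranking the top `max(start+rows, reRankDocs)` documents. `fixedWindowPage` models the behavior asked about in the thread: re-rank a fixed top `reRankDocs` and append the remaining documents in main-score order. The class name, the `Doc` record, and the sample scores are hypothetical. `boost` stands in for reRankWeight times the re-rank query score. Requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ReRankPagingDemo {

    /** Hypothetical document: main-query score plus re-rank boost. */
    record Doc(String id, double mainScore, double boost) {
        double combined() { return mainScore + boost; }
    }

    /** Clamped model: the re-rank window grows to cover the requested page. */
    static List<Doc> clampedPage(List<Doc> byMainScore, int reRankDocs, int start, int rows) {
        int window = Math.min(Math.max(start + rows, reRankDocs), byMainScore.size());
        List<Doc> top = new ArrayList<>(byMainScore.subList(0, window));
        top.sort(Comparator.comparingDouble(Doc::combined).reversed());
        return new ArrayList<>(top.subList(Math.min(start, top.size()), Math.min(start + rows, top.size())));
    }

    /** Fixed-window model: re-rank only the top reRankDocs, keep the rest as-is. */
    static List<Doc> fixedWindowPage(List<Doc> byMainScore, int reRankDocs, int start, int rows) {
        int window = Math.min(reRankDocs, byMainScore.size());
        List<Doc> all = new ArrayList<>(byMainScore.subList(0, window));
        all.sort(Comparator.comparingDouble(Doc::combined).reversed());
        all.addAll(byMainScore.subList(window, byMainScore.size()));
        return new ArrayList<>(all.subList(Math.min(start, all.size()), Math.min(start + rows, all.size())));
    }

    /** Pages through all docs and records the ids in the order they are seen. */
    static List<String> pageThrough(List<Doc> docs, int reRankDocs, int rows, boolean clamped) {
        List<String> seen = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += rows) {
            List<Doc> page = clamped
                    ? clampedPage(docs, reRankDocs, start, rows)
                    : fixedWindowPage(docs, reRankDocs, start, rows);
            for (Doc d : page) {
                seen.add(d.id());
            }
        }
        return seen;
    }

    /** Nine docs in main-score order; d5 and d7 match the re-rank query. */
    static List<Doc> sampleDocs() {
        List<Doc> docs = new ArrayList<>();
        for (int i = 0; i < 9; i++) {
            double boost = (i == 5 || i == 7) ? 100.0 : 0.0;
            docs.add(new Doc("d" + i, 9 - i, boost));
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Doc> docs = sampleDocs();
        // reRankDocs = 3, rows = 3: every page after the first exceeds reRankDocs.
        System.out.println("clamped:      " + pageThrough(docs, 3, 3, true));
        System.out.println("fixed window: " + pageThrough(docs, 3, 3, false));
    }
}
```

In the clamped run, d2 and d4 each appear twice and the boosted d5 and d7 are never seen, because they jump onto pages that were already read. The fixed-window run returns every document exactly once, at the cost of not boosting matches past the window.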