deep paging without sorting / keep IRs open

Tommaso Teofili Thu, 15 May 2014 17:48:13 -0700

Hi all,

in one use case I'm working on [1] I am using Solr in combination with an
MVCC system [2][3]. The (Solr) index is kept up to date with the system and
must handle search requests that are tied to a certain state / version of
it. Of course, multiple searches based on different versions of the system
have to be able to run concurrently.


To give an example: an indexing request (with commit) creates docs x and y,
and a search for all the docs retrieves x and y. Then a second indexing
request (with commit) adds doc z, and a search for all the docs retrieves
x, y and z. That's fine as long as the number of results is not big, but if
search requests are paged (with the start and rows parameters) the example
breaks down, because fetching the pages takes multiple requests and the
underlying data can change between them.
In the above scenario, if rows = 1 each request retrieves one doc at a
time, and 'numFound' changes between the first and the second request (from
2 to 3), which is not consistent.
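
To make the inconsistency concrete, here is a toy Python model of the
scenario. It is only a sketch: a plain list stands in for the index, and the
search() helper is a hypothetical stand-in that mimics start/rows paging; it
is not a Solr API.

```python
# Toy model only: a list stands in for the index, and search() mimics
# start/rows paging. None of these names are Solr APIs.

def search(index, start, rows):
    """Return a Solr-like response: total hit count plus one page of docs."""
    return {"numFound": len(index), "docs": index[start:start + rows]}


index = ["x", "y"]                      # first commit: docs x and y
page1 = search(index, start=0, rows=1)  # first page, fetched before the change

index = index + ["z"]                   # second commit adds doc z
page2 = search(index, start=1, rows=1)  # second page, fetched after the change

print(page1["numFound"], page2["numFound"])
```

The two pages belong to the same logical search, yet they report different
totals because each request sees whatever state is current when it runs.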

Basically I need the ability to keep running searches against a specified
commit point / index reader / state of the Lucene / Solr index.
So I wonder whether something similar to what was done for "cursorMark"
could be done to address that. Of course such "long running IndexReaders"
would have to be disposed of after some time.
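
A rough sketch of the idea, again as a toy Python model rather than Lucene or
Solr code: the client gets an opaque token for a frozen view of the index,
pages against that token, and unused views are dropped after a timeout. The
SnapshotRegistry class, its methods and the ttl_seconds parameter are all
hypothetical names invented for illustration; in a real implementation the
frozen tuple would presumably be replaced by an IndexReader kept open on the
relevant commit point.

```python
import time
import uuid


class SnapshotRegistry:
    """Hypothetical registry mapping opaque tokens to frozen index views."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._snapshots = {}  # token -> (frozen docs, last access time)

    def open(self, index):
        """Freeze the current index state and return a token for it."""
        token = uuid.uuid4().hex
        self._snapshots[token] = (tuple(index), self._clock())
        return token

    def search(self, token, start, rows):
        """Page through the frozen view; raises KeyError if the token expired."""
        self.expire()
        docs, _ = self._snapshots[token]
        self._snapshots[token] = (docs, self._clock())  # refresh the lease
        return {"numFound": len(docs), "docs": list(docs[start:start + rows])}

    def expire(self):
        """Dispose of snapshots that have not been used within the TTL."""
        now = self._clock()
        for token, (_, last_used) in list(self._snapshots.items()):
            if now - last_used > self._ttl:
                del self._snapshots[token]


registry = SnapshotRegistry(ttl_seconds=60)
index = ["x", "y"]
token = registry.open(index)   # pin the state that contains x and y
index.append("z")              # a later commit adds z
second_page = registry.search(token, start=1, rows=1)
print(second_page)             # still computed against the pinned state
```

Refreshing the timestamp on every access means a view stays alive while a
client is actively paging and is only dropped once it has been idle for
longer than the timeout.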

WDYT?
Regards,
Tommaso

[1] : http://jackrabbit.apache.org/oak
[2] : http://en.wikipedia.org/wiki/Multiversion_concurrency_control
[3] : http://wiki.apache.org/jackrabbit/RepositoryMicroKernel?action=AttachFile&do=view&target=MicroKernel+Revision+Model.pdf
