Hi Jens,

I just want to confirm your information. As you said, the query gets slower the
larger start is, even when using filters. The best solution is to get all ids
first (which may take some time) and then to fetch each document by id, one
after another. There is a request handler (get) and a Java API method
(HttpSolrClient.getById()) for this.
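
A minimal sketch of the ids-first approach, so it runs without a Solr server.
DocumentSource, InMemorySource and IdFirstReader are hypothetical names made up
for this illustration. In a real reader, fetchAllIds() would presumably run one
filter-only query that returns just the id field, and getById() would delegate
to HttpSolrClient.getById().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical abstraction over the index (not part of SolrJ or UIMA).
interface DocumentSource {
    List<String> fetchAllIds();   // one up-front request for all matching ids
    String getById(String id);    // one cheap lookup per document
}

// In-memory stand-in so the sketch runs without a Solr server.
class InMemorySource implements DocumentSource {
    private final Map<String, String> docs = new LinkedHashMap<>();

    InMemorySource put(String id, String text) {
        docs.put(id, text);
        return this;
    }

    @Override
    public List<String> fetchAllIds() {
        return new ArrayList<>(docs.keySet());
    }

    @Override
    public String getById(String id) {
        return docs.get(id);
    }
}

// Collects the ids once, then loads exactly one document per getNext() call.
class IdFirstReader {
    private final DocumentSource source;
    private final Iterator<String> ids;

    IdFirstReader(DocumentSource source) {
        this.source = source;
        this.ids = source.fetchAllIds().iterator();
    }

    boolean hasNext() {
        return ids.hasNext();
    }

    String getNext() {
        return source.getById(ids.next());
    }
}
```

Only the id list is held in memory between calls, and the cost of each
getNext() no longer depends on how far into the result set the reader is.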

Thanks to your help, I now have consistently fast queries.

Cheers,
Armin

-----Original Message-----
From: j...@grivolla.net [mailto:j...@grivolla.net] On Behalf Of Jens Grivolla
Sent: Tuesday, 16 August 2016 13:34
To: user@uima.apache.org
Subject: Re: CPE memory usage

Solr is known not to be very good at deep paging, but rather at getting the
top relevant results. Running a query asking for the millionth document is
pretty much the worst you can do as it will have to rank all documents
again, up to the millionth, and return that one. It can also be unreliable
if your document collection changes.
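
To put a rough number on this, here is a small sketch that assumes, as
described above, that a request with a given start and rows has to rank the
first start + rows matches. PagingCost and totalRanked are hypothetical names
used only for this illustration.

```java
// Hypothetical helper: estimates how many documents get ranked in total when a
// whole result set is read page by page with start/rows.
class PagingCost {
    static long totalRanked(long totalDocs, long rows) {
        long sum = 0;
        for (long start = 0; start < totalDocs; start += rows) {
            // Assumption: each page request ranks the first start + rows matches.
            sum += Math.min(start + rows, totalDocs);
        }
        return sum;
    }

    public static void main(String[] args) {
        // One million documents read in pages of 1000.
        System.out.println(PagingCost.totalRanked(1_000_000L, 1_000L));
    }
}
```

Under that assumption, reading one million documents in pages of 1000 ranks
500,500,000 documents in total, compared to 1,000,000 for a single pass.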

We did get it to work quite well, though. I believe we used only filters
and retrieved the results in natural order, so that Solr wouldn't have to
rank the documents. We also had a version where we first retrieved all
matching document ids in one go, and then queried for the documents by id,
one by one, in getNext().

Deep paging has also seen some major improvements over time IIRC, so newer
Solr versions should perform much better than the ones from a few years ago.

Best,
Jens

On Tue, Aug 9, 2016 at 12:20 PM, <armin.weg...@bka.bund.de> wrote:

> Hi!
>
> Finally, it looks like Solr causes the high memory consumption. The
> SolrClient isn't meant to be used the way I used it, but that isn't
> documented either. The Solr documentation is very bad. I just happened to
> find a solution on the web by accident.
>
> Thanks,
> Armin
>
> -----Original Message-----
> From: Richard Eckart de Castilho [mailto:r...@apache.org]
> Sent: Monday, 8 August 2016 15:33
> To: user@uima.apache.org
> Subject: Re: CPE memory usage
>
> Do you have code for a minimal test case?
>
> Cheers,
>
> -- Richard
>
> > On 08.08.2016, at 15:31, <armin.weg...@bka.bund.de> <armin.weg...@bka.bund.de> wrote:
> >
> > Hi Richard!
> >
> > I've changed the document reader to a kind of no-op reader that always
> > sets the document text to an empty string: same behavior, but a much
> > slower increase in memory usage.
> >
> > Cheers,
> > Armin
>
>
