On Wed, 2012-11-28 at 03:25 +0100, Arun Rangarajan wrote:

[Sorting on 14M docs, 250 fields]

> From what I have read, I understand that restricting the number of distinct
> values on sortable Solr fields will bring down the fieldCache space. The
> values in these sortable fields can be any integer from 0 to 33000 and
> quite widely distributed. We have a few scaling solutions in mind, but what
> is the best way to handle this whole issue?

Since the number of documents exceeds the maximum value in your fields,
the lowest memory consumption for a fast implementation I can come up
with is #docs * #fields * bits/value. Values from 0 to 33000 need 16
bits each, so that is 14M * 250 * 16 bits ~= 7GB.
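
A quick sanity check of that arithmetic in plain Java (nothing
Solr-specific, just the numbers from your mail):

    public class FieldCacheEstimate {
        public static void main(String[] args) {
            long docs = 14000000L;   // document count
            long fields = 250;       // sortable fields
            int maxValue = 33000;    // highest value in any field
            // Smallest number of bits that can hold 0..maxValue:
            int bitsPerValue = 64 - Long.numberOfLeadingZeros(maxValue); // = 16
            double gb = docs * fields * bitsPerValue / 8.0 / 1e9;
            System.out.printf("%d bits/value -> %.1f GB%n", bitsPerValue, gb);
        }
    }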

So it's time to get creative and hack Solr.

First off, is the number of unique values per field significantly lower
than your maximum of 33000 for a non-trivial number of your fields? If
so, the values can be mapped to a contiguous range of ordinals when
storing the data (this could be done dynamically when creating the
field cache). If an average field holds < 1024 unique values, each
entry fits in 10 bits and the total memory consumption would be about
14M * 250 * 10 bits ~= 4.4GB.
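
To illustrate the remapping, a sketch of the idea (the class and method
names are made up, not an existing Solr API; a real patch would store
the ordinals in a packed structure rather than an int[]):

    import java.util.Arrays;

    // Map raw values to contiguous ordinals at cache-build time.
    // Ordinals preserve value order, so sorting can compare them directly.
    public class OrdinalMappedField {
        private final int[] ordinals;   // one entry per doc; packable to ~10 bits
        private final int[] ordToValue; // ordinal -> original value

        public OrdinalMappedField(int[] rawValues) {
            ordToValue = distinctSorted(rawValues);
            ordinals = new int[rawValues.length];
            for (int doc = 0; doc < rawValues.length; doc++) {
                ordinals[doc] = Arrays.binarySearch(ordToValue, rawValues[doc]);
            }
        }

        private static int[] distinctSorted(int[] values) {
            int[] sorted = values.clone();
            Arrays.sort(sorted);
            int unique = 0;
            for (int i = 0; i < sorted.length; i++) {
                if (i == 0 || sorted[i] != sorted[i - 1]) {
                    sorted[unique++] = sorted[i];
                }
            }
            return Arrays.copyOf(sorted, unique);
        }

        public int compare(int docA, int docB) { // order-preserving
            return ordinals[docA] - ordinals[docB];
        }

        public int value(int doc) { // resolve ordinal back to the real value
            return ordToValue[ordinals[doc]];
        }
    }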

Secondly, if you normally use only a few fields for sorting, which I
suspect is the case, you could compress each field's values as a single
block and uncompress it when requested from the field cache. Having a
fixed-size cache of uncompressed values in the field cache should
ensure that there is no slowdown for most requests.
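
A minimal sketch of that, assuming 16-bit values and plain
java.util.zip for the compression (the names and the cache size of 8
are invented for illustration; a real patch would hook into Lucene's
FieldCache instead):

    import java.io.ByteArrayOutputStream;
    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.zip.Deflater;
    import java.util.zip.Inflater;

    // Keep each field's values Deflater-compressed and hold only the few
    // fields actually used for sorting uncompressed in a small LRU cache.
    public class CompressedFieldCache {
        private final Map<String, byte[]> compressed = new HashMap<String, byte[]>();
        private final int numDocs;

        // Access-ordered LinkedHashMap acting as a fixed-size LRU cache.
        private final LinkedHashMap<String, int[]> hot =
            new LinkedHashMap<String, int[]>(16, 0.75f, true) {
                protected boolean removeEldestEntry(Map.Entry<String, int[]> e) {
                    return size() > 8; // keep at most 8 fields uncompressed
                }
            };

        public CompressedFieldCache(int numDocs) {
            this.numDocs = numDocs;
        }

        public void put(String field, int[] values) {
            byte[] raw = new byte[numDocs * 2]; // values <= 33000 fit in 16 bits
            for (int i = 0; i < numDocs; i++) {
                raw[2 * i] = (byte) (values[i] >>> 8);
                raw[2 * i + 1] = (byte) values[i];
            }
            Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            compressed.put(field, out.toByteArray());
        }

        public synchronized int[] get(String field) throws Exception {
            int[] values = hot.get(field);
            if (values != null) {
                return values; // already uncompressed: no slowdown
            }
            Inflater inflater = new Inflater();
            inflater.setInput(compressed.get(field));
            byte[] raw = new byte[numDocs * 2];
            int off = 0;
            while (!inflater.finished()) {
                off += inflater.inflate(raw, off, raw.length - off);
            }
            values = new int[numDocs];
            for (int i = 0; i < numDocs; i++) {
                values[i] = ((raw[2 * i] & 0xff) << 8) | (raw[2 * i + 1] & 0xff);
            }
            hot.put(field, values);
            return values;
        }
    }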

It is very hard to estimate the memory savings on this, but I would not
be surprised if you could reduce memory consumption to 1/10 of the
worst-case 7GB, if the values are fairly regular. Of course, if the
values are all over the place, this gains you nothing at all.

Regards,
Toke Eskildsen
