RE: Efficient title sorting on large result sets.

Young, Cody Mon, 21 Nov 2011 09:43:53 -0800

Hi Andrew,

When you request a sort on a field, Lucene stores every unique value in
a field cache, which stays in RAM. If you have a large index and you're
sorting on a Unicode string field, this can be very memory intensive.
The way that I've solved this in the past is to make a field
specifically for sorting, truncate the string to a small number of
characters, and sort on that. You have to accept that in some cases the
sort order will be wrong. (If you truncate to 6 characters and then sort
Thisisastring and Thisisnotastring, both become Thisis, so you're not
guaranteed to get the correct sort order.)


The memory benefits of this are two-fold, though: each string is
shorter, so it takes up less memory, and there are fewer unique values
to cache.
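
To make the trade-off concrete, here is a minimal sketch in plain Python
(not Solr or Lucene code). The helper name sort_key and the sample
titles are made up for illustration; only the 6-character cut-off and
the two Thisis... strings come from the example above.

```python
def sort_key(title: str, max_chars: int = 6) -> str:
    """Hypothetical helper: the truncated value a dedicated sort field would hold."""
    return title[:max_chars]


# Made-up sample titles; the two "Thisis..." strings are from the example above.
titles = ["Thisisnotastring", "Thisisastring", "Another title", "Another time"]

# Fewer unique values need to be cached once the titles are truncated.
unique_full = len(set(titles))
unique_truncated = len({sort_key(t) for t in titles})

# Python's sort is stable, so titles that share a truncated key keep
# their input order, which may differ from the true title order.
by_truncated_key = sorted(titles, key=sort_key)
by_full_title = sorted(titles)

print(unique_full, unique_truncated)
print(by_truncated_key)
print(by_full_title)
```

Titles that share the same first 6 characters come back in whatever
order they went in, which is the inaccuracy you have to accept in
exchange for the smaller cache.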

Cody

-----Original Message-----
From: Andrew Ingram [mailto:andrew.ing...@tangentlabs.co.uk] 
Sent: Monday, November 21, 2011 3:23 AM
To: solr-user@lucene.apache.org
Subject: Efficient title sorting on large result sets.

Hi everyone,

We have a large product catalogue (currently 9 million, but soon to
inflate to around 25 million), with each product having a Unicode title.
We're offering the facility to sort by title, but often within quite
large result sets, e.g. 1 million fiction books (we are correctly using
filters). Aside from the obviously questionable use of sorting over such
a large set of results, I'm wondering if there are any steps I can take
to optimise title sorting and minimise memory use.

Solr also crashes with OutOfMemoryErrors every couple of days. Could
this be related to the sorting by title, or should I be looking for
another cause? The machine Solr is on has 8 GB of RAM, 7 GB of which is
given to Solr. We have other sites with larger catalogues and similar
spec hardware that aren't having any issues; the title sorting seems to
be the only major difference in functionality.

I'll be very grateful for any assistance.

Regards,
Andy Ingram
