Hi Andrew,

When you request a sort on a field, Lucene stores every unique value in a field cache, which stays in RAM. If you have a large index and you're sorting on a Unicode string field, this can be very memory intensive. The way that I've solved this in the past is to make a field specifically for sorting, truncate the string to a small number of characters, and sort on that. You have to accept that in some cases the sort order will be wrong: if you truncate to 6 characters and then sort Thisisastring and Thisisnotastring, both reduce to "Thisis", so you're not guaranteed to get them in the correct order.
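To illustrate the truncation trade-off, here is a rough Python sketch. It is not Solr code: `sort_key`, the key length of 6, and the sample titles are made up for illustration, and in practice the truncated value would be written to the dedicated sort field at index time.

```python
def sort_key(title: str, length: int = 6) -> str:
    """Hypothetical helper: keep only the first `length` characters as the sort value."""
    return title[:length]


# Made-up sample titles, including the two from the example above.
titles = [
    "Thisisnotastring",
    "Thisisastring",
    "Another title",
    "Thisisastring too",
]

# Fewer distinct values need to be held in memory once titles are truncated.
unique_full = len(set(titles))
unique_truncated = len({sort_key(t) for t in titles})

# Sorting on the truncated key: titles that share a key are tied, so their
# relative order is whatever the input order happened to be (Python's sort is stable).
sorted_by_key = sorted(titles, key=sort_key)

# Sorting on the full title, for comparison.
sorted_by_title = sorted(titles)

if __name__ == "__main__":
    print("unique full titles:    ", unique_full)
    print("unique truncated keys: ", unique_truncated)
    print("sorted by truncated key:", sorted_by_key)
    print("sorted by full title:   ", sorted_by_title)
```

With these samples the four distinct titles collapse to two distinct keys, and the three "Thisis..." titles come back in input order rather than true alphabetical order, which is the inaccuracy you have to be willing to accept.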
The memory benefits of this are two-fold, though: you have a shorter string, which takes up less memory, and you have a decreased number of unique values.

Cody

-----Original Message-----
From: Andrew Ingram [mailto:andrew.ing...@tangentlabs.co.uk]
Sent: Monday, November 21, 2011 3:23 AM
To: solr-user@lucene.apache.org
Subject: Efficient title sorting on large result sets.

Hi everyone,

We have a large product catalogue (currently 9 million, but soon to inflate to around 25 million), with each product having a Unicode title. We're offering the facility to sort by title, but often within quite large result sets, e.g. 1 million fiction books (we are correctly using filters).

Aside from the obvious questionable use of sorting over such a large set of results, I'm wondering if there are any steps I can take to optimise title sorting and minimise memory use.

Solr also crashes with OutOfMemoryErrors every couple of days. Could this be related to the sorting by title, or should I be looking for another cause? The machine Solr is on has 8 GB of RAM, 7 GB of which is given to Solr. We have other sites with larger catalogues and similar spec hardware that aren't having any issues; the title sorting seems to be the only major difference in functionality.

I'll be very grateful for any assistance.

Regards,
Andy Ingram