I think you'll find it slow to add disk seeks to the sort on each search. Something you might be able to work from, though (I doubt it still applies cleanly), is Hoss' issue https://issues.apache.org/jira/browse/LUCENE-831. It allows for a pluggable cache implementation for sorting, and also for much faster reopening in most cases. It hasn't seen any activity - I think they are looking to get the reopen gains elsewhere - but it may be worth playing with.
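To sketch the idea (the names here are purely illustrative - they may well not match what the patch actually defines):

import java.io.IOException;

// Illustrative sketch only, not the actual LUCENE-831 API.
// The point: the sort asks a pluggable cache for per-document
// values, so an implementation is free to keep them in RAM,
// on disk, or anywhere else.
public interface SortValueCache {
  Comparable getValue(int doc, String field) throws IOException;
}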

- Mark

Marcus Herou wrote:
Guys.

I've noticed many having trouble with sorting and OOM. Eventually they solve
it by throwing more memory at the problem.

Shouldn't a solution which can sort on disk when necessary be implemented
in core Lucene?
Something like this:
http://www.codeodor.com/index.cfm/2007/5/10/Sorting-really-BIG-files/1194

Since you obviously know the result size, you can calculate how much memory
the sort will need, and if the calculated value is higher than a
configurable threshold, an external on-disk sort is performed instead,
perhaps with a logging message at WARN level.
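Roughly like this (all names made up for illustration, nothing that exists in Lucene today):

// Sketch of the threshold test; illustrative names only.
class SortBudget {
  /** True when the estimated in-RAM sort footprint exceeds the budget. */
  static boolean shouldSpillToDisk(long numHits, long bytesPerEntry,
                                   long maxSortBytes) {
    long estimated = numHits * bytesPerEntry;
    if (estimated > maxSortBytes) {
      // the WARN message, so users see why the search got slower
      System.err.println("WARN: sort would need ~" + estimated
          + " bytes, above threshold " + maxSortBytes
          + "; falling back to on-disk sort");
      return true;
    }
    return false;
  }
}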

Just a thought, since I'm about to implement something which can sort any
Comparable object, but on disk.
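A rough sketch of what I have in mind - the classic external merge sort from that article, shown on lines of text for brevity (a generic Comparable version would add (de)serialization on top of the same shape); nothing here is Lucene API:

import java.io.*;
import java.util.*;

class ExternalSort {

  /** Sorts the lines of 'in' into 'out', holding at most
   *  maxLinesInMemory lines in RAM at a time. */
  static void sort(File in, File out, int maxLinesInMemory)
      throws IOException {
    List<File> runs = new ArrayList<File>();
    BufferedReader r = new BufferedReader(new FileReader(in));
    try {
      List<String> chunk = new ArrayList<String>(maxLinesInMemory);
      String line;
      while ((line = r.readLine()) != null) {
        chunk.add(line);
        if (chunk.size() >= maxLinesInMemory) {
          runs.add(writeRun(chunk));   // spill a sorted run to disk
          chunk.clear();
        }
      }
      if (!chunk.isEmpty()) runs.add(writeRun(chunk));
    } finally {
      r.close();
    }
    merge(runs, out);
  }

  /** Sorts one chunk in RAM and spills it to a temp file. */
  private static File writeRun(List<String> chunk) throws IOException {
    Collections.sort(chunk);
    File run = File.createTempFile("sortrun", ".tmp");
    run.deleteOnExit();
    PrintWriter w = new PrintWriter(new FileWriter(run));
    for (String s : chunk) w.println(s);
    w.close();
    return run;
  }

  /** Current head line of one run, ordered by that line. */
  private static class Head implements Comparable<Head> {
    String line;
    final BufferedReader reader;
    Head(BufferedReader reader) throws IOException {
      this.reader = reader;
      this.line = reader.readLine();
    }
    boolean advance() throws IOException {
      line = reader.readLine();
      return line != null;
    }
    public int compareTo(Head other) {
      return line.compareTo(other.line);
    }
  }

  /** K-way merge of the sorted runs: only one line per run in RAM. */
  private static void merge(List<File> runs, File out) throws IOException {
    PriorityQueue<Head> pq = new PriorityQueue<Head>();
    for (File f : runs) {
      Head h = new Head(new BufferedReader(new FileReader(f)));
      if (h.line != null) pq.add(h); else h.reader.close();
    }
    PrintWriter w = new PrintWriter(new FileWriter(out));
    while (!pq.isEmpty()) {
      Head h = pq.poll();            // smallest head line across all runs
      w.println(h.line);
      if (h.advance()) pq.add(h);    // re-insert with its next line
      else h.reader.close();
    }
    w.close();
  }
}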

I guess the Hadoop project has the perfect tools for this, since the mapred
input files are all sorted, on disk, and huge.

Kindly

//Marcus


