Re: Sorting with little memory: A suggestion

2010-03-19 Thread Robert Muir
On Fri, Mar 19, 2010 at 11:06 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: To me, the trade-offs seem to be new Sort(new SortField(field, locale)) Toke, only partially on-topic here, is it possible to describe your use-case a little more where it's preferable to use this Locale-based

Re: Sorting with little memory: A suggestion

2010-03-19 Thread Michael McCandless
If you build the ords per-segment, how do you compare results across segments? I.e., in the non-Collator case, Lucene stores ords but must also store the actual String so that the FieldComparator is able to compare results across segments. Mike On Fri, Mar 19, 2010 at 10:06 AM, Toke Eskildsen
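A minimal sketch of the point Mike raises, in plain Java rather than Lucene's actual classes: ordinals are only meaningful inside the segment that produced them, so a comparison across segments has to fall back to the stored String. The `SegmentOrds` and `Segment` names and the array layout are hypothetical and are not how Lucene's FieldComparator is implemented.

```java
import java.util.Arrays;

public class SegmentOrds {
    /** Hypothetical stand-in for one segment: sorted unique terms plus one ord per document. */
    static final class Segment {
        final String[] sortedTerms; // the actual Strings, kept for cross-segment comparison
        final int[] docToOrd;       // per-document position in sortedTerms

        Segment(String[] docValues) {
            sortedTerms = Arrays.stream(docValues).distinct().sorted().toArray(String[]::new);
            docToOrd = new int[docValues.length];
            for (int doc = 0; doc < docValues.length; doc++) {
                docToOrd[doc] = Arrays.binarySearch(sortedTerms, docValues[doc]);
            }
        }

        String value(int doc) {
            return sortedTerms[docToOrd[doc]];
        }
    }

    /** Ords decide within one segment; across segments only the Strings are comparable. */
    static int compare(Segment a, int docA, Segment b, int docB) {
        if (a == b) {
            return Integer.compare(a.docToOrd[docA], a.docToOrd[docB]);
        }
        return a.value(docA).compareTo(b.value(docB));
    }

    public static void main(String[] args) {
        Segment s1 = new Segment(new String[] {"b", "d"});
        Segment s2 = new Segment(new String[] {"a", "c"});
        // "d" and "c" both have ord 1 in their own segment, yet "d" sorts after "c".
        System.out.println(compare(s1, 1, s2, 1) > 0);
    }
}
```

In this sketch two documents can share the same ord in different segments while holding different values, which is why the ords alone cannot merge results across segments.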

RE: Sorting with little memory: A suggestion

2010-03-19 Thread Toke Eskildsen
From: Robert Muir [rcm...@gmail.com]: Toke, only partially on-topic here, is it possible to describe your use-case a little more where it's preferable to use this Locale-based sort instead of indexing collation keys (e.g. you have to support so many locales this would be too much indexing

Re: Sorting with little memory: A suggestion

2010-03-19 Thread Robert Muir
On Fri, Mar 19, 2010 at 1:46 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: From: Robert Muir [rcm...@gmail.com]: Toke, only partially on-topic here, is it possible to describe your use-case a little more where it's preferable to use this Locale-based sort instead of indexing collation

Re: Sorting with little memory: A suggestion

2010-03-19 Thread Michael McCandless
On Fri, Mar 19, 2010 at 12:46 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: However, it is not set in stone that we will shift to using Exposed or similar: As many others we're pursuing real-time indexing and while Exposed sits at the segment-level and thus works well for re-open, big

RE: Sorting with little memory: A suggestion

2010-03-19 Thread Toke Eskildsen
From: Robert Muir [rcm...@gmail.com]: [Toke: Indexing collation keys only helps with the speed problem] I don't really understand this measurement, collation keys are byte[]... (although it's true we don't yet encode them this way in flex, I think we should) It sounds like I'm missing

Re: Sorting with little memory: A suggestion

2010-03-19 Thread Robert Muir
On Fri, Mar 19, 2010 at 5:42 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: It sounds like I'm missing something here... A quick check of running 2 random Strings of 30 characters from a-zA-Z0-1 + 20 different national characters through Java's Collator returned an average
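A rough sketch of the kind of measurement described above, using only the JDK's `java.text.Collator`. This is not Toke's actual test: the class name, sample count, seed, Danish locale, and the particular 20 national characters are all assumptions, and the alphabet follows the "a-zA-Z0-1" wording literally. It reports the average number of collation key bytes per input character.

```java
import java.text.Collator;
import java.util.Locale;
import java.util.Random;

public class CollationKeySize {
    // Assumed alphabet: a-z, A-Z, "01", plus 20 national characters (written as escapes).
    static final String ALPHABET =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01"
        + "\u00e6\u00f8\u00e5\u00c6\u00d8\u00c5\u00e4\u00f6\u00fc\u00c4"
        + "\u00d6\u00dc\u00e9\u00e8\u00e1\u00e0\u00f1\u00e7\u00df\u00ea";

    static String randomString(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    /** Average collation key size in bytes per input character over random samples. */
    static double averageKeyBytesPerChar(Collator collator, int samples, int length, long seed) {
        Random rnd = new Random(seed);
        long totalBytes = 0;
        for (int i = 0; i < samples; i++) {
            String s = randomString(rnd, length);
            totalBytes += collator.getCollationKey(s).toByteArray().length;
        }
        return (double) totalBytes / ((long) samples * length);
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(new Locale("da")); // locale is an assumption
        double avg = averageKeyBytesPerChar(collator, 1000, 30, 42L);
        System.out.printf("JDK collation key: %.2f bytes/char%n", avg);
    }
}
```

The figure depends on the JDK version, locale, and collator strength, so it is only useful for comparison against the same strings run through another collator such as ICU's.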

RE: Sorting with little memory: A suggestion

2010-03-19 Thread Toke Eskildsen
From: Robert Muir [rcm...@gmail.com]: Right, JDK collation sucks, use the ICU for collation keys too: http://site.icu-project.org/charts/collation-icu4j-sun at 1.59 bytes/char, that's less than UTF-16 Ah... I should have seen that. It does not change the overall picture though: Although the

Re: Sorting with little memory: A suggestion

2010-03-19 Thread Robert Muir
On Fri, Mar 19, 2010 at 6:07 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Ah... I should have seen that. It does not change the overall picture though: Although the ICU collation keys are impressively small, they still take up nearly as much space as the original Strings when they