On Fri, Mar 19, 2010 at 11:06 AM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:
To me, the trade-offs seem to be
new Sort(new SortField(field, locale))
Toke, only partially-on-topic here, is it possible to describe your
use-case a little more where it's preferable to use this Locale-based
If you build the ords per-segment, how do you compare results across segments?
I.e., in the non-Collator case, Lucene stores ords but must also store
the actual String so that the FieldComparator is able to compare
results across segments
Mike
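
A minimal sketch of the point above, in plain Java rather than Lucene's actual classes: the `Segment` type and its methods are hypothetical, made up for illustration. Ordinals are positions in one segment's own sorted term list, so they order values correctly within that segment but mean nothing across segments, where the actual Strings have to be compared.

```java
import java.util.Arrays;

public class SegmentOrds {
    /** Hypothetical segment: distinct terms kept sorted, so a term's index is its ordinal. */
    static final class Segment {
        final String[] sortedTerms;

        Segment(String... terms) {
            sortedTerms = terms.clone();
            Arrays.sort(sortedTerms);
        }

        /** Ordinal of a term that is present in this segment. */
        int ord(String term) {
            return Arrays.binarySearch(sortedTerms, term);
        }

        /** Resolve an ordinal back to the stored String. */
        String lookup(int ord) {
            return sortedTerms[ord];
        }
    }

    /** Within one segment, comparing ords gives the same order as comparing terms. */
    static int compareSameSegment(int ordA, int ordB) {
        return Integer.compare(ordA, ordB);
    }

    /** Across segments the ords are unrelated, so fall back to the stored Strings. */
    static int compareAcrossSegments(Segment a, int ordA, Segment b, int ordB) {
        return a.lookup(ordA).compareTo(b.lookup(ordB));
    }

    public static void main(String[] args) {
        Segment s1 = new Segment("apple", "pear");
        Segment s2 = new Segment("banana", "cherry", "zebra");

        int pear = s1.ord("pear");     // ordinal 1 in s1
        int cherry = s2.ord("cherry"); // ordinal 1 in s2

        // Equal ordinals, yet the terms differ: ords alone cannot rank across segments.
        System.out.println("ords equal: " + (compareSameSegment(pear, cherry) == 0));
        System.out.println("pear sorts after cherry: "
                + (compareAcrossSegments(s1, pear, s2, cherry) > 0));
    }
}
```

With a Collator in play, the String comparison in `compareAcrossSegments` would be replaced by a collator comparison, which is the expensive part this thread is about.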
On Fri, Mar 19, 2010 at 10:06 AM, Toke Eskildsen
From: Robert Muir [rcm...@gmail.com]:
Toke, only partially-on-topic here, is it possible to describe your
use-case a little more where it's preferable to use this Locale-based
sort instead of indexing collation keys (e.g. you have to support so
many locales this would be too much indexing
On Fri, Mar 19, 2010 at 1:46 PM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:
From: Robert Muir [rcm...@gmail.com]:
Toke, only partially-on-topic here, is it possible to describe your
use-case a little more where it's preferable to use this Locale-based
sort instead of indexing collation
On Fri, Mar 19, 2010 at 12:46 PM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:
However, it is not set in stone that we will shift to using Exposed or
similar: Like many others, we're pursuing real-time indexing and while Exposed
sits at the segment-level and thus works well for re-open, big
From: Robert Muir [rcm...@gmail.com]:
[Toke: Indexing collation keys only helps with the speed problem]
I don't really understand this measurement, collation keys are
byte[]... (although it's true we don't yet encode them this way in
flex, I think we should)
It sounds like I'm missing
On Fri, Mar 19, 2010 at 5:42 PM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:
It sounds like I'm missing something here... A quick check of running 2
random Strings of 30 characters from a-zA-Z0-1 + 20 different national
characters through Java's Collator returned an average
From: Robert Muir [rcm...@gmail.com]:
Right, JDK collation sucks, use the ICU for collation keys too:
http://site.icu-project.org/charts/collation-icu4j-sun
at 1.59 bytes/char, that's less than UTF-16
Ah... It does not change the overall picture though; I should have seen that:
Although the
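
For anyone wanting to repeat the size check, here is a rough sketch using only the JDK's `java.text.Collator`. It does not use ICU, so it will not reproduce the 1.59 bytes/char figure from the chart linked above; the class name, helper names, locale and sample string are all assumptions chosen for illustration.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

public class CollationKeySize {
    /** Size in bytes of the collation key the given collator produces for s. */
    static int keyBytes(Collator collator, String s) {
        return collator.getCollationKey(s).toByteArray().length;
    }

    /** Size in bytes of s when held as UTF-16 code units (two bytes per char). */
    static int utf16Bytes(String s) {
        return 2 * s.length();
    }

    /** Collation-key bytes per char of the original String. */
    static double bytesPerChar(Collator collator, String s) {
        return (double) keyBytes(collator, s) / s.length();
    }

    /** Compare two Strings through their precomputed keys instead of the collator. */
    static int compareByKeys(Collator collator, String a, String b) {
        CollationKey ka = collator.getCollationKey(a);
        CollationKey kb = collator.getCollationKey(b);
        return ka.compareTo(kb);
    }

    public static void main(String[] args) {
        // Danish is an arbitrary choice of locale for this sketch.
        Collator collator = Collator.getInstance(Locale.forLanguageTag("da"));
        // "Blåbærgrød med fløde", written with escapes to avoid source-encoding issues.
        String sample = "Bl\u00e5b\u00e6rgr\u00f8d med fl\u00f8de";

        System.out.println("chars:        " + sample.length());
        System.out.println("UTF-16 bytes: " + utf16Bytes(sample));
        System.out.println("key bytes:    " + keyBytes(collator, sample));
        System.out.println("bytes/char:   " + bytesPerChar(collator, sample));
    }
}
```

Running it over a larger set of random Strings, and swapping in ICU4J's collator, would give numbers comparable to the ones discussed here.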
On Fri, Mar 19, 2010 at 6:07 PM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:
Ah... I should have seen that. It does not change the overall picture though:
Although the ICU collation keys are impressively small, they still take up
nearly as much space as the original Strings when they