Mark Davis scripsit:

> We use this structure in ICU (in UnicodeSet). For a high-level explanation,
> see my site (www.macchiato.com), "Bits of Unicode".

Inversion lists, yes.  (Different terminology, same thing).

> As to the binary search, we have used in various contexts before a
> "completely unrolled" binary loop, following John Bentley's formulation. 

Ah, that's a different kind.  I was talking about recursive unrolling, with
the content of the data structure appearing as constants in the code.
Naturally, this is barbarian-style programming, should really be done with a
program generator, and if not, is suitable only for circumstances where the
values never change.  But consider the inversion list for Latin letters:

        0041 005B                       0222 0234
        0061 007B                       1E00 1E9C
        00C0 00D7                       1EA0 1EFA
        00D8 00F7                       FF21 FF3B
        00F8 0220                       FF41 FF5B

you can unwrap the binary search as:

        if (u < 0x0220) {
            if (u < 0x00C0) {
                if (u < 0x0061) {
                    if (u < 0x005B) return !(u < 0x0041);
                    else return false;
                    }
                else return u < 0x007B;
                }
            else {
                if (u < 0x00F7) {
                    if (u < 0x00D8) return u < 0x00D7;
                    else return false;
                    }
                else return !(u < 0x00F8);
                }
        else { ...              # and so on for the other half

If the list is too long, this code also suffers because it overflows the
instruction cache.

-- 
Said Agatha Christie / To E. Philips Oppenheim  John Cowan
"Who is this Hemingway? / Who is this Proust?   [EMAIL PROTECTED]
Who is this Vladimir / Whatchamacallum,         http://www.reutershealth.com
This neopostrealist / Rabble?" she groused.     http://www.ccil.org/cowan
        --author unknown to me; any suggestions?

Reply via email to