Mark Davis scripsit: > We use this structure in ICU (in UnicodeSet). For a high-level explanation, > see my site (www.macchiato.com), "Bits of Unicode".
Inversion lists, yes. (Different terminology, same thing). > As to the binary search, we have used in various contexts before a > "completely unrolled" binary loop, following John Bentley's formulation. Ah, that's a different kind. I was talking about recursive unrolling, with the content of the data structure appearing as constants in the code. Naturally, this is barbarian-style programming, should really be done with a program generator, and if not, is suitable only for circumstances where the values never change. But consider the inversion list for Latin letters: 0041 005B 0222 0234 0061 007B 1E00 1E9C 00C0 00D7 1EA0 1EFA 00D8 00F7 FF21 FF3B 00F8 0220 FF41 FF5B you can unwrap the binary search as: if (u < 0x0220) { if (u < 0x00C0) { if (u < 0x0061) { if (u < 0x005B) return !(u < 0x0041); else return false; } else return u < 0x007B; } else { if (u < 0x00F7) { if (u < 0x00D8) return u < 0x00D7; else return false; } else return !(u < 0x00F8); } else { ... # and so on for the other half If the list is too long, this code also suffers because it overflows the instruction cache. -- Said Agatha Christie / To E. Philips Oppenheim John Cowan "Who is this Hemingway? / Who is this Proust? [EMAIL PROTECTED] Who is this Vladimir / Whatchamacallum, http://www.reutershealth.com This neopostrealist / Rabble?" she groused. http://www.ccil.org/cowan --author unknown to me; any suggestions?