All this talk about these higher-plane characters - you know, plane 1 and above; let's call them MathText characters for short - has got me wondering.
Why is there no UTF-24? See, these MathText characters take up a lot of space. No matter how you encode them (UTF-8, UTF-16 or UTF-32), they are always 4 bytes long. Now if we had UTF-24, they would only take up 3 bytes. And since the Unicode code space is formally defined to run no higher than U+10FFFF, which needs only 21 bits and so fits in 3 bytes, I see no reason why no-one has ever gone to the trouble of defining a 3-byte storage method. Implementation would be easy; there would be only two variants, UTF-24LE and UTF-24BE, and that's it. No juggling with bits like in UTF-8 and UTF-16 or anything complicated like that. Just the plain character values, just like in UTF-32, only with 75% of the storage needed.
Comments anyone?
Pim Blokland
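
For concreteness, here is a minimal sketch of what such a 3-byte scheme might look like. UTF-24 is not a defined encoding form, so everything here is an assumption made for illustration: the function names utf24_encode and utf24_decode are hypothetical, as is the choice to reject surrogate values and anything above U+10FFFF on decoding.

```python
# Hypothetical "UTF-24": each code point stored as a plain 3-byte integer.
# Not a real Unicode encoding form; names and rules are illustrative only.

MAX_CODE_POINT = 0x10FFFF  # top of the Unicode code space (21 bits)


def utf24_encode(text, byteorder="big"):
    """Encode text as 3 bytes per code point ("big" = BE, "little" = LE)."""
    out = bytearray()
    for ch in text:
        out += ord(ch).to_bytes(3, byteorder)
    return bytes(out)


def utf24_decode(data, byteorder="big"):
    """Decode 3-byte units back into text, rejecting invalid values."""
    if len(data) % 3 != 0:
        raise ValueError("length is not a multiple of 3 bytes")
    chars = []
    for i in range(0, len(data), 3):
        cp = int.from_bytes(data[i:i + 3], byteorder)
        # Assumption: surrogates and out-of-range values are errors.
        if cp > MAX_CODE_POINT or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point 0x{cp:X}")
        chars.append(chr(cp))
    return "".join(chars)


# U+1D49C MATHEMATICAL SCRIPT CAPITAL A, a plane 1 character.
sample = "\U0001D49C"
sizes = {
    "UTF-8": len(sample.encode("utf-8")),
    "UTF-16": len(sample.encode("utf-16-le")),
    "UTF-32": len(sample.encode("utf-32-le")),
    "UTF-24 (hypothetical)": len(utf24_encode(sample)),
}
```

With this sketch, sizes comes out as 4 bytes for each of the three existing encodings and 3 bytes for the hypothetical one, and utf24_decode(utf24_encode(sample)) returns the original string in either byte order.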