
Several of us are wondering why bits were reserved in perl5's extended UTF-8. I'm asking you because you appear to be the author of the commits that did this.

To refresh your memory, in perl5 UTF-8, a start byte of 0xFF makes the sequence of bytes that comprises a single character 13 bytes long. Each of the 12 continuation bytes carries 6 bits, so this allows code points up to 2**72 - 1 to be represented. If the length had instead been 12 bytes, code points up to 2**66 - 1 could be represented, which is enough to represent any code point possible in a 64-bit word.
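
For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes the 0xFF start byte itself carries no payload bits and that every continuation byte carries 6; the function names are made up for illustration and do not come from the perl source.

```python
# Payload capacity of an extended UTF-8 sequence whose start byte is 0xFF.
# Assumptions (illustrative only): the start byte contributes no payload
# bits, and each continuation byte (10xxxxxx) contributes 6.
BITS_PER_CONTINUATION = 6


def payload_bits(total_len):
    """Payload bits in a sequence of total_len bytes, start byte included."""
    return (total_len - 1) * BITS_PER_CONTINUATION


def max_code_point(total_len):
    """Largest code point representable in a sequence of total_len bytes."""
    return 2 ** payload_bits(total_len) - 1


for n in (12, 13):
    spare = payload_bits(n) - 64  # bits beyond what a 64-bit word needs
    print(f"{n} bytes: {payload_bits(n)} payload bits, "
          f"{spare} more than a 64-bit word needs")
```

Under these assumptions, a 12-byte sequence already covers every 64-bit value, and the 13th byte only adds 6 further bits on top of that.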

The comments indicate that these extra bits are "reserved". So we're wondering what potential use you had thought of for these bits.


Karl Williamson
