Hi,

Several of us are wondering why bits were reserved in perl5's extended UTF-8. I'm asking you because you appear to be the author of the commits that did this.

To refresh your memory, in perl5 UTF-8, a start byte of 0xFF makes the sequence of bytes that comprises a single character 13 bytes long. The 12 continuation bytes carry 6 bits each, so code points up to 2**72 - 1 can be represented. If the length had instead been 12 bytes, code points up to 2**66 - 1 could be represented, which is still enough for any code point that fits in a 64-bit word.
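
For reference, the arithmetic can be checked with a short sketch. It is written in Python and is not taken from the perl5 source. It assumes the 0xFF start byte itself carries no payload bits and that each continuation byte carries 6, as in standard UTF-8. The name max_code_point is purely illustrative.

```python
# Assumption: the 0xFF start byte contributes no payload bits; each
# continuation byte contributes 6 payload bits, as in standard UTF-8.
BITS_PER_CONTINUATION = 6


def max_code_point(total_bytes: int) -> int:
    """Largest code point a 0xFF-started sequence of total_bytes can hold."""
    payload_bits = (total_bytes - 1) * BITS_PER_CONTINUATION
    return 2**payload_bits - 1


MAX_64_BIT = 2**64 - 1

for length in (11, 12, 13):
    payload_bits = (length - 1) * BITS_PER_CONTINUATION
    covers = max_code_point(length) >= MAX_64_BIT
    print(f"{length} bytes: {payload_bits} payload bits, "
          f"covers a 64-bit word: {covers}")
```

Under those assumptions, 12 bytes is the shortest length that covers a 64-bit word, and the 13-byte form leaves 8 bits beyond the 64 needed.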

The comments indicate that these extra bits are "reserved", so we're wondering what potential use you had in mind for them.

Thanks

Karl Williamson