Question about Perl5 extended UTF-8 design

Karl Williamson Thu, 05 Nov 2015 08:08:58 -0800

Hi,

Several of us are wondering about the reason for reserving bits for the extended UTF-8 in perl5. I'm asking you because you are the apparent author of the commits that did this.

To refresh your memory, in perl5 UTF-8, a start byte of 0xFF causes the length of the sequence of bytes that comprise a single character to be 13 bytes. This allows code points up to 2**72 - 1 to be represented. If the length had instead been 12 bytes, code points up to 2**66 - 1 could be represented, which is enough to represent any code point possible in a 64-bit word.
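
The arithmetic above can be checked with a short sketch. It assumes that the 0xFF start byte carries no payload bits and that each remaining byte carries 6 payload bits, as ordinary UTF-8 continuation bytes do. The name max_code_point is made up for illustration and is not taken from the perl5 source.

```python
# Assumption: each continuation byte has the form 10xxxxxx, i.e. 6 payload bits.
PAYLOAD_BITS_PER_CONTINUATION = 6


def max_code_point(sequence_length: int) -> int:
    """Largest code point for a sequence of the given total length,
    assuming the start byte (0xFF) contributes no payload bits."""
    continuation_bytes = sequence_length - 1
    return 2 ** (PAYLOAD_BITS_PER_CONTINUATION * continuation_bytes) - 1


MAX_64_BIT = 2**64 - 1

for length in (12, 13):
    limit = max_code_point(length)
    # Report the payload width and whether every 64-bit value fits.
    print(f"{length} bytes: {limit.bit_length()} bits, "
          f"covers 64-bit word: {limit >= MAX_64_BIT}")
```

Under these assumptions a 12-byte sequence already has 66 payload bits, so the 13-byte form leaves 8 bits beyond what a 64-bit word needs.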

The comments indicate that these extra bits are "reserved". So we're wondering what potential use you had thought of for these bits.


Thanks

Karl Williamson
