Re: [Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

boB Stepp Tue, 08 Aug 2017 20:56:39 -0700

On Tue, Aug 8, 2017 at 10:17 PM, boB Stepp <[email protected]> wrote:
> On Mon, Aug 7, 2017 at 10:01 PM, Ben Finney <[email protected]> 
> wrote:
>> boB Stepp <[email protected]> writes:
>>
>>> How is len() getting these values?
>>


>
> It is translating the Unicode code points into bits patterned by the
> encoding specified.  I know this.  I was reading some examples from a
> book and it was demonstrating the different lengths resulting from
> encoding into UTF-8, 16 and 32.  I was mildly surprised that len()
> even worked on these encoding results.  But for the life of me I can't
> figure out for UTF-16 and 32 how these lengths are determined.  For
> instance just looking at a single character:
>
> py3: h = 'h'
> py3: h16 = h.encode("UTF-16")
> py3: h16
> b'\xff\xfeh\x00'
> py3: len(h16)
> 4
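One quick way to see where each length comes from is to encode the same one-character string with each codec and print the raw bytes next to len(). This is a sketch that assumes CPython 3. The plain "utf-16" and "utf-32" codecs prepend a byte order mark (BOM) in the platform's native byte order, so the exact bytes printed depend on the machine, but the lengths do not. The helper name show_lengths is hypothetical.

```python
import codecs

def show_lengths(text):
    """Encode `text` three ways, print the raw bytes, and return the byte counts."""
    lengths = {}
    for enc in ("utf-8", "utf-16", "utf-32"):
        data = text.encode(enc)   # str.encode() returns a bytes object
        lengths[enc] = len(data)  # len() on bytes counts bytes, not characters
        print(f"{enc:<7} {data!r:<42} len={len(data)}")
    return lengths

show_lengths("h")

# The plain "utf-16"/"utf-32" codecs start with a BOM in native byte order:
# 2 bytes for UTF-16 and 4 bytes for UTF-32.
print(len(codecs.BOM_UTF16), len(codecs.BOM_UTF32))  # 2 4
```

So for 'h': UTF-8 needs 1 byte, UTF-16 is 2 BOM bytes plus one 2-byte code unit (4), and UTF-32 is 4 BOM bytes plus one 4-byte code unit (8). Codecs with an explicit byte order, such as "utf-16-le", omit the BOM, so len("h".encode("utf-16-le")) is 2.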

This all makes perfect sense, arithmetic-wise, now that Matt has made
me realize my hex arithmetic was quite deficient!  And that makes
sense of the trailing "\x00", too: it is part of the 2-byte code unit
for 'h', NOT an EOL character.
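Taking b'\xff\xfeh\x00' apart byte by byte makes the arithmetic explicit. The first two bytes are the little-endian BOM (U+FEFF), and the remaining pair is the code unit 0x0068 for 'h' stored low byte first, which is why a \x00 trails it. This is a minimal sketch that assumes the bytes came from a little-endian machine, as in the py3 session quoted above. It uses only built-ins (slicing, int.from_bytes, bytes.decode).

```python
h16 = b"\xff\xfeh\x00"  # the bytes from the py3 session quoted above

bom, unit = h16[:2], h16[2:]
print(hex(int.from_bytes(bom, "little")))  # 0xfeff: the byte order mark, read little-endian
print(list(unit))                          # [104, 0]: low byte 0x68 first, then high byte 0x00
code_point = int.from_bytes(unit, "little")
print(hex(code_point), chr(code_point))    # 0x68 h
print(h16.decode("utf-16") == "h")         # True: the decoder reads and drops the BOM
```

The 4 that len() reports is therefore 2 bytes of BOM plus 2 bytes for the single UTF-16 code unit.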


-- 
boB
_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
