On Tue, Aug 8, 2017 at 10:17 PM, boB Stepp <robertvst...@gmail.com> wrote:
> On Mon, Aug 7, 2017 at 10:01 PM, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
>> boB Stepp <robertvst...@gmail.com> writes:
>>
>>> How is len() getting these values?
>>

>
> It is translating the Unicode code points into bytes according to the
> specified encoding.  I know this.  I was reading some examples from a
> book and it was demonstrating the different lengths resulting from
> encoding into UTF-8, 16 and 32.  I was mildly surprised that len()
> even worked on these encoding results.  But for the life of me I can't
> figure out for UTF-16 and 32 how these lengths are determined.  For
> instance just looking at a single character:
>
> py3: h = 'h'
> py3: h16 = h.encode("UTF-16")
> py3: h16
> b'\xff\xfeh\x00'
> py3: len(h16)
> 4

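This all makes perfect sense, arithmetic-wise, now that Matt has made
me realize my hex arithmetic was quite deficient!  The 4 bytes are the
2-byte byte order mark (b'\xff\xfe') plus 2 bytes for 'h' itself.  And
that makes sense of the trailing "\x00", too: it is NOT an end-of-line
character, but the high-order byte of U+0068 stored little-endian.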
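For the archives, here is a minimal sketch of that breakdown.  It assumes CPython 3, and the commented byte values assume a little-endian machine, since the plain "UTF-16" and "UTF-32" codecs write the BOM and the code units in native byte order:

```python
import codecs

h = 'h'

# 'h' is code point U+0068: hex 68 == decimal 104, which Python
# displays inside a bytes literal as the ASCII character h.
assert ord(h) == 0x68 == 104

h16 = h.encode("UTF-16")
# The "UTF-16" codec prepends a 2-byte byte order mark (BOM), then
# writes each 2-byte code unit in the machine's native byte order.
bom, payload = h16[:2], h16[2:]
print(bom in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True
print(len(bom), len(payload))  # 2 2

# Little-endian stores the low byte first: 0x68 ('h'), then 0x00.
# That 0x00 is the high byte of U+0068, not an end-of-line marker.
print(h.encode("utf-16-le"))  # b'h\x00'

# len() on a bytes object just counts bytes, so the BOM is included
# for "utf-16"/"utf-32" but not for the explicit -le variants.
lengths = {enc: len(h.encode(enc))
           for enc in ("utf-8", "utf-16", "utf-16-le", "utf-32", "utf-32-le")}
print(lengths)
# {'utf-8': 1, 'utf-16': 4, 'utf-16-le': 2, 'utf-32': 8, 'utf-32-le': 4}
```

Naming the byte order explicitly ("utf-16-le" or "utf-16-be") is how you get the encoded bytes without a BOM, which is why those variants come out 2 bytes shorter (4 bytes shorter for UTF-32 is not the case: the UTF-32 BOM is also 4 bytes, so 8 becomes 4).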


-- 
boB
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
