On Tue, Aug 8, 2017 at 10:17 PM, boB Stepp <robertvst...@gmail.com> wrote:
> On Mon, Aug 7, 2017 at 10:01 PM, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
>> boB Stepp <robertvst...@gmail.com> writes:
>>
>>> How is len() getting these values?
>>
>
> It is translating the Unicode code points into bits patterned by the
> encoding specified. I know this. I was reading some examples from a
> book and it was demonstrating the different lengths resulting from
> encoding into UTF-8, 16 and 32. I was mildly surprised that len()
> even worked on these encoding results. But for the life of me I can't
> figure out for UTF-16 and 32 how these lengths are determined. For
> instance just looking at a single character:
>
> py3: h = 'h'
> py3: h16 = h.encode("UTF-16")
> py3: h16
> b'\xff\xfeh\x00'
> py3: len(h16)
> 4

This all makes perfect sense, arithmetic-wise, now that Matt has made me
realize my hex arithmetic was quite deficient! And that makes sense of
the trailing "\x00", too (NOT an EOL char.).

--
boB
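
P.S. For anyone who wants to check the byte counting, here is a minimal sketch of how I now understand it. It assumes that the plain "UTF-16" and "UTF-32" codecs prepend a byte order mark (BOM) and that the machine is little-endian, as in the session above. The variable names (bom, payload, h16_le, h32) are my own and are not from the book.

```python
import codecs

h = "h"

# "UTF-16" without an explicit byte order prepends a 2-byte byte order mark
# (BOM) and then writes each code unit in the platform's native byte order.
h16 = h.encode("UTF-16")
bom, payload = h16[:2], h16[2:]

# The BOM is one of the two constants the codecs module exposes.
assert bom in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# 'h' is U+0068, which fits in one 16-bit code unit: 0x68 plus a zero byte.
# On a little-endian machine the payload is b'h\x00', hence the trailing \x00.
print(len(bom), len(payload), len(h16))   # 2 2 4

# Naming the byte order explicitly suppresses the BOM.
h16_le = h.encode("UTF-16-LE")
print(h16_le, len(h16_le))                # b'h\x00' 2

# The same accounting for UTF-32: a 4-byte BOM plus one 4-byte code unit.
h32 = h.encode("UTF-32")
print(len(h32))                           # 8
```

So len() is just counting bytes in the bytes object: 2 for the BOM plus 2 for the single code unit gives 4. On a big-endian machine the bytes would come out in the other order, but the length would be the same.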