Actually, it's more likely that the character you are grabbing is encoded
as UTF-16 rather than UTF-8, which puts it into double-byte territory...
* This is an assumption based on the following output:

>>> u = u'\u2014'
>>> s = u.encode("utf-16")
>>> print(s)
 ■¶
>>> s = u.encode("utf-32")
>>> print(s)
 ■  ¶
>>> s = u.encode("utf-16LE")
>>> print(s)
¶
>>> s = u.encode("utf-16BE")
>>> print(s)
 ¶
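
The console output above is hard to read because the raw bytes are drawn
as code-page glyphs (the ■ is most likely the 0xFE byte of the byte-order
mark). As a rough sketch, assuming Python 3, the same character can be
dumped as hex instead. byte_view is a made-up helper name, not part of the
standard library:

```python
# A minimal sketch, assuming Python 3. byte_view is a hypothetical helper.
def byte_view(text, encoding):
    """Return the encoded bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode(encoding))

u = "\u2014"  # EM DASH

# Explicit byte orders are used so no byte-order mark is prepended.
for name in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    print(f"{name:10} {byte_view(u, name)}")

# utf-8      e2 80 94
# utf-16-le  14 20
# utf-16-be  20 14
# utf-32-le  14 20 00 00
# utf-32-be  00 00 20 14
```

Plain "utf-16" and "utf-32" additionally prepend a byte-order mark, which
would account for the extra leading characters in the first two outputs.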

See https://en.wikipedia.org/wiki/Character_encoding for help understanding
character encodings, code pages, and why they are important.

James