Actually, it's more likely that the character you are grabbing is encoded as UTF-16, not UTF-8, which means it occupies two bytes...*

* An assumption based on the following output:
>>> u = u'\u2014'
>>> s = u.encode("utf-16")
>>> print(s)
■¶
>>> s = u.encode("utf-32")
>>> print(s)
■ ¶
>>> s = u.encode("utf-16LE")
>>> print(s)
¶
>>> s = u.encode("utf-16BE")
>>> print(s)
¶

See https://en.wikipedia.org/wiki/Character_encoding to help with the understanding of character encoding, code pages and why they are important.

James
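P.S. Printing the encoded string only shows what the console's code page makes of the bytes, so it can hide what is really there. Below is a minimal sketch, assuming Python 3.8 or later, that prints the raw bytes in hex for the same character. The helper name encodings_of and the list of codecs are my own choices for illustration, not anything from the original question.

```python
# Sketch (assumes Python 3.8+ for bytes.hex(sep)): show the raw bytes
# of one character under several encodings instead of printing them.

CODECS = ("utf-8", "utf-16", "utf-32", "utf-16-le", "utf-16-be")


def encodings_of(ch, codecs=CODECS):
    """Return a dict mapping each codec name to the encoded bytes of ch."""
    return {codec: ch.encode(codec) for codec in codecs}


# U+2014 is the same character used in the session above.
table = encodings_of("\u2014")

for codec, encoded in table.items():
    # hex(" ") puts a space between bytes so they are easy to count.
    print(f"{codec:10} {len(encoded)} bytes  {encoded.hex(' ')}")
```

The plain "utf-16" and "utf-32" codecs prepend a byte order mark, and the byte order they pick depends on the machine, which is why those two results are longer than the explicit LE/BE variants. In UTF-8 the same character takes three bytes (e2 80 94), so a two-byte value in your data would point toward UTF-16.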