The plot thickens. I've been looking at the code, and this happens while decoding an HTML entity reference (such as &aacute;). Apparently the problem is not that the contents of the tweet are being decoded with the wrong character encoding, but that the tweet contains entity references, and Python's htmllib is failing to convert them.
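To illustrate how a lone entity character can trip a UTF-8 decoder, here is a minimal sketch. It is not code from the application or from htmllib; the names ENTITY_BYTE and try_decode are made up for the example. It assumes the entity for á ends up as the single Latin-1 byte 0xE1, which I have not verified against htmllib itself, and it is written so that it also runs on a current Python.

```python
# Hypothetical sketch, not code from the application or from htmllib.

# Assumption: the entity for "a with acute accent" is resolved to the
# single Latin-1 byte 0xE1 rather than to a Unicode string.
ENTITY_BYTE = b"\xe1"


def try_decode(raw, encoding):
    """Return the decoded text, or the decoder's error reason on failure."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: " + exc.reason


# Read as Latin-1, the byte is a complete character on its own.
as_latin1 = try_decode(ENTITY_BYTE, "latin-1")

# Read as UTF-8, 0xE1 only opens a three-byte sequence, so decoding
# the byte by itself cannot succeed.
as_utf8 = try_decode(ENTITY_BYTE, "utf-8")

print(repr(as_latin1))
print(as_utf8)
```

If that assumption holds, anything that later tries to interpret the byte as UTF-8 would fail on it, whatever encoding the rest of the tweet uses.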
The line that fails is:

    self.savedata = self.savedata + data

where savedata is the content of the tweet so far, and data is the character that corresponds to the entity reference, for instance an ë. I wonder why Python needs a UTF-8 conversion to do that concatenation. Any Python experts care to comment?

-- 
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 0: unexpected end of data
https://bugs.launchpad.net/bugs/605543