The plot thickens. I've been looking at the code, and this happens while
decoding an HTML entity (such as &aacute;, which stands for á). Apparently
the problem is not that the contents of the tweet are being decoded with
the wrong character encoding, but that the tweet contains entities, and
Python's htmllib fails when converting them.

The line that fails is:

self.savedata = self.savedata + data

Here savedata is the content of the tweet so far, and data is the
character that the entity stands for, for instance an ë. I wonder why
Python feels the need to do a UTF-8 conversion just to concatenate the
two. Any Python experts care to comment?
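
For what it's worth, here is a minimal sketch of what may be going on,
assuming (the traceback does not confirm this) that savedata is a unicode
string while data is a one-byte str such as '\xe1'. In Python 2, adding a
str to a unicode object makes the interpreter decode the str with the
default encoding first, and the 'utf8' in the error message suggests the
default is UTF-8 here. The snippet is written for Python 3, where that
decode has to be spelled out; concat_like_python2 is a hypothetical helper
for illustration, not part of htmllib.

```python
def concat_like_python2(text, raw, default_encoding="utf-8"):
    """Mimic Python 2's `unicode + str`: decode the bytes, then concatenate.

    Hypothetical helper; `default_encoding` stands in for the value of
    Python 2's sys.getdefaultencoding().
    """
    return text + raw.decode(default_encoding)


savedata = "Adi"   # assumed: tweet text so far, already unicode
data = b"\xe1"     # assumed: the single Latin-1 byte for 'á'

try:
    concat_like_python2(savedata, data)
except UnicodeDecodeError as exc:
    # 0xe1 opens a three-byte UTF-8 sequence, so a lone byte is truncated input.
    error_reason = exc.reason
else:
    error_reason = None

# Decoding the byte as Latin-1 before concatenating avoids the error.
fixed = concat_like_python2(savedata, data, "latin-1")
```

If that assumption holds, decoding data explicitly (as Latin-1) before the
concatenation would sidestep the implicit conversion.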

-- 
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 0: 
unexpected end of data
https://bugs.launchpad.net/bugs/605543