On Tue, Oct 17, 2006 at 11:08:32PM +0000, John J Lee wrote:
-> On Tue, 17 Oct 2006, Titus Brown wrote:
-> [...]
-> > ---
-> >   File "/disk/u/t/dev/twill/twill/other_packages/BeautifulSoup.py", line
-> > 1057, in endData
-> >     currentData = ''.join(self.currentData)
-> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
-> > ---
-> >
-> > and ended up changing the BeautifulSoup code to do a
-> >
-> >     currentData = ''.join(str(self.currentData))
-> >                           ^^^
-> >
-> > I don't understand unicode well enough to know whether or not this is
-> > going to cause huge problems, but it was the only way to get mechanize
-> > and BS 3.0 to play nice.
->
-> Why?

Without it, I got that error on many pages.  It was never clear to me
why, but I am guessing that there was a bad interaction between
mechanize (or the HTML itself) and BeautifulSoup's defaults.  Basically,
BS kept trying to treat things as ASCII, even when the encoding
*supplied* to BS was latin-1 or something else.

After looking through the code, I passed the encoding specified by
mechanize through to the BeautifulSoup setup, and it made no difference
-- unless I also added the 'str' call.

As I have no claim to understanding how mechanize deals with unicode,
I'm not sure how it all works.  But it *does* seem to work OK.

cheers,
--titus
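
P.S. For reference, here is a minimal sketch of the failure mode the traceback seems to point at. It is written for Python 3 and only emulates what Python 2 does implicitly: when ''.join() is handed a mix of byte strings and unicode strings, Python 2 decodes the byte strings as ASCII, which fails on 0xa0. The helper name decode_fragments and the sample fragments are hypothetical, not part of BeautifulSoup or mechanize, and I am assuming currentData is a list of text fragments, as the ''.join suggests.

```python
# Sketch only: emulates, in Python 3, the implicit ASCII decode that
# Python 2 performs when ''.join() mixes byte strings and unicode strings.

def decode_fragments(fragments, encoding):
    """Decode any bytes fragments using the page's declared encoding.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    return [f.decode(encoding) if isinstance(f, bytes) else f
            for f in fragments]

# 0xa0 is a non-breaking space in latin-1; it is not valid ASCII.
# Made-up sample data standing in for self.currentData.
fragments = ["price:", b"\xa0", "10"]

# What Python 2 effectively attempts during the join: an ASCII decode.
try:
    b"\xa0".decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding explicitly with the declared encoding before joining works.
joined = "".join(decode_fragments(fragments, "latin-1"))

# By contrast, str() on the whole list produces the list's repr, so the
# join simply returns that repr instead of the concatenated text.
masked = "".join(str(fragments))
```

If currentData really is a list in twill's copy of BeautifulSoup, the str() call may be hiding the error rather than fixing the encoding, so it is probably worth checking what the parsed text looks like on one of the pages that used to fail.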
