This seems to be a problem with BeautifulSoup and Python 2.5. I spent some time looking at it this morning and tracked down one problem. Below is the email I sent to the BeautifulSoup maintainer.
I doubt that either of these problems will actually be a problem in practice. I suggest you install it by copying the .py file to site-packages and go ahead and use it.

Kent

==========================================================

Hi,

BeautifulSoup has a few problems with Python 2.5. Running the tests gives this output:

    ................................./Users/kent/Desktop/Downloads/BeautifulSoup-3.0.3/BeautifulSoup.py:1654: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
      elif data[:3] == '\xef\xbb\xbf':
    /Users/kent/Desktop/Downloads/BeautifulSoup-3.0.3/BeautifulSoup.py:1657: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
      elif data[:4] == '\x00\x00\xfe\xff':
    /Users/kent/Desktop/Downloads/BeautifulSoup-3.0.3/BeautifulSoup.py:1660: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
      elif data[:4] == '\xff\xfe\x00\x00':
    .......F...........
    ======================================================================
    FAIL: testQuotedAttributeValues (__main__.QuoteMeOnThat)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "BeautifulSoupTests.py", line 382, in testQuotedAttributeValues
        '<this is="really messed up & stuff"></this>')
      File "BeautifulSoupTests.py", line 19, in assertSoupEquals
        self.assertEqual(str(c(toParse, convertEntities=convertEntities)), rep)
    AssertionError: '<this is="really messed up & stuff"></this>' != '<this is="really messed up & stuff"></this>'

    ----------------------------------------------------------------------
    Ran 52 tests in 0.208s

    FAILED (failures=1)

The UnicodeWarnings seem to be caused by a change in how Python handles mixed string comparisons.
In Python 2.4, the comparison

    u'' == '\xef\xbb\xbf'

raises

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)

In Python 2.5, the same comparison prints a warning but doesn't raise an exception. For more information about this change, see the section starting "A new warning, UnicodeWarning," on this page:
http://docs.python.org/whatsnew/other-lang.html

The affected code is in UnicodeDammit._toUnicode(). When BeautifulSoup() is called with no text data, as happens a few times in the test suite, _toUnicode() is called with an empty unicode string and triggers this warning.

One way to fix this is to have UnicodeDammit.__init__() explicitly check for an empty string and just return u"". Here is a suggested rewrite of the initial portion of UnicodeDammit.__init__():

    def __init__(self, markup, overrideEncodings=[], smartQuotesTo='xml'):
        self.markup, documentEncoding, sniffedEncoding = \
                     self._detectEncoding(markup)
        self.smartQuotesTo = smartQuotesTo
        self.triedEncodings = []
        # Empty or already-unicode markup needs no decoding, so stop here.
        if markup == "" or isinstance(markup, unicode):
            self.originalEncoding = None
            self.unicode = unicode(markup)
            return

Note that I have also changed the way this works if markup is already unicode. The current implementation is incorrect: it returns a value, which __init__() is not allowed to do.

I don't know enough about the way BeautifulSoup works to figure out the second problem, the testQuotedAttributeValues failure...

Best regards,
Kent

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
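
As a footnote to the remark about __init__() above, here is a minimal sketch of why returning a value from __init__() fails, and of the early-return pattern used in the suggested rewrite. It is not BeautifulSoup code: the classes ReturnsFromInit and StoresOnSelf and the helper try_construct are hypothetical names made up for illustration, the UTF-8 fallback is an assumption of this sketch only, and it is written for Python 3, where str plays the role that unicode plays in Python 2.

```python
class ReturnsFromInit(object):
    """Hypothetical stand-in: __init__ hands back the markup itself."""

    def __init__(self, markup):
        if isinstance(markup, str):
            return markup  # not allowed: __init__ must return None


class StoresOnSelf(object):
    """Hypothetical stand-in for the early-return pattern."""

    def __init__(self, markup):
        if isinstance(markup, str) or len(markup) == 0:
            # Nothing to decode: record the text on the instance and stop.
            self.originalEncoding = None
            self.unicode = markup if isinstance(markup, str) else ""
            return
        # Assumption for this sketch only: any other input is UTF-8 bytes.
        self.originalEncoding = "utf-8"
        self.unicode = markup.decode("utf-8")


def try_construct(cls, markup):
    """Return the new instance, or the TypeError raised while building it."""
    try:
        return cls(markup)
    except TypeError as exc:
        return exc


if __name__ == "__main__":
    print(repr(try_construct(ReturnsFromInit, "already text")))
    print(repr(try_construct(StoresOnSelf, "already text").unicode))
```

The first call yields a TypeError, because Python rejects any non-None return value from __init__(); the second stores the text on the instance and returns nothing, which is what the suggested rewrite does.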