> Some further searching reveals this:
> (yay archives ;))
> http://mail.python.org/pipermail/python-list/2008-April/658644.html
>

Aha! I noticed that character 150 (0x96) is missing from the ISO
encoding table, and the source xml is indeed using windows-1252
encoding. That explains why this appears to be the only character in
the xml source that doesn't get translated by Universal Feed Parser.
But I'm now wondering whether the feed parser is actually using
windows-1252 or some other encoding.
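
To convince myself of the difference, here is a small sketch using only
the standard library. It assumes the "ISO encoding table" in question
is ISO-8859-1 (latin-1), and it decodes the single byte 150 under both
encodings:

```python
# Byte 150 (0x96) maps to an en dash in windows-1252, but to a
# non-printing C1 control character in ISO-8859-1 (latin-1).
raw = bytes([150])

as_cp1252 = raw.decode("cp1252")    # Python's name for windows-1252
as_latin1 = raw.decode("latin-1")   # assumed to be the "ISO" table

print(repr(as_cp1252), hex(ord(as_cp1252)))
print(repr(as_latin1), hex(ord(as_latin1)))
```

If the parser decoded the file as latin-1, the character would come out
as the control code rather than as a dash, which would match what I'm
seeing.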

The page below provides details on how UFP handles character encodings:

http://www.feedparser.org/docs/character-encoding.html

I'm wondering if there's a way to figure out which encoding UFP uses
when it parses the file.
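
If I'm reading the docs correctly, the parsed result carries an
`encoding` entry plus a `bozo` flag (with `bozo_exception` set when
something looked off, such as a declared encoding being overridden).
Something like the following sketch might report it. The helper names
`describe_encoding` and `inspect_feed` are my own, and the file name in
the usage note is hypothetical:

```python
def describe_encoding(parsed):
    """Summarize the encoding info in a feedparser result (dict-like)."""
    bozo = bool(parsed.get("bozo", 0))
    return {
        "encoding": parsed.get("encoding", "unknown"),
        "bozo": bozo,
        # Only report the exception when feedparser flagged a problem.
        "problem": repr(parsed.get("bozo_exception")) if bozo else None,
    }


def inspect_feed(source):
    """Parse a feed (path, URL, or string) and report its encoding."""
    import feedparser  # third-party; imported here so the helper above works alone
    return describe_encoding(feedparser.parse(source))
```

Calling `inspect_feed("myfeed.xml")` should then show which encoding
the parser settled on and whether it had to override the declared one.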

I didn't have the Universal Encoding Detector
(http://chardet.feedparser.org/) installed when I parsed the xml file.
It's not clear to me whether UFP requires that library to detect the
encoding or if it's an optional part of its broader routine for
determining encoding.
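
In the meantime, a quick standard-library check can at least confirm
whether the detector is importable in the interpreter I'm using. This
only tells me whether the `chardet` module is present, not whether UFP
actually consulted it; `has_module` is just a name I picked:

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


print("chardet installed:", has_module("chardet"))
```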
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
