Ok, I should say that I managed to "solve" the problem by first
reading and translating the data, and then applying Mr. Lundh's
strip_html function to the resulting lines.

For future reference (and of course any additional feedback), the
working code is here:

http://pastebin.com/f309bf607

But of course that's a Band-Aid approach, and I'm still interested in
understanding the root of the problem. To that end, I've included the
exception raised by the problematic code below.

> Your try/except is hiding the problem. What happens if you take it
> out? what error do you get?
>
> My guess is that strip_html() is returning unicode and
> translate_code() is expecting strings but I'm not sure without seeing
> the error.
>

When I run this code:

<<< snip >>>
for line in infile:
    cleanline = translate_code(line)
    newline = strip_html(cleanline)
    outfile.write(newline)
<<< snip >>>

...I receive the below traceback:

   Traceback (most recent call last):
     File "htmlcleanup.py", line 112, in <module>
       outfile.write(newline)
   UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 21: ordinal not in range(128)
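
The traceback is consistent with the guess quoted above: the u'\xf1' in the message means newline is a unicode object, and writing it to a plain file makes Python 2 encode it with the default 'ascii' codec, which cannot represent that character. Below is a minimal sketch of one way around it, opening the output file with an explicit encoding. The sample text, the temporary path, and the choice of UTF-8 are assumptions for illustration only; the real encoding of the input data is not known from this thread.

```python
import io
import os
import tempfile

# Sample text containing the character from the traceback (an assumption,
# not the actual data from htmlcleanup.py).
text = u"ma\xf1ana\n"

# Encoding to ASCII fails for any character above 127. This is the same
# failure the traceback reports, made explicit.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# io.open accepts an encoding, so unicode text is encoded on write
# instead of falling back to the 'ascii' default.
path = os.path.join(tempfile.mkdtemp(), "out.txt")  # hypothetical output path
with io.open(path, "w", encoding="utf-8") as outfile:
    outfile.write(text)

# Read it back with the same encoding to confirm nothing was lost.
with io.open(path, "r", encoding="utf-8") as infile:
    roundtrip = infile.read()
```

Another option along the same lines would be to keep the ordinary file object and call newline.encode("utf-8") before outfile.write(); either way, the encoding is chosen explicitly rather than left to the default.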
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
