Ok, I should say that I managed to "solve" the problem by first reading and translating the data, and then applying Mr. Lundh's strip_html function to the resulting lines.
For future reference (and, of course, for any additional feedback), the working code is here: http://pastebin.com/f309bf607

But of course that's a Band-Aid approach, and I'm still interested in understanding the root of the problem. To that end, I've included below the exception raised by the problematic code.

> Your try/except is hiding the problem. What happens if you take it
> out? What error do you get?
>
> My guess is that strip_html() is returning unicode and
> translate_code() is expecting strings, but I'm not sure without seeing
> the error.

When I run this code:

<<< snip >>>
for line in infile:
    cleanline = translate_code(line)
    newline = strip_html(cleanline)
    outfile.write(newline)
<<< snip >>>

...I receive the following traceback:

Traceback (most recent call last):
  File "htmlcleanup.py", line 112, in <module>
    outfile.write(newline)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 21: ordinal not in range(128)
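The traceback suggests that `newline` is a unicode object containing u'\xf1' (ñ), and that the failure happens at `outfile.write()`, not inside `translate_code()`. In Python 2, writing a unicode object to a plain file makes Python encode it implicitly with the default ASCII codec, which cannot represent ñ. The sketch below reproduces that failure and shows two common remedies. It assumes UTF-8 is an acceptable output encoding. The sample string and the temporary file names are hypothetical stand-ins for the real data. `io.open` is used because it exists in both Python 2.6+ and Python 3.

```python
# Minimal sketch (assumption: UTF-8 is an acceptable output encoding).
import io
import os
import shutil
import tempfile

# Hypothetical stand-in for strip_html()'s result: a unicode string
# containing u'\xf1' (n with tilde), the character from the traceback.
newline = u"Feliz cumplea\u00f1os"

tmpdir = tempfile.mkdtemp()
ascii_path = os.path.join(tmpdir, "ascii_out.txt")  # hypothetical file name
utf8_path = os.path.join(tmpdir, "utf8_out.txt")    # hypothetical file name
bytes_path = os.path.join(tmpdir, "bytes_out.txt")  # hypothetical file name

# 1. Reproduce the error: an ASCII-encoded stream cannot represent u'\xf1'.
ascii_failed = False
try:
    with io.open(ascii_path, "w", encoding="ascii") as outfile:
        outfile.write(newline)
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ASCII write failed: %s" % exc)

# 2. Fix A: open the output file with an explicit encoding, so the file
#    object does the unicode -> bytes conversion for you.
with io.open(utf8_path, "w", encoding="utf-8") as outfile:
    outfile.write(newline)

# 3. Fix B: encode explicitly and write the bytes to a binary-mode file.
with io.open(bytes_path, "wb") as outfile:
    outfile.write(newline.encode("utf-8"))

# Read both files back as raw bytes to confirm what was stored.
with io.open(utf8_path, "rb") as infile:
    data = infile.read()
with io.open(bytes_path, "rb") as infile:
    data_explicit = infile.read()

print("Bytes on disk: %r" % data)

shutil.rmtree(tmpdir)  # clean up the temporary files
```

Either fix avoids the implicit ASCII conversion; which one fits better depends on whether the rest of the script handles unicode or byte strings. Note that u'\xf1' is stored as the two bytes `\xc3\xb1` in UTF-8.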