On Tue, Dec 23, 2008 at 2:10 AM, Eric Abrahamsen <e...@ericabrahamsen.net> wrote:
> Hi there,
>
> I'm configuring a python command to be used by emacs to filter a buffer
> through python markdown, and noticed something strange. If I run this
> command in the terminal:
>
> python -c "import sys,markdown; print markdown.markdown(sys.stdin.read().decode('utf-8'))" < markdown_source.md
>
> The file (which is encoded as utf-8 and contains Chinese characters) is
> converted and output correctly to the terminal. But if I do this to write
> the output to a file:
>
> python -c "import sys,markdown; print markdown.markdown(sys.stdin.read().decode('utf-8'))" < markdown_source.md > output.hml
>
> I get a UnicodeEncodeError, 'ascii' codec can't encode character u'\u2014'.
> I'm not sure where exactly this is going wrong, as print and
> sys.stdout.write() and whatnot don't provide encoding parameters. What's the
> difference between this command writing to the terminal, and writing to the
> file?

sys.stdout does have an encoding:

In [1]: import sys

In [2]: sys.stdout.encoding
Out[2]: 'UTF-8'

I think print converts to the encoding of stdout, e.g.

In [3]: print u'\u2014'
—

Probably when you redirect the output to a file, sys.stdout.encoding is ascii, so the conversion fails. The simple solution is to encode explicitly:

print markdown.markdown(sys.stdin.read().decode('utf-8')).encode('utf-8')

Kent
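
To see the difference without involving the shell, here is a small sketch that is not part of the original exchange. It assumes the failure comes from an implicit conversion with the ascii codec, as guessed above, and reproduces that conversion by hand. The names dash, ascii_ok and utf8_bytes are only illustrative.

```python
# -*- coding: utf-8 -*-
# Sketch: compare encoding the character from the traceback with
# the ascii codec (what the failing command presumably does implicitly)
# and with utf-8 (what the explicit .encode('utf-8') does).

dash = u'\u2014'  # the character named in the UnicodeEncodeError

# The ascii codec has no representation for this character, so the
# conversion raises UnicodeEncodeError.
try:
    dash.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly to utf-8 succeeds and yields plain bytes, which
# can be written to a terminal, a file, or a pipe without any further
# conversion.
utf8_bytes = dash.encode('utf-8')
```

Once the text has been encoded explicitly, what gets written is already a byte string, so the result should no longer depend on whatever encoding stdout reports.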