On Tue, 21 Mar 2006, Matt Dempsey wrote:
> I'm having a new problem with my House vote script. It's returning the
> following error:
>
> Traceback (most recent call last):
>   File "C:/Python24/evenmorevotes", line 20, in -toplevel-
>     f.write
> (nm+'*'+pt+'*'+vt+'*'+md['vote-result'][0]+'*'+md['vote-desc'][0]+'*'+'\n')
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in
> position 172: ordinal not in range(128)

Hi Matt,

Just wondering: how familiar are you with Unicode? What's going on is
that one of the strings in the string concatenation above contains a
Unicode string.

It's like an infection: anything that touches Unicode turns Unicode.
*grin*

######
>>> 'hello' + u'world'
u'helloworld'
######

This has repercussions: when we're writing these strings back to files,
because we have a Unicode string, we must now be more explicit about how
Unicode is written, since files are really full of bytes, not unicode
characters. That is, we need to specify an "encoding".

'utf-8' is a popular encoding that turns Unicode reliably into a bunch of
bytes:

######
>>> u'\u201c'.encode('utf8')
'\xe2\x80\x9c'
######

and this can be written to a file. Recovering Unicode from bytes can be
done by going the other way, by "decoding":

######
>>> '\xe2\x80\x9c'.decode("utf8")
u'\u201c'
######

The codecs.open() function in the Standard Library is useful for handling
this encode/decode thing so that all we need to do is concentrate on
Unicode:

    http://www.python.org/doc/lib/module-codecs.html#l2h-991

For example:

######
>>> import codecs
>>>
>>> f = codecs.open("foo.txt", "wb", "utf8")
>>> f.write(u'\u201c')
>>> f.close()
>>>
>>> open('foo.txt', 'rb').read()
'\xe2\x80\x9c'
>>>
>>> codecs.open("foo.txt", "rb", "utf-8").read()
u'\u201c'
######

We can see that if we read and write to a codec-opened file, it'll
transparently do the encoding/decoding step for us as we write() and
read() the file.
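Applied to your script, the fix might look something like the sketch
below. I can't see where nm, pt, vt, and md come from, so the values here
are made-up stand-ins, and the output path is hypothetical too. The only
real change is that the file is opened with codecs.open() instead of
open():

```python
import codecs
import os
import tempfile

# Hypothetical stand-ins for the values built up in the real script.
nm = u'Some Member'
pt = u'D'
vt = u'Yea'
md = {'vote-result': [u'Passed'],
      'vote-desc': [u'A bill with \u201ccurly quotes\u201d in it']}

# Made-up output location for this sketch.
path = os.path.join(tempfile.gettempdir(), 'votes.txt')

# Opening through codecs lets write() accept Unicode and encode it
# as UTF-8 on the way out.
f = codecs.open(path, 'w', 'utf-8')
line = (nm + u'*' + pt + u'*' + vt + u'*' + md['vote-result'][0]
        + u'*' + md['vote-desc'][0] + u'*' + u'\n')
f.write(line)
f.close()

# Reading back through codecs decodes the bytes into Unicode again.
f = codecs.open(path, 'r', 'utf-8')
text = f.read()
f.close()
```

If whatever reads the file later expects a different encoding, the third
argument to codecs.open() is the place to change it.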
You may also find Joel Spolsky's post on "The Absolute Minimum Every
Software Developer Absolutely, Positively Must Know About Unicode And
Character Sets (No Excuses!)" useful in clarifying the basic concepts of
Unicode:

    http://www.joelonsoftware.com/articles/Unicode.html

I hope this helps!
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor