On Wed, Jun 08, 2016 at 09:54:23AM -0400, Alex Hall wrote:
> All,
> I'm working on a project that writes CSV files, and I have to get it done
> very soon. I've done this before, but I'm suddenly hitting a problem with
> unicode conversions. I'm trying to write data, but getting the standard
> cannot encode character: ordinal not in range(128)

I infer from your error that you are using Python 2. Is that right? You
should say so, *especially* for Unicode problems, because Python 3 uses
a very different (and much better) system for handling text strings.

Also, there is no such thing as a "standard" error. All error messages
are different, and they usually show lots of debugging information that
you haven't yet learned to read. But we have, so please show us the full
traceback!

> I've tried
> str(info).encode("utf8")
> str(info).decode(utf8")

One of the problems with Python 2 is that it allows two nonsense
operations: str.encode and unicode.decode.

The whole string handling thing in Python 2 is a bit of a mess. It's
over 20 years old, and dates back to before Unicode even existed, so
you'll have to excuse a bit of confusion. In Python 2:

(1) str means *byte string*, NOT text string, and is limited to "chars"
    with ordinal values 0 to 255;

(2) unicode means "text string";

(3) In an attempt to be helpful, Python 2 will try to automatically
    convert to and from byte strings as needed. This works so long as
    all your characters are ASCII, but leads to chaos, confusion and
    error as soon as you have non-ASCII characters involved.

Python 3 fixes these confusing features.

Remember two facts:

(1) To go from TEXT to BYTES (i.e. unicode -> str) use ENCODE;

(2) To go from BYTES to TEXT (i.e. str -> unicode) use DECODE.

but you must be careful to prevent Python doing those automatic
conversions first.

Looking at your code:

    str(info).encode("utf8")

that's wrong, because it calls encode on a byte str, and encode is for
going from unicode to str. But using decode also gives the same error.
That hints that the error is happening in the call to str() first.

Firstly, we need to know what info is. Run this:

    print type(info)
    print repr(info)
    print str(info)

and report any errors and output.

I'm going to assume that info is a unicode object. Why?
Because that will give the error you experience:

    py> info = u'abcµ'
    py> str(info)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 3: ordinal not in range(128)

The right way to convert unicode text to a byte str is with the encode
method. Unless you have good reason to use another encoding, always use
UTF-8 (which I see you are doing, great).

    py> info.encode('utf-8')
    'abc\xc2\xb5'

If all your Unicode text strings are valid and correct, that should be
all you need, but if you are paranoid and fear "invalid" Unicode
strings, which can theoretically happen (ask me how if you care), you
can take a belt-and-braces approach and preemptively deal with errors by
converting them to question marks. NOTE THAT THIS THROWS AWAY
INFORMATION FROM YOUR UNICODE TEXT.

If your paranoia exceeds your fear of losing information, you can
instruct Python to use a ? any time there is an encoding error:

    info.encode('utf-8', errors='replace')

So to recap:

- you have a variable `info`, which I am guessing is unicode

- you can convert it to a byte str with:

    info.encode('utf-8')

  or for the paranoid:

    info.encode('utf-8', errors='replace')

Now that you have a byte string, you can just write it out to the CSV
file. To read it back in, you read the CSV file, which returns a byte
str, and then convert back to Unicode with:

    info = data.decode('utf-8')

> unicode(info, "utf8")

When you run this, what exception do you get? My guess is that you get
the following TypeError:

    py> unicode(u'abc', 'utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: decoding Unicode is not supported

> csvFile = open("myFile.csv", "wb", encoding="utf-8") #invalid keyword
> argument

Python 3 allows you to set the encoding of files, Python 2 doesn't.
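
For what it's worth, here is a minimal sketch of the encode/decode rule
next to a file opened with an explicit encoding, written so that it
should run unchanged on Python 2.7 and Python 3. The sample text, the
demo.txt file name and the variable names are made up for illustration;
nothing here comes from your project. It uses io.open, which takes an
encoding argument on both versions, and checks that the bytes on disk
match what a manual encode produces.

```python
# Sketch only: the sample text and the file name are invented for illustration.
import io
import os
import shutil
import tempfile

text = u'abc\xb5'  # a text (unicode) string: 'abc' followed by MICRO SIGN

# TEXT -> BYTES with an explicit encode, then BYTES -> TEXT with decode.
data = text.encode('utf-8')
assert data == b'abc\xc2\xb5'
assert data.decode('utf-8') == text

# io.open accepts encoding= in text mode and performs the encode on write.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'demo.txt')
try:
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    # Reading in binary mode shows the bytes that actually reached the disk.
    with io.open(path, 'rb') as f:
        raw = f.read()
finally:
    shutil.rmtree(tmpdir)  # remove the temporary directory and file

assert raw == data
```

On Python 2 the text-mode file returned by io.open only accepts unicode
objects in write(), so a byte str would have to be decoded first.
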
In Python 2 you can use the io module, but note that this won't help you
as (1) the csv module doesn't support Unicode, and (2) your problem lies
elsewhere.

P.S. don't feel bad if the whole Unicode thing is confusing you. Most
people go through a period of confusion, because you have to unlearn
nearly everything you thought you knew about text in computers before
you can really get Unicode.


-- 
Steve