-------------------------------------------
On Tue, 10/29/13, eryksun <eryk...@gmail.com> wrote:

Subject: Re: [Tutor] UnicodeDecodeError while parsing a .csv file.
To: "Steven D'Aprano" <st...@pearwood.info>
Cc: tutor@python.org
Date: Tuesday, October 29, 2013, 3:24 AM

On Mon, Oct 28, 2013 at 7:49 PM, Steven D'Aprano <st...@pearwood.info> wrote:
>
> By default Python 3 uses UTF-8 when reading files. As the error below
> shows, your file actually isn't UTF-8.

Modules default to UTF-8, but io.TextIOWrapper defaults to the locale's
preferred encoding. To handle terminals, it first tries
os.device_encoding (i.e. _Py_device_encoding). Otherwise, for files, it
defaults to locale.getpreferredencoding(False).

==> Why is do_setlocale=False here? Actually, what does this parameter
do? It seems strange that a getter function has a 'set' argument.

    >>> import locale
    >>> help(locale.getpreferredencoding)
    Help on function getpreferredencoding in module locale:

    getpreferredencoding(do_setlocale=True)
        Return the charset that the user is likely using.

Another remark: I have not read this entire thread, but I was thinking
the OP might use codecs.open to open the file with the correct encoding.
If that encoding is unknown, maybe chardet could be used to guess it:
https://pypi.python.org/pypi/chardet. I have never used this module, but
it seems worth a try.

The other day I received a file that had been encoded multiple times, so
the accented characters were all messed up. I had to reverse-engineer
it, and it turned out that a sequence of latin-1 and utf-8 encodings had
been applied. It would be nice if (1) this didn't happen in the first
place ;-) and (2) some library could help with this "de-mojibake"
process.

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
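On the do_setlocale question: with the default do_setlocale=True,
getpreferredencoding() may temporarily call
locale.setlocale(locale.LC_CTYPE, "") to query the user's environment
locale before reading the codeset, and setlocale() changes process-wide
state, so it is not thread-safe. Passing False asks for the encoding of
the locale as it currently is, without touching it, which is why the
interpreter's own file I/O uses False internally. A minimal check
(the printed value is platform-dependent):

```python
import locale

# do_setlocale=False: report the current locale's encoding without
# calling setlocale() first. open()/io.TextIOWrapper use this form
# because setlocale() mutates process-wide state and isn't thread-safe.
enc = locale.getpreferredencoding(False)
print(enc)  # e.g. 'UTF-8' on Linux/macOS, 'cp1252' on many Windows setups
```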
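For what it's worth, chardet's API is chardet.detect(raw_bytes), which
returns a dict containing an 'encoding' guess. For anyone without the
package installed, here is a hypothetical stdlib-only stand-in (the name
sniff_encoding and the candidate list are mine, not chardet's) that just
tries a few encodings in order:

```python
def sniff_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes the file cleanly.

    A crude stand-in for chardet: strict UTF-8 decoding fails fast on
    non-UTF-8 byte sequences, so trying it first is a cheap filter.
    Note that latin-1 maps every possible byte, so it never fails and
    must therefore come last as the catch-all.
    """
    with open(path, "rb") as f:
        raw = f.read()
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None
```

Once you have a guess, open(path, encoding=enc) (or codecs.open on
older Pythons) does the rest.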
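On the "de-mojibake" wish: when the damage is exactly one round of UTF-8
bytes misread as latin-1, it is reversible, because latin-1 maps every
byte to a character and back losslessly. A sketch of the round trip
described above:

```python
# Mojibake: UTF-8 bytes were mistakenly decoded as latin-1.
original = "naïve"
mangled = original.encode("utf-8").decode("latin-1")
print(mangled)  # naÃ¯ve

# Undo it: re-encoding as latin-1 recovers the original UTF-8 bytes,
# which can then be decoded with the correct codec.
repaired = mangled.encode("latin-1").decode("utf-8")
print(repaired)  # naïve
```

For messier cases (multiple rounds, mixed codecs), the third-party ftfy
package automates this kind of repair.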