Re: [Tutor] name shortening in a csv module output

Alan Gauld Thu, 23 Apr 2015 14:12:18 -0700

On 23/04/15 19:14, Jim Mooney wrote:


By relying on the default when you read it, you're making an unspoken
assumption about the encoding of the file.


So is there any way to sniff the encoding, including the BOM (which appears
to be used or not used randomly for utf-8), so you can then use the proper
encoding, or do you wander in the wilderness?


Pretty much guesswork.

The move from plain old ASCII to Unicode (and others) has made the
handling of text much more like binary. You have to know the binary
format/encoding to know how to decode binary data. It's the same with
text: if you don't know what produced it, and in what format, then you
have to guess.
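
As a rough illustration of what "guessing" can look like in practice, here is a minimal sketch that checks for a BOM first and then falls back to trial decoding. The function name guess_encoding and the fallback list ("utf-8", then "cp1252") are my own assumptions for the example, not anything the csv module does for you, and a successful decode only means the bytes are legal in that encoding, not that the guess is right.

```python
import codecs

# BOM signatures, UTF-32 first: the UTF-32 LE BOM starts with the same
# two bytes as the UTF-16 LE BOM, so the longer one must be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw, fallbacks=("utf-8", "cp1252")):
    """Return a plausible codec name for the bytes in raw (hypothetical helper)."""
    # 1. A BOM, if present, is the only real evidence we get.
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    # 2. No BOM: try each candidate and keep the first that decodes cleanly.
    for name in fallbacks:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    # 3. latin-1 maps every byte to a character, so it never fails
    #    (which also means it tells us nothing).
    return "latin-1"
```

You would read the start of the file in binary mode, for example raw = open(path, "rb").read(4096), call guess_encoding(raw), and pass the result as encoding= when reopening the file in text mode. Reading only a prefix can cut a multi-byte character in half, so for small files it is safer to read the whole thing.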

There are some things you can do to check your results (such as try
spell checking the results) and you can try checking the characters
against the Unicode mappings to see if the sequences look sane.

(For example, a lot of mixed alphabets - like Arabic, Greek and
Latin - suggests you guessed wrong!) But none of it is really
reliable.
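
A crude version of that mixed-alphabet check might look like the sketch below. The standard library has no direct "script" lookup, so this assumes the first word of each character's Unicode name (LATIN, GREEK, ARABIC, ...) is a good enough stand-in; scripts_used is a made-up helper name, and plenty of legitimate text mixes alphabets, so treat a hit as a hint only.

```python
import unicodedata


def scripts_used(text):
    """Return the set of alphabets seen in text, judged by Unicode names."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # e.g. "GREEK CAPITAL LETTER GAMMA" -> "GREEK"
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


# UTF-8 bytes decoded with the right codec use one alphabet...
good = "caf\u00e9".encode("utf-8").decode("utf-8")
# ...but decoded as Greek cp1253 the accented letter turns into a gamma
# followed by a copyright sign, so Latin and Greek end up in the same word.
bad = "caf\u00e9".encode("utf-8").decode("cp1253")

print(scripts_used(good))
print(scripts_used(bad))
```

If the set comes back with several unrelated alphabets for what should be ordinary names in a csv file, that is a fair sign the wrong encoding was used.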

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

