On Wed, Jul 10, 2013 at 3:45 AM, Dave Angel <da...@davea.name> wrote:
>
> Get rid of the BOM from the data file, and it'll work fine. You don't
> specify what version of Python you're using, so I have to guess. But
> there's a utf-8 conversion of a BOM at the beginning of that file, and
> that's not numeric. Best would be to change the way you generate that
> file, and don't put in a BOM for utf-8.
>
> BOMs are markers that are put at the beginning of certain encodings of
> files to distinguish between BE and LE encodings. But since your file is
> utf-8, a BOM is unnecessary and confusing.

Just jumping in to translate a bit of jargon...

BOM stands for Byte Order Mark
(http://www.opentag.com/xfaq_enc.htm#enc_bom).
BE stands for "big-endian", and LE stands for "little-endian".

Since the first digital computers were built, there have been two schools
of thought as to how multi-byte numbers should be stored: with the "most
significant" byte first, or the "least significant" byte first. The two
schools are called "big-endian" and "little-endian", after a famous
controversy in "Gulliver's Travels".

The BOM is a sequence of bytes at the beginning of a Unicode text that
tells the reader whether the rest of the text is big-endian or
little-endian. UTF-8 is defined as a sequence of single bytes, so it is
endian-agnostic and a BOM is not actually needed.

> It may even be illegal, but I'm not sure about that.

No, it's not illegal. The Unicode standard permits a BOM at the start of
utf-8 data, but neither requires nor recommends it. Some programs write
one anyway - so in practice utf-8 comes in two flavors (with and without
BOM)!
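
To make the jargon concrete, here is a small Python 3 sketch. It is only an illustration: the sample bytes in `raw` and the helper `parses_as_int` are made up for this example, and we don't know what the original data file contains or which Python version is in use.

```python
import codecs

# Byte order: the same 16-bit number, stored most significant byte
# first (big-endian) or least significant byte first (little-endian).
n = 0x0102
big = n.to_bytes(2, "big")
little = n.to_bytes(2, "little")

# In UTF-16 the BOM is written in one of the two byte orders, which is
# how a reader can tell them apart.
bom_be = codecs.BOM_UTF16_BE
bom_le = codecs.BOM_UTF16_LE

# In UTF-8 the BOM is always the same three bytes.
bom_utf8 = codecs.BOM_UTF8

# Hypothetical file contents: a number preceded by a UTF-8 BOM.
raw = codecs.BOM_UTF8 + b"42\n"

# Plain "utf-8" keeps the BOM as the character U+FEFF;
# "utf-8-sig" strips it if it is present.
with_bom = raw.decode("utf-8")
without_bom = raw.decode("utf-8-sig")


def parses_as_int(text):
    """Return True if int() accepts the text, False otherwise."""
    try:
        int(text)
        return True
    except ValueError:
        return False


print(big, little)
print(repr(with_bom), parses_as_int(with_bom))
print(repr(without_bom), parses_as_int(without_bom))
```

If regenerating the file without a BOM is not an option, opening it in Python 3 with open(filename, encoding="utf-8-sig") should read both flavors, since that codec simply skips the BOM when there is one.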
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor