Re: [Tutor] converting string to text

Marc Tompkins Wed, 10 Jul 2013 10:23:21 -0700

On Wed, Jul 10, 2013 at 3:45 AM, Dave Angel <da...@davea.name> wrote:


>
> Get rid of the BOM from the data file, and it'll work fine.  You don't
> specify what version of Python you're using, so I have to guess.  But
> there's a utf-8 encoding of a BOM at the beginning of that file, and
> that's not numeric.  Best would be to change the way you generate that
> file, so that it doesn't put in a BOM for utf-8.
>
> BOMs are markers that are put at the beginning of certain encodings of
> files to distinguish between BE and LE encodings.  But since your file is
> utf-8, a BOM is unnecessary and confusing.


Just jumping in to translate a bit of jargon...

BOM stands for Byte Order Mark (http://www.opentag.com/xfaq_enc.htm#enc_bom).
BE stands for "big-endian", and LE stands for "little-endian".

Since the first digital computers were built, there have been two schools
of thought as to how multi-byte numbers should be stored:  with the "most
significant" byte first, or the "least significant" byte first.  The two
schools are called "big-endian" and "little-endian", after a famous
controversy in "Gulliver's Travels".  The BOM is a short sequence of bytes at
the beginning of a Unicode text file or stream that tells the reader whether
the rest of the data is big-endian or little-endian.  UTF-8 is defined as a
sequence of single bytes, so it has no byte order to mark, and a BOM is not
actually needed.
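
To make that concrete, here is a small Python 3 sketch (my own illustration,
not code from the original poster) that encodes the letter "A" in both byte
orders and checks for a BOM.  The detect_bom helper is a made-up name, and it
only looks at UTF-8 and UTF-16, not UTF-32.

```python
import codecs

# Hypothetical helper (not from the thread): report which BOM, if any,
# starts a byte string.  Only UTF-8 and UTF-16 are checked here.
def detect_bom(data):
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 BOM"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16 big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16 little-endian"
    return None

# The letter "A" (U+0041) as one 16-bit unit in each byte order.
big = "A".encode("utf-16-be")      # most significant byte first
little = "A".encode("utf-16-le")   # least significant byte first

# UTF-8 is a plain sequence of bytes, so there is only one way to write it.
plain = "A".encode("utf-8")

print(big, little, plain)
print(detect_bom(codecs.BOM_UTF16_BE + big))
print(detect_bom(plain))
```

The two UTF-16 results hold the same two bytes in opposite order, which is
exactly the ambiguity the BOM resolves; the UTF-8 result is a single byte
either way.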



> It may even be illegal, but I'm not sure about that.
>

No, it's not illegal; the Unicode standard permits a BOM in utf-8, although
it neither requires nor recommends one.  Some programs, particularly on
Windows, write one anyway - so even utf-8 comes in two flavors (with and
without BOM)!
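
If you can't control how the file is generated, one way to cope with both
flavors is Python's "utf-8-sig" codec, which drops a leading BOM when there is
one and otherwise behaves like plain utf-8.  A minimal sketch, assuming
Python 3 and assuming the data is one integer per line (the real file layout
isn't shown in this thread; parse_numbers and the sample bytes are made up
for illustration):

```python
import codecs

# Made-up sample data: the same two numbers, with and without a UTF-8 BOM.
raw_with_bom = codecs.BOM_UTF8 + b"42\n17\n"
raw_plain = b"42\n17\n"

# Hypothetical helper: decode bytes and convert each non-blank line to int.
def parse_numbers(raw):
    # "utf-8-sig" skips a leading BOM if present; plain UTF-8 passes through.
    text = raw.decode("utf-8-sig")
    return [int(line) for line in text.splitlines() if line.strip()]

print(parse_numbers(raw_with_bom))
print(parse_numbers(raw_plain))

# Decoding with plain "utf-8" keeps the BOM as the character U+FEFF,
# which is the non-numeric character Dave describes.
print(repr(raw_with_bom.decode("utf-8")[0]))
```

When reading from disk, the same codec name can be passed to open(), as in
open(path, encoding="utf-8-sig"), where path is whatever your file is called.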

