John S. Yates, Jr. wrote:
> It is a mistake on Microsoft's part to fail to strip the BOM
> during conversion to UTF-8. There is no MEANINGFUL definition
> of BOM in a UTF-8 string.
That's not true. See
http://unicode.org/faq/utf_bom.html#23
http://unicode.org/faq/utf_bom.html#29
The BOM can a
Georg Brandl wrote:
> b = (codecs.BOM_UTF8 + "hello").decode("utf-8")
> len(b)
>> 6
>
> This behavior is questionable...
Indeed. Try
py> b = (codecs.BOM_UTF8 + "hello").decode("utf-8-sig")
py> len(b)
5
instead.
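
For comparison, here is a minimal sketch of the same experiment written for Python 3, where the input has to be a bytes object (the b"hello" literal and the variable names are my assumptions, not part of the session above). It only illustrates that "utf-8" keeps the BOM as U+FEFF while "utf-8-sig" drops it:

```python
import codecs

# Bytes as an editor might write them: UTF-8 signature followed by the text.
data = codecs.BOM_UTF8 + b"hello"

# The plain "utf-8" codec keeps the BOM as a leading U+FEFF character.
plain = data.decode("utf-8")

# The "utf-8-sig" codec strips one leading BOM if it is present.
stripped = data.decode("utf-8-sig")

# "utf-8-sig" also accepts input that has no BOM at all.
no_bom = b"hello".decode("utf-8-sig")

print(len(plain), repr(plain[0]))
print(len(stripped), stripped == no_bom)
```

So "utf-8-sig" should be safe for reading files whether or not they start with the signature.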
Regards,
Martin
Josiah Carlson wrote:
> I'm unable to find that particular utf-8 codec in the version of Python
> 2.5 I have installed, but I may not be looking in the right places, or
> spelling it the right way.
Try "utf-8-sig" as the encoding name, or encodings/utf_8_sig.py for the
implementation.
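
If it helps, a quick way to check whether a given installation has the codec is to ask the codec registry for it. This is only a sketch, assuming Python 3 syntax; the variable names are mine:

```python
import codecs
import encodings.utf_8_sig  # the module that implements the codec

# Raises LookupError if the codec is not available in this installation.
info = codecs.lookup("utf-8-sig")
print(info.name)

# Where the implementation lives on disk.
print(encodings.utf_8_sig.__file__)

# Encoding with this codec prepends the UTF-8 signature.
encoded = "hello".encode("utf-8-sig")
print(encoded.startswith(codecs.BOM_UTF8))
```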
Regards,
Blake Winton wrote:
> I don't know if the magic number stuff to determine whether a file is
> executable or not is bash-specific. Either way, when I save the file in
> UTF-8, it's fine, but when I save it in UTF-8 with a BOM, it fails.
It's the operating system that does the interpretation. O
Blake Winton wrote:
> Um, what more data do we need for this use-case? I'm not going to
> suggest an API, other than it would be nice if I didn't have to manually
> figure out/hard code all the encodings. (It's my belief that I will
> currently have to do that, or at least special-case XML,