Re: [Python-3000] Pre-PEP: Easy Text File Decoding

2006-10-02 Thread Martin v. Löwis
John S. Yates, Jr. wrote: > It is a mistake on Microsoft's part to fail to strip the BOM > during conversion to UTF-8. There is no MEANINGFUL definition > of BOM in a UTF-8 string. That's not true. See http://unicode.org/faq/utf_bom.html#23 http://unicode.org/faq/utf_bom.html#29 The BOM can a

Re: [Python-3000] BOM handling

2006-10-02 Thread Martin v. Löwis
Georg Brandl wrote: > b = (codecs.BOM_UTF8 + "hello").decode("utf-8") > len(a) >> 5 > > This behavior is questionable... Indeed. Try py> b = (codecs.BOM_UTF8 + "hello").decode("utf-8-sig") py> len(b) 5 instead. Regards, Martin
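For readers on Python 3, the difference between the two codecs can be sketched as follows. This is only an illustration: the sample bytes and variable names are assumptions, and the thread itself used Python 2 byte strings.

```python
import codecs

# Bytes as a BOM-writing editor might save them (assumed sample input).
data = codecs.BOM_UTF8 + b"hello"

# The plain "utf-8" codec keeps the BOM as the character U+FEFF.
plain = data.decode("utf-8")

# The "utf-8-sig" codec strips a leading BOM if one is present.
stripped = data.decode("utf-8-sig")

print(len(plain), repr(plain[0]))
print(len(stripped), stripped)
```

With plain "utf-8" the decoded string is one character longer than the visible text, which is the behavior questioned in the quoted message.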

Re: [Python-3000] BOM handling

2006-10-02 Thread Martin v. Löwis
Josiah Carlson wrote: > I'm unable to find that particular utf-8 codec in the version of Python > 2.5 I have installed, but I may not be looking in the right places, or > spelling it the right way. Try "utf-8-sig" as the encoding name, or encodings/utf_8_sig.py for the implementation. Regards,
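One way to check that the codec is available is to look it up by name, as in this minimal Python 3 sketch. The variable names are illustrative, and the exact lookup output may differ between Python versions.

```python
import codecs
import encodings.utf_8_sig  # the implementation module named in the reply

# Look the codec up by its encoding name.
info = codecs.lookup("utf-8-sig")
print(info.name)

# Encoding with "utf-8-sig" prepends the BOM; decoding removes it again.
encoded = "hello".encode("utf-8-sig")
roundtrip = encoded.decode("utf-8-sig")

# Input without a BOM decodes unchanged, so the codec is safe for both cases.
no_bom = b"hello".decode("utf-8-sig")
```

If `codecs.lookup` raises `LookupError`, the installed Python predates the codec or the name is misspelled.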

Re: [Python-3000] BOM handling

2006-10-02 Thread Martin v. Löwis
Blake Winton wrote: > I don't know if the magic number stuff to determine whether a file is > executable or not is bash-specific. Either way, when I save the file in > UTF-8, it's fine, but when I save it in UTF-8 with a BOM, it fails. It's the operating system that does the interpretation. O
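The failure described in the quoted message can be illustrated with a rough sketch. It assumes the usual Unix convention that a script is recognized by the two bytes "#!" at the very start of the file; the helper name and sample script are hypothetical, and real kernels apply further checks.

```python
import codecs


def starts_with_shebang(first_bytes: bytes) -> bool:
    """Hypothetical helper: mimic a check for the '#!' magic number."""
    return first_bytes[:2] == b"#!"


plain_script = b"#!/usr/bin/env python\nprint('hi')\n"
bom_script = codecs.BOM_UTF8 + plain_script

# Without a BOM the file begins with '#!'.
print(starts_with_shebang(plain_script))

# With a BOM the first bytes are EF BB BF, so the magic number is not found.
print(starts_with_shebang(bom_script))
```

Under that assumption, the three BOM bytes push "#!" out of the first two positions, which is consistent with the script failing only when it is saved with a BOM.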

Re: [Python-3000] BOM handling

2006-10-02 Thread Martin v. Löwis
Blake Winton wrote: > Um, what more data do we need for this use-case? I'm not going to > suggest an API, other than it would be nice if I didn't have to manually > figure out/hard code all the encodings. (It's my belief that I will > currently have to do that, or at least special-case XML,