-----BEGIN PGP SIGNED MESSAGE-----

Lars Kristan wrote:
> Doug Ewell wrote:
> > fine (as are LF->CRLF, stripped BOM's, and maybe even some edge cases
> > like converting between tabs and spaces). If there are any
> > security or spoofing concerns, it's best to leave everything completely
> > untouched.
>
> I see this as a good reason for NOT using BOM in UTF-8 files. CRLF is a
> major nuisance that many Windows programmers need to deal with. It requires
> text vs. binary mode when opening the files, plus size of the file does not
> match the number of characters written or read. UNIX programs usually don't
> need to bother with all that.

Text files in a known charset should always be opened in binary mode
(that is, what the C stdio API refers to as binary mode). The sets of valid
character sequences that must be accepted or generated for newline are
defined by the file format, *not* by the platform.

When designing a new file format, see UAX#13.

- -- 
David Hopwood <[EMAIL PROTECTED]>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBPHB9SzkCAxeYt5gVAQEWVQf+JTx46Df4saTu9p3S/gjr+WOAf+h/cV1t
FyLy0SQA+timqut9POdkpJsF/d+w6YO3wYj/qdUvfLOO7ftBGmQpKZ6ibZ/yR5D1
JpF7F3HENsRSKeOTN68jU6vbb4f/qXoKWP5dEoy1tIfLbb5RJ5pSJA5jvDfN35aO
qfguwm3qfj2HnjTx1/PNIN1BdD9N2z2yl/Hg+kqGOlgPSUwKnH84JbxTupK87S4B
sI+x4QLSZG9sV8qaNpNOprzCVmsPinVLoXzUbmieExFFyBuj9avBoke+S04zPGKy
Fd/B5ycUM6YCFxLI9iu30E7OxcPDIomTxnnL15kuvh2WGZRZ3Itp/Q==
=+69v
-----END PGP SIGNATURE-----
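
A minimal sketch of the binary-mode point made in the message, written in Python rather than C; it is an illustration, not code from the thread. It assumes a hypothetical file format that is UTF-8 encoded and accepts CRLF, LF, or CR as a line end. The names `NEWLINE`, `split_lines`, and `read_lines` are made up for this example. Python's "rb" mode plays the role that binary mode plays in the C stdio API: the bytes arrive untranslated, and the program applies the format's newline rule itself.

```python
import re

# Assumption: the (hypothetical) file format accepts CRLF, LF, or CR as a
# line end. CRLF is listed first so it is consumed as a single terminator.
NEWLINE = re.compile(rb"\r\n|\n|\r")


def split_lines(data: bytes) -> list[str]:
    """Split raw UTF-8 bytes into lines using the format's newline rule."""
    # Splitting at the byte level is safe for UTF-8: the bytes 0x0D and 0x0A
    # never occur inside a multi-byte sequence.
    parts = NEWLINE.split(data)
    # A trailing newline terminates the last line; it does not start a new one.
    if parts and parts[-1] == b"":
        parts.pop()
    return [part.decode("utf-8") for part in parts]


def read_lines(path: str) -> list[str]:
    """Read a file without any platform newline translation."""
    with open(path, "rb") as f:  # binary mode: bytes are returned as stored
        return split_lines(f.read())
```

Because the newline rule lives in `NEWLINE` and not in the platform's text mode, the same file yields the same lines on Windows and on UNIX, and the byte count on disk matches what the program reads.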