On 9/10/07, John Cowan <[EMAIL PROTECTED]> wrote:
> Pierpaolo Bernardi scripsit:
> > which says that you can put a bom in a utf8 file (of course, you can
> > put whatever character you want in a file), but it is a character
> > like every other character, it has no particular meaning wrt the encod
Zbigniew scripsit:
> BOM breaks the UNIX shebang mechanism. To me, this is good enough
> reason to avoid prepending a BOM to scripts, and to detect encoding
> via heuristic, user directive or current locale.
I agree w/r/t scripts.
My point is not that it's a Good Thing to generate 8-BOMs, but th
BOM breaks the UNIX shebang mechanism. To me, this is good enough
reason to avoid prepending a BOM to scripts, and to detect encoding
via heuristic, user directive or current locale.
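
To illustrate the point about the shebang: the kernel only treats a file as an interpreter script when its very first two bytes are "#!". A rough sketch in Python, where the helper name has_shebang and the sample script are made up for illustration:

```python
import codecs

def has_shebang(data: bytes) -> bool:
    """Return True if the first two bytes are '#!' (hypothetical helper)."""
    return data[:2] == b"#!"

# A made-up two-line script, once without and once with the UTF-8 signature.
script = b"#!/bin/sh\necho hello\n"
with_bom = codecs.BOM_UTF8 + script  # EF BB BF prepended

print(has_shebang(script))    # True
print(has_shebang(with_bom))  # False: the file now starts with EF BB, not "#!"
```

With the three signature bytes in front, "#!" sits at offset 3, so the check fails.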
On 9/9/07, John Cowan <[EMAIL PROTECTED]> wrote:
> Shawn Rutledge scripsit:
> > It would be nice if Chicken was to
Shawn Rutledge scripsit:
> Instead, you think Scite should assume that when it sees any bytes
> with the MSB set, the file is UTF-8? Or there is a better way to
> detect it?
There is no *guaranteed correct* way to detect UTF-8, because a Latin-1
(or various other 8859-x encodings) file can conta
everything i've seen on the unicode site itself seems to discourage the use of
a BOM outside of protocol-ambiguous cases, since it's not a necessary object.
it's not an easy thing to be tolerant of in code text, although it is
relatively easy to be tolerant of it in plain text. possibilities: is it
Pierpaolo Bernardi scripsit:
> See here for example: http://unicode.org/faq/utf_bom.html#29
>
> which says that you can put a bom in a utf8 file (of course, you can
> put whatever character you want in a file), but it is a character
> like every other character, it has no particular meaning wrt t
On 9/8/07, Elf <[EMAIL PROTECTED]> wrote:
> and does not state anything about byte order.[1] Quite a lot of
> Windows software (including Windows Notepad) adds one to UTF-8 files.
> However in Unix-like systems (which make heavy use of text files for
> configuration) this practice i
and according to the unicode consortium:
A: Yes, UTF-8 can contain a BOM. However, it makes no difference as
to the endianness of the byte stream. UTF-8 always has the same
byte order. An initial BOM is only used as a signature -- an
indication that an otherwise unmarked text fil
from that page:
While UTF-8 does not have byte order issues, a BOM encoded in UTF-8
may be used to mark text as UTF-8. It only identifies a file as UTF-8
and does not state anything about byte order.[1] Quite a lot of
Windows software (including Windows Notepad) adds one to UTF-8 files.
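
As a sketch of what "signature" means in practice: a reader can check for the three bytes EF BB BF and decode the rest as ordinary UTF-8. The example below is in Python and assumes nothing about how Chicken or Scite handle this; the function names are invented for illustration. Python's "utf-8-sig" codec drops one leading signature if it is there and otherwise behaves like plain UTF-8.

```python
import codecs

def has_utf8_signature(data: bytes) -> bool:
    """True if the data begins with the UTF-8 encoded BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def decode_tolerantly(data: bytes) -> str:
    """Decode UTF-8, silently dropping a leading signature if present."""
    return data.decode("utf-8-sig")

plain = "(define x 1)".encode("utf-8")
signed = codecs.BOM_UTF8 + plain

print(has_utf8_signature(plain))   # False
print(has_utf8_signature(signed))  # True
print(decode_tolerantly(plain) == decode_tolerantly(signed))  # True
```

Note that this only recognises the signature; a file without it may still be UTF-8.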
On 9/9/07, Graham Fawcett <[EMAIL PROTECTED]> wrote:
> On 9/8/07, Pierpaolo Bernardi <[EMAIL PROTECTED]> wrote:
> > UTF8 has no BOM. A BOM in a utf8 file should be there only if you
> > put it there.
>
> Not true.
>
> http://en.wikipedia.org/wiki/Byte_Order_Mark
UTF8 is defined by the Unicode con
On 9/8/07, Pierpaolo Bernardi <[EMAIL PROTECTED]> wrote:
> UTF8 has no BOM. A BOM in a utf8 file should be there only if you
> put it there.
Not true.
http://en.wikipedia.org/wiki/Byte_Order_Mark
G
___
Chicken-users mailing list
Chicken-users@nongnu
On 9/9/07, Shawn Rutledge <[EMAIL PROTECTED]> wrote:
> If I save a Scheme source file from Scite (my usual editor) in UTF8
> mode, it writes the Byte Order Marker at the beginning (EF BB BF).
UTF8 has no BOM. A BOM in a utf8 file should be there only if you
put it there.
It's a bug in Scite.
P.
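
Whichever side is considered at fault, a file that already carries the bytes EF BB BF can have them removed before it is loaded. A minimal sketch in Python, not something csi or Scite provides; the function name strip_utf8_bom is hypothetical:

```python
import codecs
from pathlib import Path

def strip_utf8_bom(path) -> bool:
    """Remove a leading EF BB BF from the file at `path`, in place.

    Returns True if a signature was found and removed, False otherwise.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        p.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False
```

Calling strip_utf8_bom("myfile.scm") would rewrite the file only when the signature is present, so running it twice is harmless.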
If I save a Scheme source file from Scite (my usual editor) in UTF8
mode, it writes the Byte Order Marker at the beginning (EF BB BF). If
I load it like this
csi myfile.scm
I get
Error: unbound variable: ||
But if I delete the BOM using a hex editor and try again, csi seems to
assume it's UTF8