First, let me thank everyone for their wise and experienced comments. This is exactly 
what this sort of list should be for...

For the sake of clarity, let me define two terms:
1. "Unicode" means Unicode.
2. "UNICODE" means "what an end user thinks when he sees the characters U, n, i, c, o, 
d, e on the screen, in that order".

What we are trying to establish is the exact meaning that UNICODE ought to have - that 
is, if it can have one at all.

I suggest that a more technical definition of UNICODE could be "a file format that can 
be read by programs that read UNICODE". This is pretty certain to be what a user 
understands by the word!

Now in the world of application programs intended for real human beings (as opposed, 
for example, to specialised technical tools), I cannot see that any program will 
survive for long if it cannot read, without user intervention, files written in all 
the self-describing Unicode formats (all those that begin with a byte order mark, or 
BOM). It follows that any of 
these formats could, with equal propriety, be described as UNICODE.
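
To make "read without user intervention" concrete, here is a minimal sketch in Python of how a program might recognise a self-describing file from its BOM alone. It assumes that only the UTF-8, UTF-16 and UTF-32 signatures are of interest, and the name decode_self_describing is purely illustrative, not taken from any existing program.

```python
import codecs

# Candidate signatures, checked in this order. The order matters: the
# UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE), so
# the longer signature has to be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_self_describing(data: bytes) -> str:
    """Decode bytes that start with a BOM (hypothetical helper).

    Raises ValueError when no known BOM is present, since the data is
    then not self-describing and the caller would have to guess.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Strip the signature and decode the rest with the encoding
            # that the signature identifies.
            return data[len(bom):].decode(encoding)
    raise ValueError("no BOM found; the data is not self-describing")
```

A reader built this way never has to ask the user which encoding a file is in, which is the property that lets all of these formats pass as UNICODE. One known ambiguity remains: a UTF-16 LE file whose first character is U+0000 starts with the same four bytes as the UTF-32 LE BOM, and this sketch would treat it as UTF-32.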

Moving back to output formats, this implies a single requirement for a program 
that outputs data: if the user asks it to use UNICODE, the program uses one of 
the self-describing formats. The decision as to *which* of these formats to use 
would be up to the programmer. Depending on the circumstances, he may hard-wire a 
specific choice (perhaps whatever is best for the platform), or he may provide a 
configuration option accessible to more technical users.
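
The output side could be sketched in the same spirit. The following Python fragment assumes a hard-wired default with an optional override for more technical users; DEFAULT_FORMAT, encode_as_unicode and the choice of UTF-8 as the default are my own illustrative assumptions, not a recommendation for any particular platform.

```python
import codecs

# The self-describing formats this sketch knows how to write, each paired
# with the BOM that identifies it.
_BOM_FOR = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
    "utf-32-be": codecs.BOM_UTF32_BE,
}

# Hypothetical hard-wired choice; a real program might pick whatever is
# best for its platform instead.
DEFAULT_FORMAT = "utf-8"


def encode_as_unicode(text: str, fmt: str = DEFAULT_FORMAT) -> bytes:
    """Encode text in a self-describing format (hypothetical helper).

    `fmt` plays the role of the configuration option: ordinary users
    never set it, technical users may.
    """
    try:
        bom = _BOM_FOR[fmt]
    except KeyError:
        raise ValueError(f"not a self-describing format: {fmt!r}") from None
    # The endian-specific codecs and plain "utf-8" do not emit a BOM by
    # themselves, so it is prepended explicitly here.
    return bom + text.encode(fmt)
```

Whatever value fmt takes, the result always begins with a BOM, so any reader that honours BOMs can open the file without asking the user anything.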

Now, a question: 

Are there, in fact, many circumstances in which it is necessary for an end user to 
create files that do *not* have a BOM at the beginning?


