Philippe Verdy <verd...@wanadoo.fr> wrote:

 |2012/7/13 Steven Atreju <snatr...@googlemail.com>:
 |> Philippe Verdy <verd...@wanadoo.fr> wrote:
 |>
 |>  |2012/7/12 Steven Atreju <snatr...@googlemail.com>:
 |>  |> UTF-8 is a bytestream, not multioctet(/multisequence).
 |>  |Not even. UTF-8 is a text-stream, not made of arbitrary sequences of
 |>  |bytes. It has a lot of internal semantics and constraints.
 |>  |The effective binary encoding of text streams should NOT play any
 |>  |semantic role (all UTFs should completely be equivalent on the text
 |>  |interface, the bytestream low level is definitely not suitable for
 |>  |handling text and should not play any role in any text parser or
 |>  |collator).
 |>
 |> I don't understand what you are saying here.
 |> UTF-8 is a data interchange format, a text-encoding.
 |> It is not a filetype!
 |
 |Not only! It is a format which is unambiguously bound to a text
 |filetype, even if this file type may not be intended to be interpreted
 |by humans (e.g. program sources or rich text formats like HTML).
 |
 |> A BOM is a byte-order-mark, used to signal different host endiannesses.[...]
 |
 |I've been on this list long enough to know all this already. And I've
 |not contradicted this role. However this is not prescriptive for

Sure, i know the former and i bet there has been a lot of discussion.

 |anything else than text file types (whatever they are). For example
 |BOMs have absolutely no role for encoding binary images, even if they
 |include internal multibyte numeric fields.

Well, it boils down to that, doesn't it?  If Unicode *defines* that
the so-called BOM is in fact a Unicode-indicating tag that MUST
be present, then it is very clear what has to happen for, say,
'$ cat tagless tagged > out' (in a UTF-8 environment).  I don't
agree with that, though, for the reasons i tried to put into
English words, but that is solely my problem.  Another approach
would be an explicit UTF-8-BOM charset.  Or, of course,
deprecating the -BE/-LE versions.
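
To make the cat case concrete, here is a minimal Python sketch; it is
not anything Unicode or cat(1) prescribes.  Plain byte-wise
concatenation leaves the signature of 'tagged' in the middle of 'out',
where it decodes as U+FEFF.  The helper names concat and
concat_bom_aware are made up for illustration, and the BOM-aware
variant is only one assumed reading of "what has to happen".

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def concat(*chunks: bytes) -> bytes:
    """Byte-wise concatenation, i.e. what a BOM-agnostic cat does."""
    return b"".join(chunks)


def concat_bom_aware(*chunks: bytes) -> bytes:
    """Hypothetical variant: drop a leading BOM from every chunk but the first."""
    out = []
    for index, chunk in enumerate(chunks):
        if index > 0 and chunk.startswith(BOM):
            chunk = chunk[len(BOM):]
        out.append(chunk)
    return b"".join(out)


# Stand-ins for the two files of the example.
tagless = "plain\n".encode("utf-8")
tagged = BOM + "signed\n".encode("utf-8")

naive = concat(tagless, tagged)
aware = concat_bom_aware(tagless, tagged)

# The naive result carries U+FEFF in mid-stream; the aware one does not.
print("\ufeff" in naive.decode("utf-8"))
print("\ufeff" in aware.decode("utf-8"))
```

Note that even a decoder which strips a leading signature (Python's
"utf-8-sig", for instance) would not help with the naive result, since
the signature is no longer at the start of the stream.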

I disagree with just about everything you say about automatic
metadata provision.  I know that, in Germany, many, many small
libraries are being closed because there is not enough money
available to keep up with the digital race, and even the larger
ones *do* have trouble keeping pace!  I've mentioned bitsavers
already, but this is a drop in the bucket, almost rhetoric.  In
other countries the situation is worse.

  Steven
