Philippe Verdy <verd...@wanadoo.fr> wrote:

|2012/7/13 Steven Atreju <snatr...@googlemail.com>:
|> Philippe Verdy <verd...@wanadoo.fr> wrote:
|>
|> |2012/7/12 Steven Atreju <snatr...@googlemail.com>:
|> |> UTF-8 is a bytestream, not multioctet(/multisequence).
|> |Not even. UTF-8 is a text-stream, not made of arbitrary sequences of
|> |bytes. It has a lot of internal semantics and constraints.
|> |The effective binary encoding of text streams should NOT play any
|> |semantic role (all UTFs should completely be equivalent on the text
|> |interface, the bytestream low level is definitely not suitable for
|> |handling text and should not play any role in any text parser or
|> |collator).
|>
|> I don't understand what you are saying here.
|> UTF-8 is a data interchange format, a text-encoding.
|> It is not a filetype!
|
|Not only! It is a format which is unambiguously bound to a text
|filetype, even if this file type may not be intended to be interpreted
|by humans (e.g. program sources or rich text formats like HTML)
|
|> A BOM is a byte-order-mark, used to signal different host endiannesses.[...]
|
|I'm on this list since long enough to know all this already. And i've
|not contradicted this role. However this is not prescriptive for
|anything else than text file types (whatever they are). For example
|BOMs have absolutely no role for encoding binary images, even if they
|include internal multibyte numeric fields.

Sure, I know the former, and I bet there has been a lot of discussion.

Well, it boils down to that, doesn't it? If Unicode *defines* that the so-called BOM is in fact a Unicode-indicating tag that MUST be present, then it is very clear what has to happen for, say, '$ cat tagless tagged > out' (in a UTF-8 environment). I don't agree with that, for the reasons I tried to put into English words, but that is solely my problem. Another approach would be an explicit UTF-8-BOM charset. Or, of course, deprecating the -BE/-LE versions.

I disagree with just about everything you say about automatic metadata provision. I know that in Germany many, many small libraries are being closed because there is not enough money available to keep up with the digital race, and even the larger ones *do* have trouble keeping up! I've mentioned bitsavers already, but this is a drop in the bucket, almost rhetorical. In other countries the situation is worse.

Steven
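To make the '$ cat tagless tagged > out' case concrete: `cat` concatenates bytes, so a UTF-8 "BOM" (EF BB BF) at the start of the second file ends up in the middle of the output, where a decoder no longer sees a signature but the character U+FEFF (ZERO WIDTH NO-BREAK SPACE). The Python sketch below illustrates this under the assumption of a UTF-8 environment; `naive_cat` and `cat_stripping_boms` are hypothetical helpers written for illustration, not existing tools and not something proposed in the thread.

```python
import codecs

# codecs.BOM_UTF8 is the byte sequence EF BB BF, i.e. U+FEFF encoded in UTF-8.
BOM = codecs.BOM_UTF8


def naive_cat(*blobs: bytes) -> bytes:
    """Byte-wise concatenation, like `cat`: it knows nothing about BOMs."""
    return b"".join(blobs)


def cat_stripping_boms(*blobs: bytes) -> bytes:
    """Hypothetical BOM-aware variant: drop a leading UTF-8 BOM from each input."""
    return b"".join(b[len(BOM):] if b.startswith(BOM) else b for b in blobs)


tagless = "first line\n".encode("utf-8")
tagged = BOM + "second line\n".encode("utf-8")

out = naive_cat(tagless, tagged)
# The 'utf-8-sig' codec only strips a BOM at the very start of the input,
# so the BOM now embedded mid-stream decodes to the character U+FEFF.
text = out.decode("utf-8-sig")
print(repr(text))  # 'first line\n\ufeffsecond line\n'

clean = cat_stripping_boms(tagless, tagged).decode("utf-8")
print(repr(clean))  # 'first line\nsecond line\n'
```

Whether a tool like `cat` should ever rewrite its input this way is exactly what is in dispute here; the sketch only shows what a plain byte-level concatenation produces.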