"Doug Ewell" <d...@ewellic.org> wrote:

|As a programmer, I can attest that we are no more receptive to being
|called "duds" than any other professionals. Constructive suggestions
|focused on the end product, instead of the competence of the person,
|might get a response.

You're of course right. The tone was rude.

|Steven Atreju wrote:
|
|> Well, i still see a bug in the Unicode Standard here.
|> Whereas for the multioctet UTFs there is «The BOM is not
|> considered part of the content of the text» (Conformance, 3.10,
|> D98, D101), i cannot find any such clarifying text for it's usage
|> as a signature.
|
|There really isn't as much difference between using U+FEFF "as a byte
|order mark" and using it "as a signature" as this makes it seem. The
|definitions you quote have to do with whether U+FEFF is treated as a
|BOM/signature or as a zero-width no-break space.

I don't understand what you are saying here.
And i do think that more people are uncertain about whether this has
been left off intentionally (which i personally would assume, given
the presumed caliber of the people involved and the length of time
the standard has existed and been reviewed).
I really think that a clarification in the same spirit as those of
D98 and D101 (but maybe with different content :) would be an
improvement of the Unicode Standard.

Once more i want to point out that on Unix/POSIX systems the file
content can be seen as a whole, and i hope and think that this will
not change.
This situation is completely different from Windows, which already in
the nineties (or even earlier, but i don't know) had text files with
appended meta information (separated by ^Z or so) that was invisible
in normal text editors.
I.e., this is why we do have this messy text OR binary file I/O
distinction like O_BINARY (for open(2)), "b" (for fopen(3)) or
binmode (perl(1)).
Because without those, reading a text file will hit End-Of-File at
the ^Z, not at the real end of the file.
(Which raises the immediate question of why the Microsoft programmers
did not embed the meta information in this section at the end of the
file. But i don't really want to know.)

Anyway. On Unix a UTF-8 file *will* show the BOM, because it is file
content.
I.e.:

|?0%0[tmp]$ hexdump -C text
|00000000  ef bb bf 49 20 64 6f 6e  27 74 20 77 61 6e 74 20  |...I don't want |
|00000010  74 6f 20 73 65 65 20 79  6f 75 2c 20 65 76 65 72  |to see you, ever|
|00000020  21 0a 53 68 65 20 70 75  74 20 6f 6e 20 68 65 72  |!.She put on her|
|00000030  20 63 6f 61 74 20 61 6e  64 20 6c 65 66 74 2e 0a  | coat and left..|
|00000040

is shown (because even bad English is displayed) as

|?0%0[tmp]$ v text
|<U+FEFF>I don't want to see you, ever!
|She put on her coat and left.

in a UTF-8 locale and

|?0%0[tmp]$ LESSCHARSET=ascii v text
|<EF><BB><BF>I don't want to see you, ever!
|She put on her coat and left.

otherwise. And i like that, because it is the truth.
But it of course implies that it will show up exactly like this
wherever the signature occurs.

|> No, the real issue is that the programmers are duds.
|> Or they were unsure about it all...
|> Anyway, i've told them they were duds, and as i didn't get any
|> response sofar, i was right.
|
|As a programmer, I can attest that we are no more receptive to being
|called "duds" than any other professionals. Constructive suggestions
|focused on the end product, instead of the competence of the person,
|might get a response.

So i apologize again.
I want to state, however, that the company in question is heavily
automated and full of robots.
People have to face Modern Times. At least in the manufacturing.
(Why do i own one of their bicycles? Because people get jobs there,
which they would not have otherwise, *there*. But real craftsmanship
products, like those from http://www.manufactum.de, or an old
Rolls-Royce or whatever, are of course preferable.)
So do the programmers have to face the same conditions?
I don't really think so.
They prefer driving plain text readers up the wall.
Successfully.

|--
|Doug Ewell | Thornton, Colorado, USA
|http://www.ewellic.org | @DougEwell

Steven
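
P.S.: For what it's worth, here is a minimal sketch of how a program could treat a leading EF BB BF as a signature instead of as text content. It is only an illustration in Python, not anything the Standard prescribes. The helper name split_signature and the sample bytes (taken from the hexdump above) are made up for this example; codecs.BOM_UTF8 and the "utf-8-sig" codec are part of the Python standard library.

```python
import codecs
from typing import Tuple

# The three octets EF BB BF, i.e. U+FEFF encoded as UTF-8.
UTF8_SIGNATURE = codecs.BOM_UTF8


def split_signature(data: bytes) -> Tuple[bool, bytes]:
    """Hypothetical helper: return (had_signature, payload).

    Only a signature at the very start of the data is removed;
    any later U+FEFF stays in the payload as file content.
    """
    if data.startswith(UTF8_SIGNATURE):
        return True, data[len(UTF8_SIGNATURE):]
    return False, data


# Same bytes as in the hexdump of the file "text".
raw = (b"\xef\xbb\xbfI don't want to see you, ever!\n"
       b"She put on her coat and left.\n")

had_signature, payload = split_signature(raw)

# The plain "utf-8" codec keeps U+FEFF as the first character,
# which is what a pager on Unix ends up displaying.
text_with_bom = raw.decode("utf-8")

# The "utf-8-sig" codec drops one leading signature, if present.
text_without_bom = raw.decode("utf-8-sig")
```

Whether stripping is the right thing to do at all is exactly the open question: the sketch only shows that the program has to decide, because the file system hands over the octets as they are.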