>INCOMING TEXT: Trivial to simply check. I say (once again) it's THREE BYTES. If they are there then there is a BOM. Simple.
Yes, it's trivial to check. What's missing is the notation to tell the checker what to check for.

>> The inability to update to one standard all possible consuming
>> software one might encounter (or for that matter human customers'
>> opinions) is precisely why producing and checking software has to
>> handle both possibilities.

>But the "both possibilities" are trivial and it's by no means difficult to do. Having a good program that refuses to do a little work to handle three bytes is like someone who runs a 100 mile marathon and then refuses to cross the finish line because the line is yellow instead of white.

Yes, this is a good description of the sad state of existing software. Noting that failure to standardize is irritating and unnecessary doesn't make existing software go away.

-----Original Message-----
From: Michael (michka) Kaplan [mailto:michka@;trigeminal.com]
Sent: Monday, November 04, 2002 8:08 AM
To: Joseph Boyle; Unicode Mailing List
Subject: Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

From: "Joseph Boyle" <[EMAIL PROTECTED]>

Joseph,

> Software currently under development could use the identifiers for choosing
> whether to require or emit BOM, like the file requirements checker I
> have to write, and ICU/uconv.

Let's separate that into the two issues it represents:

EMITTING: They could simply choose globally whether to emit the BOM or not. If they wanted to get "fancy" they could have a command line option which said whether to emit the bytes or not. But that is optional.

INCOMING TEXT: Trivial to simply check. I say (once again) it's THREE BYTES. If they are there then there is a BOM. Simple.

> The inability to update to one standard all possible consuming
> software one might encounter (or for that matter human customers'
> opinions) is precisely why producing and checking software has to
> handle both possibilities.

But the "both possibilities" are trivial and it's by no means difficult to do. Having a good program that refuses to do a little work to handle three bytes is like someone who runs a 100 mile marathon and then refuses to cross the finish line because the line is yellow instead of white.

> What would you mean by "the right thing" as far as emitting BOM?
> Should file conversion programs only allow output of non-BOM? (or
> with-BOM?) Or should they take the specification in an argument
> separate from the charset name? As said before this unnecessarily
> requires extra logic.

Already answered --- they can make a global decision, like Notepad or other programs do. Especially if the programmer finds the idea of setting it a huge hardship, they can skip that work and simply choose whether they want it or not....

I plead with you -- keep it SIMPLE. :-)

MichKa
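
For reference, here is a minimal sketch in Python of the two cases discussed above: checking incoming text for the three BOM bytes (EF BB BF) and emitting UTF-8 with or without them. The function names and the "require" / "forbid" / "allow" policy strings are hypothetical, made up for illustration; they are not identifiers proposed in this thread and not part of ICU/uconv.

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the three BOM bytes."""
    return data.startswith(UTF8_BOM)


def check_bom(data: bytes, policy: str) -> bool:
    """Check data against a BOM policy (hypothetical names).

    "require": BOM must be present
    "forbid":  BOM must be absent
    "allow":   either is acceptable
    """
    present = has_utf8_bom(data)
    if policy == "require":
        return present
    if policy == "forbid":
        return not present
    if policy == "allow":
        return True
    raise ValueError(f"unknown BOM policy: {policy!r}")


def encode_utf8(text: str, emit_bom: bool = False) -> bytes:
    """Encode text as UTF-8, optionally prefixed with the BOM.

    emit_bom stands in for the "global decision" or the optional
    command line switch mentioned in the message.
    """
    body = text.encode("utf-8")
    return UTF8_BOM + body if emit_bom else body
```

The check itself is one line, as MichKa says; the policy argument is the part Joseph is asking to have a notation for, since the checker still has to be told which of the three outcomes to accept.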