On Jun 26, 2017 4:05 AM, "Rowan Worth" <row...@dug.com> wrote:
> On 26 June 2017 at 16:55, Scott Robison <sc...@casaderobison.com> wrote:
>
>> Byte Order Mark isn't perfectly descriptive when used with UTF-8. Neither
>> is dialing a cell phone. Language evolves.
>
> It's not descriptive in the slightest because UTF-8's byte order is
> *specified by the encoding*.

I fear you may not have read my entire email or at least have missed my point.

>> I'm not advocating one way or
>> another, but if a system strips U+FEFF from a text stream after using it to
>> determine the encoding, surely it is reasonable to expect that for all
>> supported encodings.
>
> ?? Are you going to strip 0xFE 0xFF from the front of my iso8859-1 encoded
> stream and drop my beautiful smiley? þÿ
>
> Different encodings demand different treatment. BOM is an artifact of
> 16/32-bit unicode encodings and can kindly keep its nose out of [the
> relatively elegant] UTF-8.

One, I'm not going to do anything. Two, clearly I'm talking about the
three-byte UTF-8 sequence that decodes to U+FEFF. Three, you are correct
about different encodings.

I was trying to move the discussion past the idea of byte order when what
we're really talking about is encoding detection. ZWNBSP was used for
encoding detection because it had a convenient property that allowed
differentiation between multiple encodings and could be safely ignored. The
fact that the Unicode folks renamed it BOM instead of TEI or BEM or whatever
doesn't mean it can't be used with other Unicode transformations. It is
neither required, recommended, nor forbidden with UTF-8; it's up to the
systems exchanging data to decide how to deal with it.

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
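
For illustration only, here is a minimal Python sketch of the kind of encoding detection discussed in this thread: look for an encoded U+FEFF at the start of a byte stream, use it to pick the decoder, then strip it. The helper name `sniff_and_decode`, the signature table, and the ISO-8859-1 fallback are assumptions made for this example. They are not taken from SQLite or from any system mentioned above.

```python
import codecs

# Longest signatures first: the UTF-32-LE BOM (FF FE 00 00) begins with
# the UTF-16-LE BOM (FF FE), so the order of the checks matters.
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),        # EF BB BF, the 3-byte form of U+FEFF
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_and_decode(data: bytes, fallback: str = "iso-8859-1"):
    """Hypothetical helper: return (encoding, text) for a byte stream.

    If the stream starts with an encoded U+FEFF, that signature selects
    the encoding and is stripped from the result. Otherwise the stream
    is decoded with `fallback` (an assumption of this sketch).
    """
    for signature, encoding in _SIGNATURES:
        if data.startswith(signature):
            return encoding, data[len(signature):].decode(encoding)
    return fallback, data.decode(fallback)
```

Under this policy the signature is stripped uniformly for every encoding it can identify, UTF-8 included. The same sketch also shows the ambiguity raised in the quoted reply: an ISO-8859-1 stream that genuinely begins with the bytes 0xFE 0xFF is indistinguishable from UTF-16-BE here, so `sniff_and_decode(b"\xfe\xffhi")` is treated as UTF-16-BE rather than as "þÿhi". Which behavior is right is for the systems exchanging the data to agree on.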