On 8/7/2012 12:48 PM, Joshua Bell wrote:
When Anne's spec appeared I gutted mine and deferred wherever possible to his. One consequence of that was getting the other encodings "for free" as far as the spec writing goes. If we achieve consensus that we only want to support UTF encodings, we can add the restrictions. There are use cases for supporting other encodings (parsing legacy data file formats, for example), but that could be deferred.

My main use case, and the only one I'm going to argue for, is being able to handle mail messages with this API, and the primary concern here is decoding. I'll agree with other sentiments in this thread that I don't particularly care about encoding to anything other than UTF-8 (it might be nice, but I can live without it); it's being able to decode $CHARSET that I'm concerned about. As far as edge cases in this scenario go, it pretty much boils down to "I want to produce the same JS string that I would get from the text content of the document at data:text/plain;charset=<charset>,<data>".
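A minimal sketch of what I mean, assuming the TextDecoder API as proposed in this thread; the GBK label and the byte values are just an illustrative example:

    // Hypothetical example: the bytes 0xC4 0xE3 0xBA 0xC3 are "你好" in GBK.
    const bytes = new Uint8Array([0xc4, 0xe3, 0xba, 0xc3]);
    const decoded = new TextDecoder("gbk").decode(bytes);
    // The goal: `decoded` should equal the text content you would see
    // when navigating to data:text/plain;charset=gbk,%C4%E3%BA%C3
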

When encoding, I think it is absolutely necessary to enforce uniform guidelines for the output. When decoding, however, I think that most differences (beyond concerns like the BOM) are the result of "buggy" content creators as opposed to the browsers. Given that HTML display has apparently tolerated differences in charset decoding for legacy charsets, I suppose it is possible to live with differences in exact character decoding across charsets--in other words, treating the charset document as an advisory list of both the minimum charsets to support and how to decode them.
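To make that asymmetry concrete, here is a sketch assuming an encoder restricted to UTF-8 output (as suggested elsewhere in this thread) alongside a decoder that accepts legacy labels; the specific labels and byte values are illustrative:

    // Encoding is uniform: output is always UTF-8.
    const utf8 = new TextEncoder().encode("caf\u00e9");
    // -> Uint8Array [0x63, 0x61, 0x66, 0xc3, 0xa9]

    // Decoding is advisory: legacy charsets are accepted, and exact
    // results for edge-case bytes may differ slightly across browsers.
    const text = new TextDecoder("iso-8859-1")
        .decode(new Uint8Array([0x63, 0x61, 0x66, 0xe9])); // "café"
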

--
Beware of bugs in the above code; I have only proved it correct, not tried it. 
-- Donald E. Knuth
