On Tue, 07 Sep 2010 14:56:38 +0200, Boris Zbarsky <bzbar...@mit.edu> wrote:

> On 9/7/10 4:11 AM, Philip Jägenstedt wrote:
>> It's garbage in at least UTF-8, Big5 and GBK.
>
> Thanks. I assume that applies to the OggS\0 sequence too, right? I
> appreciate the data!

UTF-8, Big5 and GBK are all (as far as I know) ASCII supersets. Do
real-world text documents include \0 bytes? (I don't know.)

>> I'm not sure what infrastructure is in place, but perhaps one could
>> *not* sniff if Content-Type also indicates an encoding?
>
> As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1
> (thanks, Apache!), that should be reasonable, I think.

Are you saying that Apache has, at various times, set the default
character encoding to UTF-8 or ISO-8859-1? I was hoping that no encoding
parameter at all would be sent :/
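
To make the rule under discussion concrete, here is a rough sketch of it in Python. It is not code from any browser: the names may_sniff, looks_like_ogg and DEFAULT_LOOKING_CHARSETS are made up for illustration, and the set of "charsets a server might add on its own" is only an assumption taken from this thread.

```python
from email.message import Message

# Assumption from this thread: these charsets may be attached by the
# server's default configuration, so they say little about the content.
DEFAULT_LOOKING_CHARSETS = {"utf-8", "iso-8859-1"}

# First five bytes of an Ogg page: "OggS" followed by a zero version byte.
OGG_MAGIC = b"OggS\x00"


def declared_charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def may_sniff(content_type):
    """Sniff only if no charset is declared, or the charset looks like a server default."""
    charset = declared_charset(content_type)
    return charset is None or charset in DEFAULT_LOOKING_CHARSETS


def looks_like_ogg(body, content_type):
    """Hypothetical check: treat the body as Ogg only when sniffing is allowed."""
    return may_sniff(content_type) and body.startswith(OGG_MAGIC)


def decodes_cleanly(data, encodings=("utf-8", "big5", "gbk")):
    """Report whether the bytes are valid text in every listed encoding."""
    for encoding in encodings:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            return False
    return True
```

With this, looks_like_ogg(b"OggS\x00...", "text/plain; charset=Big5") is false because the declared charset is respected, while the same body served with charset=ISO-8859-1 or with no charset at all would still be sniffed. decodes_cleanly(OGG_MAGIC) is true, which is the ASCII-superset point above: the signature is valid text in all three encodings, so the open question is only whether real text documents contain \0.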
--
Philip Jägenstedt
Core Developer
Opera Software