On Tue, 07 Sep 2010 14:56:38 +0200, Boris Zbarsky <bzbar...@mit.edu> wrote:

> On 9/7/10 4:11 AM, Philip Jägenstedt wrote:
>> It's garbage in at least UTF-8, Big5 and GBK.
>
> Thanks. I assume that applies to the OggS\0 sequence too, right? I appreciate the data!

UTF-8, Big5 and GBK are all (as far as I know) ASCII supersets. Do real-world text documents include \0 bytes? (I don't know.)
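
The ASCII-superset point is easy to check. Below is a minimal Python sketch, assuming Python's built-in `utf-8`, `big5` and `gbk` codecs are a fair stand-in for the decoders a browser would use (that is my assumption; `OGG_MAGIC`, `ENCODINGS` and `decode_all` are names made up for illustration):

```python
# Ogg capture pattern "OggS" followed by a zero byte.
OGG_MAGIC = b"OggS\x00"

# Python codec names; assumed to approximate the browser's decoders.
ENCODINGS = ("utf-8", "big5", "gbk")

def decode_all(data, encodings=ENCODINGS):
    """Decode data with each encoding; invalid bytes become U+FFFD."""
    return {enc: data.decode(enc, errors="replace") for enc in encodings}

decoded = decode_all(OGG_MAGIC)
for enc, text in decoded.items():
    # Every byte is below 0x80, so an ASCII-superset codec passes it through.
    print(enc, repr(text))
```

If the codecs behave as ASCII supersets, the only thing that distinguishes the sequence from ordinary text is the trailing NUL, which is why the question about \0 bytes in real documents matters.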

>> I'm not sure what infrastructure is in place, but perhaps one could
>> *not* sniff if Content-Type also indicates an encoding?
>
> As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1 (thanks, Apache!), that should be reasonable, I think.

Are you saying that Apache has, at various times, set the default character encoding to UTF-8 or ISO-8859-1? I was hoping that no encoding parameter at all would be sent :/
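
To make the rule under discussion concrete, here is a rough Python sketch: skip sniffing when Content-Type carries a charset, unless that charset is one of the two that may simply be a server default. The helper names and the exact policy are hypothetical and only illustrate the idea; this is not any browser's actual behaviour.

```python
# Charsets that may be a server default rather than an author's choice
# (assumption taken from this thread, not from any specification).
POSSIBLE_SERVER_DEFAULTS = {"utf-8", "iso-8859-1"}

def charset_of(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    return None

def may_sniff(content_type):
    """Allow sniffing if no charset is given or it looks like a server default."""
    charset = charset_of(content_type)
    return charset is None or charset in POSSIBLE_SERVER_DEFAULTS
```

With this sketch, `may_sniff("text/plain; charset=Big5")` is false, while `may_sniff("text/plain")` and `may_sniff("text/plain; charset=ISO-8859-1")` are true.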

--
Philip Jägenstedt
Core Developer
Opera Software
