Good point. But there has to be an actual attacker here, that is, a hacker making a deliberately malevolent attempt to (say) run arbitrary code on a victim's machine (the victim being an end user, someone viewing a web page). To achieve this, the attacker must exploit "features" of the victim's browser. Yes, I was assuming that the attacker was a document author. If the attacker is instead the server (or at least a server administrator), it's difficult to see what a document author can do to guard against it. A hostile server could of course modify every document it serves, in any manner it chose. In that circumstance, document authors would be well advised to move their documents to another server ... assuming they ever found out.

The attack is only theoretical, so far as I know, but basically it works like this: the attacker puts (say) "C:\WINNT\SYSTEM32\CMD.EXE (plus some nasty parameters)" in a hyperlink and encourages you to click on it. If all is well, the browser should forbid this. But if the string is written in encoding A, and the browser parses it assuming encoding B, the browser may fail to recognise the path as absolute, and so may allow it. Of course, you'd have to try really hard to find encodings A and B for which this becomes feasible, but you never know, it might be doable. Plus, you'd have to find a user running a browser old enough to still be prone to the exploit. (I'm pretty sure modern browsers will have closed that hole by now, but again, you never know.) But even a buggy and stupid browser will never fall victim to this exploit if it is able to infer the correct encoding for the document.
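To make the A-versus-B mismatch concrete, here is a minimal sketch in Python. It is not a real exploit and does not model any actual browser: the function looks_absolute and the choice of UTF-16LE as "encoding A" and Latin-1 as "encoding B" are assumptions picked purely for illustration. It only shows that a naive absolute-path check can give a different answer depending on which encoding the checker assumes.

```python
def looks_absolute(path: str) -> bool:
    """Hypothetical naive check for a Windows drive-letter path (C:\\...)."""
    return len(path) >= 3 and path[0].isalpha() and path[1:3] == ":\\"


target = "C:\\WINNT\\SYSTEM32\\CMD.EXE"

# The attacker writes the link in "encoding A" (assumed here: UTF-16LE).
raw = target.encode("utf-16-le")

# The checking code wrongly assumes "encoding B" (assumed here: Latin-1),
# so every other character it sees is a NUL and the "C:\" prefix is hidden.
seen_by_checker = raw.decode("latin-1")

# Whatever finally resolves the link decodes it correctly.
seen_by_consumer = raw.decode("utf-16-le")

print(looks_absolute(seen_by_checker))   # the check is evaded
print(looks_absolute(seen_by_consumer))  # the real path is absolute
```

If both the checker and the consumer agree on the encoding, the two views are identical and the mismatch disappears, which is the point about inferring the correct encoding.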

But look at it like this. Suppose an HTML document had a meta tag which claimed: <META HTTP-EQUIV="Content-length" CONTENT=1>. In this circumstance, which would you prefer to believe: the HTTP Content-Length header, or the meta tag? (One can certainly imagine buffer-overrun exploits if browsers were to make the wrong choice.)

Of course, having said that, document authors can affect HTTP headers directly anyway. If the document were written in PHP instead of HTML, then the document author could generate any HTTP headers they wanted! (I've actually done this to deliver documents in UTF-8 against the server's default.) All I can assume is that there's some sort of threat model in place which assumes that anyone who can code in PHP can't possibly be an attacker! If so, it's clearly nonsense.

I still maintain, though (in agreement with Jon), that a server should obey the document author by taking notice of meta tags and transforming them into HTTP headers. (At the very least, it should take the meta tag as a hint, and use it as an HTTP header if the hint turns out to be true.) To ignore them altogether is just dumb.
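As a rough sketch of the "take it as a hint" idea, the Python below reads a charset from a meta tag, checks that the document's bytes really do decode in that encoding, and only then uses it for the Content-Type header. Everything here is an assumption made for illustration: the function name content_type_for, the regular expression, the 1024-byte scan window, and the ISO-8859-1 fallback do not describe any real server. Note also that "turns out to be true" can only be approximated this way, since a single-byte encoding such as ISO-8859-1 will accept any byte sequence.

```python
import re

# Hypothetical pattern: find charset=... inside a <meta ...> tag.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE
)


def content_type_for(body: bytes, server_default: str = "ISO-8859-1") -> str:
    """Return a Content-Type value, trusting the meta hint only if it checks out."""
    match = META_CHARSET.search(body[:1024])  # only scan the start of the page
    if match:
        hinted = match.group(1).decode("ascii")
        try:
            body.decode(hinted)  # does the document decode as claimed?
        except (LookupError, UnicodeDecodeError):
            pass  # unknown encoding name, or the hint is false: ignore it
        else:
            return f"text/html; charset={hinted}"
    return f"text/html; charset={server_default}"


page = ('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        '<p>caf\u00e9</p>')
honest = page.encode("utf-8")    # bytes match the meta tag's claim
lying = page.encode("latin-1")   # bytes contradict the meta tag's claim

print(content_type_for(honest))
print(content_type_for(lying))
```

A page with no meta tag at all simply gets the server default, so this costs nothing for authors who say nothing about their encoding.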

Jill

PS. I haven't mentioned Unicode domain names. That's a different kettle of fish altogether. Maybe we could have another thread for that.

 

> -----Original Message-----
> From: Peter Kirk [mailto:[EMAIL PROTECTED]]
> Sent: Monday, September 29, 2003 5:33 PM
> To: Jill Ramonsky
> Cc: [EMAIL PROTECTED]
> Subject: Re: Fun with proof by analogy, was Re: Mojibake on
> my Web pages
>
>
> I know I don't understand all the issues here, but I think I spot one
> flaw in the argument. This seems to imply that all security holes are
> the work of the content providers and none related to the servers. In
> other words, that all servers and their administrators are entirely
> trustworthy. This is certainly not necessarily true. And if a content
> provider can compromise security by confusing encodings, so
> can a server.
>
> This could become a significant security hole when we get
> Unicode domain
> names. A malicious server administrator could register the mojibake
> equivalent of a legitimate security sensitive domain name and then
> deliberately serve the mojibake version to users, etc etc.
>
