Re: Unicode in a URL

Martin Duerst Thu, 26 Apr 2001 20:00:02 -0700
Hello Paul,

At 19:41 01/04/25 -0700, Paul Deuter wrote:
>I am struggling to figure out the correct method for encoding Unicode
>characters in the query string portion of a URL.
>
>There is a W3C spec that says the Unicode character should be converted
>to UTF-8 and then each byte should be encoded as %XX.

It also says that form data should be encoded in the encoding of
the page where you fill in the form.
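
To make the two conventions concrete, here is a minimal sketch in Python
(my choice of language for illustration; the helper name percent_encode
and the sample character are assumptions, not anything from the spec).
It encodes the same character once via UTF-8 and once via a page
encoding of ISO-8859-1:

```python
from urllib.parse import quote

def percent_encode(text, charset="utf-8"):
    """Convert text to bytes in `charset`, then escape each byte as %XX.

    Hypothetical helper for illustration; safe="" escapes reserved
    characters as well.
    """
    return quote(text, safe="", encoding=charset)

# U+00E9 (e with acute accent) under the UTF-8 convention: two bytes.
utf8_form = percent_encode("\u00e9")

# The same character when the form page is served as ISO-8859-1: one byte.
latin1_form = percent_encode("\u00e9", "iso-8859-1")

print(utf8_form, latin1_form)
```

The two results differ (%C3%A9 versus %E9), which is why the receiving
side has to know which convention the sender used.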


>From my experience, however, browsers will encode all character sets
>this way, and IIS at least will interpret such hex bytes according to
>the character set that is set on the receiving page.

Well, communication takes two ends. Each server, for each
URI, has to decide what to do. If you want to make use of
the UTF-8 convention, you have to set up your server side
accordingly.
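
As a rough illustration of why both ends have to agree, the sketch
below decodes the same escaped bytes under two different server-side
assumptions. It uses Python's urllib.parse purely as a stand-in for
whatever the server actually does; the variable names are hypothetical.

```python
from urllib.parse import unquote

# Bytes sent by a client that follows the UTF-8 convention for U+00E9.
received = "%C3%A9"

# Server configured for UTF-8: the original character comes back.
as_utf8 = unquote(received, encoding="utf-8")

# Server that assumes ISO-8859-1: each byte becomes its own character,
# so the result is two wrong characters (U+00C3, U+00A9).
as_latin1 = unquote(received, encoding="iso-8859-1")

print(repr(as_utf8), repr(as_latin1))
```

Nothing in the %XX escapes themselves says which interpretation is
right, so the choice has to be made on the server for each URI.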


>With IIS 5.0, I have stumbled onto the solution of using %uXXXX where
>XXXX is the hexadecimal value of the Unicode character.  When I pass
>Unicode data formatted this way on Windows 2000/IIS5 - the data always
>seems to be decoded properly.  (Apparently this format came from
>ECMAScript.)

This was a one-time ECMAScript solution. The ECMAScript standard now
has functions to support the UTF-8 convention.

The reason %uXXXX was discontinued was that it's outside the
URI syntax, and therefore can break all kinds of things.
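
A small sketch of what "outside the URI syntax" means in practice: URI
syntax only allows "%" when it is followed by exactly two hex digits.
The checker below (has_valid_escapes is a hypothetical helper, and
Python is just my illustration language) accepts the UTF-8 form and
rejects the %uXXXX form:

```python
import re
from urllib.parse import quote, unquote

# Every "%" must introduce a %XX escape (two hex digits).
_ESCAPES_OK = re.compile(r"(?:[^%]|%[0-9A-Fa-f]{2})*")

def has_valid_escapes(component):
    """Return True if all percent signs in `component` start a %XX escape."""
    return _ESCAPES_OK.fullmatch(component) is not None

legacy = "%u00E9"                    # the %uXXXX form
standard = quote("\u00e9", safe="")  # the UTF-8 convention

print(has_valid_escapes(legacy), has_valid_escapes(standard))

# A decoder that only knows %XX does not understand the legacy form;
# Python's unquote leaves it untouched.
print(unquote(legacy))
```

Software along the way (proxies, parsers, other servers) that expects
only %XX escapes may pass such a string through undecoded, as here, or
reject it outright.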


Regards,    Martin.