> > (I have the slight impression that it should be something like
> > "status=%4054" or some other very right value, but, again, just one
> > character, not three.)
> 
> Correcting myself:
> 
> status=あ
> 
> http://www.danshort.com/HTMLentities/index.php?w=hirag

NO! The original poster is correct: you encode the Unicode code point as UTF-8,
then percent-encode each resulting byte. From RFC 3986, section 2.5:

   When a new URI scheme defines a component that represents textual
   data consisting of characters from the Universal Character Set [UCS],
   the data should first be encoded as octets according to the UTF-8
   character encoding [STD63]; then only those octets that do not
   correspond to characters in the unreserved set should be percent-
   encoded.  For example, the character A would be represented as "A",
   the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
   as "%C3%80", and the character KATAKANA LETTER A would be represented
   as "%E3%82%A2".
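
For what it's worth, here is a rough Python sketch of the rule quoted above:
encode to UTF-8 first, then percent-encode every octet outside the unreserved
set. The function name percent_encode_utf8 is made up for illustration and is
not from any library; in practice urllib.parse.quote(text, safe="") should give
the same result.

```python
import string

# RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set((string.ascii_letters + string.digits + "-._~").encode("ascii"))


def percent_encode_utf8(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then percent-encode octet by octet."""
    out = []
    for octet in text.encode("utf-8"):
        if octet in UNRESERVED:
            out.append(chr(octet))         # unreserved octets pass through as-is
        else:
            out.append("%%%02X" % octet)   # everything else becomes %XX
    return "".join(out)


# The hiragana letter from the thread (U+3042) is one character but three
# UTF-8 octets, hence three percent-escapes.
print("status=" + percent_encode_utf8("\u3042"))
```

So for the hiragana A in question, the parameter comes out as
status=%E3%81%82, three escapes for one character.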

-- 
------------------------------------ personal: http://www.cameronkaiser.com/ --
  Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckai...@floodgap.com
-- He is rising from affluence to poverty. -- Mark Twain ----------------------
