Re: [ssl-users] BMPString, UniCode and HTML

Gregory S. Stark Thu, 12 Mar 1998 05:46:30 -0500

Mark Shuttleworth <[EMAIL PROTECTED]> writes:
> With BMPString gaining popularity in the MS crypto tools, we're going to
> see more and more usage of it.  I know that BMPString maps to Unicode
> (UTF-8?).  But if we want to display the contents of a cert in an HTML
> format, and one of the DN elements is a BMPString, but the rest of the
> cert is Printable's, do we have to generate the whole document in UTF-8?
> 
> Is there any way to do something like <SPAN CHARSET="utf-8">Unicode
> String Goes Here</SPAN>?
> 
> Have driven this small mind quite dizzy trying to figure out character set
> usage over the past week, with no joy, so any and all suggestions
> appreciated!

No, and I don't think it would even make sense if you could. The charset
affects, at least in theory, how the HTML markup itself is parsed: in another
encoding `<' could be some other character, or could turn up as a byte inside
a multibyte character, and a parser still working in the original character
set wouldn't recognize it. The only HTML elements that accept a charset
attribute are those that refer to external resources.

What you are probably best off doing is replacing all the characters, or at
least all the characters outside the current charset, with numeric character
references of the form &#uuu; where uuu is the decimal Unicode value of the
character.
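
As a rough sketch of that substitution in Python (not code from any SSL
toolkit): the function name bmpstring_to_html is made up for illustration,
and it assumes the BMPString value arrives as raw big-endian two-byte
characters, which Python's "utf-16-be" codec can decode. Markup characters
are escaped first, then anything the page charset cannot represent becomes a
decimal numeric reference.

```python
from html import escape


def bmpstring_to_html(raw: bytes, charset: str = "ascii") -> str:
    """Turn BMPString bytes into text that is safe to embed in an HTML page.

    Hypothetical helper. Assumes `raw` holds big-endian 2-byte characters
    (UCS-2), which the "utf-16-be" codec decodes, and that `charset` is the
    encoding the surrounding page is served in.
    """
    text = raw.decode("utf-16-be")
    # Escape &, <, > and quotes so the value cannot be read as markup.
    safe = escape(text)
    # "xmlcharrefreplace" rewrites every character the page charset cannot
    # represent as a decimal reference of the form &#NNN;.
    return safe.encode(charset, "xmlcharrefreplace").decode(charset)
```

With the default ASCII page charset, the bytes 00 42 00 E5 00 3C ("B", a
with ring, "<") come out as B&#229;&lt;. Passing charset="latin-1" leaves
the a with ring as a literal character and only escapes the "<".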

greg


From the HTML 4.0 specification, section 5.2.2:

> To sum up, conforming user agents must observe the following priorities when
> determining a document's character encoding (from highest priority to
> lowest):
> 
>   1. An HTTP "charset" parameter in a "Content-Type" field.
>   2. A META declaration with "http-equiv" set to "Content-Type" and a value
>      set for "charset".
>   3. The charset attribute set on an element that designates an external
>      resource.
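
Read as code, that priority order is just a chain of fallbacks. The sketch
below is only an illustration: pick_charset and its parameter names are
hypothetical, the regular expression is a simplification of real
Content-Type parsing, and returning None when nothing is declared is my own
choice rather than something taken from the quoted text.

```python
import re
from typing import Optional

# Simplified matcher for a charset parameter such as
# "text/html; charset=utf-8". Real header parsing has more cases.
_CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)


def _charset_param(value: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type style value."""
    if not value:
        return None
    match = _CHARSET_RE.search(value)
    return match.group(1).lower() if match else None


def pick_charset(http_content_type: Optional[str] = None,
                 meta_content: Optional[str] = None,
                 element_charset: Optional[str] = None) -> Optional[str]:
    """Apply the three sources in the order listed, highest priority first."""
    return (
        _charset_param(http_content_type)    # 1. HTTP Content-Type header
        or _charset_param(meta_content)      # 2. META http-equiv declaration
        or (element_charset.lower() if element_charset else None)  # 3. attribute
    )
```

So a page served with "text/html; charset=UTF-8" is treated as UTF-8 even
if its META declaration names a different encoding.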

And about numeric character references:

> 5.3.1 Numeric character references
> 
> Numeric character references specify the code position of a character in the
> document character set. Numeric character references may take two forms:
> 
>    * The syntax "&#D;", where D is a decimal number, refers to the Unicode
>      decimal character number D.
>    * The syntax "&#xH;" or "&#XH;", where H is an hexadecimal number, refers
>      to the Unicode hexadecimal character number H. Hexadecimal numbers in
>      numeric character references are case-insensitive.
> 
> Here are some examples of numeric character references:
> 
>    * &#229; (in decimal) represents the letter "a" with a small circle above
>      it (used, for example, in Norwegian).
>    * &#xE5; (in hexadecimal) represents the same character.
>    * &#Xe5; (in hexadecimal) represents the same character as well.
>    * &#1048; (in decimal) represents the Cyrillic capital letter "I".
>    * &#x6C34; (in hexadecimal) represents the Chinese character for water.
> 
> Note. Although the hexadecimal representation is not defined in [ISO8879],
> it is expected to be in the revision, as described in [WEBSGML]. This
> convention is particularly useful since character standards generally use
> hexadecimal representations.
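
To make the two syntaxes concrete, here is a small Python sketch that expands
decimal and hexadecimal references back into characters. The name
expand_numeric_refs is hypothetical, and it deliberately ignores named
entities such as &amp;. Python's standard html.unescape covers those as well,
so this is only meant to show the parsing rule.

```python
import re

# "&#D;" with decimal digits, or "&#xH;" / "&#XH;" with hexadecimal digits.
_NCR_RE = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def expand_numeric_refs(text: str) -> str:
    """Replace numeric character references with the characters they name."""

    def _replace(match: "re.Match[str]") -> str:
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code)
        except (ValueError, OverflowError):
            # Number outside the Unicode range: leave the reference as written.
            return match.group(0)

    return _NCR_RE.sub(_replace, text)
```

Applied to the examples quoted above, &#229;, &#xE5; and &#Xe5; all expand
to the same single character, U+00E5.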

+-------------------------------------------------------------------------+
| Administrative requests should be sent to [EMAIL PROTECTED] |
| List service provided by Open Software Associates, http://www.osa.com/  |
+-------------------------------------------------------------------------+