Re: Are these characters encoded?

DougEwell2 Sat, 01 Dec 2001 13:57:06 -0800

At 2001-12-01 11:24:04 Pacific Standard Time, 
[EMAIL PROTECTED] (Stefan Persson) wrote:


> I was thinking if this was encoded:
>
> 1.) Swedish ampersand (see "&.bmp"). It's an "o" (for "och", i.e. "and")
> with a line below. In handwritten text it is almost always used instead of
> &, in machine-written text I don't think I've ever seen it.

This might be a character in its own right, as different from the ampersand 
as U+204A TIRONIAN SIGN ET.  Or it might simply be a glyph variant of the 
ampersand.  If you have never seen o-underbar in machine-written text, I 
doubt that this will help your cause much.  You might try U+006F U+0332 
(LATIN SMALL LETTER O + COMBINING LOW LINE), though this will probably not 
give you the vertical spacing you expect.
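A minimal sketch, using Python's standard `unicodedata` module, of what the suggested sequence actually contains. It shows that "o-underbar" written this way is a base letter plus a combining mark, and that NFC normalization leaves it as two code points because no precomposed "o with low line" exists. It says nothing about how a given font will position the line; that is the spacing caveat above.

```python
import unicodedata

# "o" followed by U+0332 COMBINING LOW LINE: a base letter plus a
# combining mark, not a single precomposed character.
o_underbar = "\u006F\u0332"

for ch in o_underbar:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+006F LATIN SMALL LETTER O
# U+0332 COMBINING LOW LINE

# NFC finds nothing to compose, so the sequence stays two code points.
print(len(unicodedata.normalize("NFC", o_underbar)))  # 2
```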

(As a side note, this "o-underbar" form reminds me of the "c-underbar" which 
is sometimes used in handwritten English to mean "with."  Does anyone know 
the origin of this symbol?  Is it possibly derived from the Latin word cum, 
meaning "with"?  Does it have any claim to being a character in its own 
right?)

> 2.) Fractions with any number, see "bråk.bmp."

U+2044 FRACTION SLASH is exactly what you are looking for.  Whether your 
browser or other rendering engine will display it the way you want is another 
matter.

On page 154 of TUS 3.0 (The Unicode Standard, Version 3.0), there is a two-paragraph description of the use of 
U+2044.  Note particularly the sentence:

"The standard form of a fraction built using the fraction slash is defined as 
follows: Any sequence of one or more decimal digits, followed by the fraction 
slash, followed by any sequence of one or more decimal digits."

This would give you the results you expect for "123/456" but not for "x/y" or 
even "14658.48/13789".  However, it is not clear to me that this "standard 
form" is normative, and it is conceivable that a fraction-slash-aware 
renderer could generalize this to "one or more non-space characters, fraction 
slash, one or more non-space characters."
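To make the difference between the two readings concrete, here is a rough Python sketch of both as regular expressions. The function names are hypothetical, and the "generalized" pattern is only the speculative extension described above, not anything the standard defines. Note that Python's `\d` matches any Unicode decimal digit (general category Nd), which seems consistent with "decimal digits" in the quoted definition.

```python
import re

FRACTION_SLASH = "\u2044"  # U+2044 FRACTION SLASH

# "Standard form" as quoted from TUS 3.0: one or more decimal digits,
# the fraction slash, then one or more decimal digits.
STANDARD_FORM = re.compile(r"\d+" + FRACTION_SLASH + r"\d+")

# Hypothetical generalization (not from the standard): one or more
# non-space characters on either side of the fraction slash.
GENERALIZED_FORM = re.compile(r"\S+" + FRACTION_SLASH + r"\S+")


def is_standard_fraction(text: str) -> bool:
    return STANDARD_FORM.fullmatch(text) is not None


def is_generalized_fraction(text: str) -> bool:
    return GENERALIZED_FORM.fullmatch(text) is not None


for sample in ["123\u2044456", "x\u2044y", "14658.48\u204413789"]:
    # ascii() keeps the output printable on consoles that lack U+2044.
    print(ascii(sample), is_standard_fraction(sample), is_generalized_fraction(sample))
```

Only "123⁄456" satisfies the standard form; all three samples satisfy the generalized one. An ASCII "/" matches neither, since both patterns require U+2044.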

> 3.) Roman numerals. I know I-XII are encoded, but what if you want to use
> higher numbers? Typing "XX," you might suggest.

The set of Roman numerals, at least through 4999, can be completely specified 
with the characters U+2160 "I", U+2164 "V", U+2169 "X", U+216C "L", U+216D 
"C", U+216E "D", and U+216F "M" (or, of course, with the equivalent Latin 
letters).  According to TUS 3.0, page 299, "Upper- and lowercase variants of 
the Roman numerals through 12, plus L, C, D, and M, have been encoded for 
compatibility with East Asian standards."  Requests for additional 
precomposed Roman numerals will almost certainly be denied.
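As an illustration of spelling arbitrary numbers with just those seven characters, here is a small Python sketch. It assumes the common subtractive convention (IV, IX, XL, XC, CD, CM) and writes 4000 as four ONE THOUSAND characters; both are choices made for the example, and historical usage varied. It deliberately avoids the precomposed forms such as U+216B ROMAN NUMERAL TWELVE. The last line relies on the compatibility decompositions of these characters, under which NFKC folds them to the equivalent Latin capital letters.

```python
import unicodedata

# The seven Roman numeral characters named in the text (Number Forms block).
ONE = "\u2160"           # ROMAN NUMERAL ONE
FIVE = "\u2164"          # ROMAN NUMERAL FIVE
TEN = "\u2169"           # ROMAN NUMERAL TEN
FIFTY = "\u216C"         # ROMAN NUMERAL FIFTY
HUNDRED = "\u216D"       # ROMAN NUMERAL ONE HUNDRED
FIVE_HUNDRED = "\u216E"  # ROMAN NUMERAL FIVE HUNDRED
THOUSAND = "\u216F"      # ROMAN NUMERAL ONE THOUSAND

# Greedy value table with the usual subtractive pairs. This is one common
# convention, not the only historical one.
VALUE_TABLE = [
    (1000, THOUSAND),
    (900, HUNDRED + THOUSAND),
    (500, FIVE_HUNDRED),
    (400, HUNDRED + FIVE_HUNDRED),
    (100, HUNDRED),
    (90, TEN + HUNDRED),
    (50, FIFTY),
    (40, TEN + FIFTY),
    (10, TEN),
    (9, ONE + TEN),
    (5, FIVE),
    (4, ONE + FIVE),
    (1, ONE),
]


def to_roman_numeral_chars(n: int) -> str:
    """Spell n (1..4999) using only the seven single-letter numeral characters.

    Hypothetical helper; 4000..4999 come out with four ONE THOUSAND characters.
    """
    if not 1 <= n <= 4999:
        raise ValueError("this sketch only handles 1..4999")
    parts = []
    for value, symbol in VALUE_TABLE:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


# NFKC folds each Roman numeral character to the matching Latin capital letter.
print(unicodedata.normalize("NFKC", to_roman_numeral_chars(1994)))  # MCMXCIV
```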

> This is not always
> sufficient; in Sweden we often put a line under and one above the numbers,
> see "Roma.bmp."

Sounds like a glyph-variant issue.  Font designers might want to ensure that 
the glyphs for the Roman numeral forms do have the over- and underlines.  
Then, if a user doesn't want them, she can always use the plain Latin letters 
instead.

> And what about ten thousands? Neither "X¯" nor "X¯" are
> displayed properly!

They should be; that's what the combining characters are there for.  (Hint: 
you want U+0305 COMBINING OVERLINE, not U+0304 COMBINING MACRON.)
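For the encoding side only, here is a minimal Python sketch of ten thousand as ROMAN NUMERAL TEN followed by U+0305. The helper name is hypothetical. It places U+0305 after every character so that a multi-letter numeral gets a continuous bar; as I understand the code chart annotations, U+0305 is meant to connect on left and right while U+0304 is not, which is one reason the overline is the better choice here. Whether the bar actually renders that way is up to the font and engine.

```python
import unicodedata

COMBINING_OVERLINE = "\u0305"  # the mark recommended above
COMBINING_MACRON = "\u0304"    # the mark to avoid for this purpose


def with_vinculum(numeral: str) -> str:
    """Follow each character with U+0305 so the bar can span the whole numeral.

    Hypothetical helper for illustration; how (or whether) the bars join
    depends entirely on the font and rendering engine.
    """
    return "".join(ch + COMBINING_OVERLINE for ch in numeral)


ten_thousand = with_vinculum("\u2169")  # ROMAN NUMERAL TEN + COMBINING OVERLINE
for ch in ten_thousand:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+2169 ROMAN NUMERAL TEN
# U+0305 COMBINING OVERLINE
```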

To be fair to Stefan, most rendering engines have a long way to go to catch 
up with the Unicode ideal of being able to attach arbitrary combining marks 
(like U+0305) to arbitrary base characters (like U+2169).  Many renderers 
simply replace the sequence with a precomposed glyph.  This approach looks 
really sharp IF such a glyph is available, but breaks down otherwise.
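One rough way to see why the precomposed-glyph shortcut breaks down is to ask which sequences even have a precomposed character to fall back on. The Python sketch below uses NFC composition as that test; the helper name is hypothetical, and a font could in principle carry a ligature glyph for a sequence with no precomposed character, so this is only an approximation of what a renderer has available.

```python
import unicodedata


def has_precomposed_form(sequence: str) -> bool:
    """True if NFC collapses a base + combining-mark sequence to one code point."""
    return len(unicodedata.normalize("NFC", sequence)) == 1


samples = {
    "a + COMBINING DIAERESIS": "a\u0308",                      # composes to U+00E4
    "o + COMBINING LOW LINE": "o\u0332",                       # no precomposed form
    "ROMAN NUMERAL TEN + COMBINING OVERLINE": "\u2169\u0305",  # no precomposed form
}
for label, seq in samples.items():
    print(f"{label}: {has_precomposed_form(seq)}")
# a + COMBINING DIAERESIS: True
# o + COMBINING LOW LINE: False
# ROMAN NUMERAL TEN + COMBINING OVERLINE: False
```

Sequences in the second and third rows are exactly the ones a renderer must handle by genuinely attaching the mark to the base glyph.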

-Doug Ewell
 Fullerton, California
