At 10:38 AM 12/19/01 +0000, Kevin Bracey wrote:
>In message <[EMAIL PROTECTED]>
>           Asmus Freytag <[EMAIL PROTECTED]> wrote:
>
> > On top of that, it looks like 950 maps a bogus symbol or punctuation
> > character to U+2574. (2574 is one of a set of 4, and only 1 is mapped for
> > starters.) Fonts covering CP950 give a way different image for that
> > character than you'd expect from either the charts or the names...
>
>I recently had to sort out our systems' Big5<->Unicode mapping table, and
>there seems to be great confusion in the punctuation space. The table (that
>used to be) on the Unicode site was unsatisfactory, and Microsoft's CP950
>mapping also doesn't seem to make sense (e.g. with that U+2574 mapping, and
>CIRCLED PLUS and DOT OPERATOR instead of EARTH and SUN).

The new JIS X 0213 (as mapped in a mapping table somebody sent out for
comments a while back) contains CIRCLED PLUS, CIRCLED MINUS and CIRCLED
TIMES, deciding at least the '+' form in favor of the mathematical
operators. (It does not contain a circled dot.) Incidentally, GBK (as
mapped for MS 936) has both the DOT OPERATOR and SUN; it is the only one
of the mapping tables I looked at that maps to U+2609 SUN. Because of
this, you get better interoperation among CJK code sets by using CIRCLED
PLUS instead of EARTH, but at the cost of obscuring the semantics (i.e.
compromising interoperation with Unicode-based systems).
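
For anyone who wants to see the candidates side by side, here is a rough
sketch that prints the formal names of the code points in question. It
assumes Python 3 and its standard unicodedata module; the selection of code
points and the helper name describe() are mine, purely for illustration.

```python
import unicodedata

# Code points discussed in this thread: the circled mathematical
# operators versus the astronomical symbols SUN and EARTH.
CODE_POINTS = [0x2295, 0x2296, 0x2297, 0x2299, 0x2609, 0x2641]


def describe(cp):
    """Format a code point as 'U+XXXX NAME' using the Unicode character name."""
    return "U+%04X %s" % (cp, unicodedata.name(chr(cp)))


for cp in CODE_POINTS:
    print(describe(cp))
```

The names alone do not settle which target a given table should use, but
they make it obvious when two tables disagree on semantics rather than on
glyph shape.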

>One point of note is that there is a whole cluster of characters in the
>compatibility area of Unicode from U+FE30 to U+FE6B that are designed to
>handle mapping CNS11643, whose punctuation area is almost identical to
>Big5's. Mapping tables I've seen don't make proper use of them.

I tend to agree.

>I was able to come up with a good Big5 mapping by taking the best ideas from
>various Big5 and CNS11643 tables on the net, then making sure each of those
>Unicode compatibility characters was used once, AND IN THE ORDER THEY APPEAR
>IN UNICODE.

That's not always a good idea. Unicode order often does not follow any
standard, even when characters are intended to map. The reasons range
from transcription mistakes to attempts at presenting a more orderly
arrangement, with the effects of piecemeal additions layered on top to
confuse matters further. However, if both Unicode and the other standard
group related symbols, I would try to find mapping targets nearby rather
than far away for characters of the same group.
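
The two checks under discussion (each compatibility character used once,
and targets appearing in Unicode order) are easy to run as diagnostics
without treating the second as a hard rule. A minimal sketch in Python 3
follows. The function names are hypothetical, and apart from A15A ->
U+FE58, which is the mapping quoted in this thread, every entry in SAMPLE
is a made-up placeholder, not data from any real Big5 table.

```python
# Range of the compatibility characters mentioned in the thread.
COMPAT_FIRST, COMPAT_LAST = 0xFE30, 0xFE6B

# Illustrative table only. A15A -> U+FE58 comes from the thread; the
# other three entries are invented placeholders to exercise the checks.
SAMPLE = {
    0xA158: 0xFE31,  # placeholder
    0xA159: 0xFE59,  # placeholder
    0xA15A: 0xFE58,  # from the thread
    0xA15B: 0xFE59,  # placeholder (deliberate duplicate target)
}


def in_compat(cp):
    """True if cp lies in the U+FE30..U+FE6B compatibility range."""
    return COMPAT_FIRST <= cp <= COMPAT_LAST


def duplicate_targets(table):
    """Return {target: [sources]} for compatibility targets used more than once."""
    seen = {}
    for src, dst in table.items():
        if in_compat(dst):
            seen.setdefault(dst, []).append(src)
    return {dst: sorted(srcs) for dst, srcs in seen.items() if len(srcs) > 1}


def order_breaks(table):
    """Return adjacent source pairs whose compatibility targets go backwards."""
    pairs = sorted((s, d) for s, d in table.items() if in_compat(d))
    return [(a, b) for (a, da), (b, db) in zip(pairs, pairs[1:]) if db < da]


print(duplicate_targets(SAMPLE))
print(order_breaks(SAMPLE))
```

A break reported by order_breaks() is only a hint to look more closely at
that pair; as noted above, Unicode order is not always a reliable guide.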

>This ends up mapping A15A to U+FE58 SMALL EM DASH, which still
>might not be right, but it looks like a confused character anyway - it
>appears different in Big5 and CNS11643 tables, so it could just be a glyph
>variant issue.

And it appears as an underline in some fonts, e.g. Win2K's version of
PMingLiU. I wish I had more hardcopy documentation for some of the
standards. Lunde's book is usually a good resource, but he glosses over the
punctuation in favor of the ideographs, especially for Big-5/Eten/CNS.

A./


