I presume that the user has to know that the character cannot be displayed.
However, using a special glyph has a number of problems:

1) You do not know whether the character is missing and the glyph has been
substituted, or whether the text really encodes that glyph.

2) If you see multiple missing characters, you do not know whether they are
the same character or different characters.

3) If you call for help, there is no way to report the values that are
mis-encoded.

Fujitsu had an elegant solution for its JEF encoding, an extended Shift JIS
encoding.  Because the characters were double-byte, they used the four hex
digits of the missing character's code value to build a special glyph.
These fit very well into the four quadrants of the square kanji character
cell.

                                XX
                                XX
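A minimal Python sketch of that 2x2 layout, assuming the high-order byte
goes on the top row and the low-order byte on the bottom (the layout above
shows only the shape, not the order).  The function name jef_style_box is
hypothetical, and nothing here reflects Fujitsu's actual implementation:

```python
def jef_style_box(code: int) -> list[str]:
    """Lay out a 16-bit (double-byte) code value as a 2x2 grid of hex digits.

    Hypothetical helper: assumes the high-order byte is the top row and the
    low-order byte is the bottom row, mirroring the XX/XX shape.
    """
    if not 0 <= code <= 0xFFFF:
        raise ValueError("expected a 16-bit code value")
    digits = f"{code:04X}"             # e.g. 0x82A0 -> "82A0"
    return [digits[0:2], digits[2:4]]  # one byte per row


for row in jef_style_box(0x82A0):      # an arbitrary double-byte value
    print(row)
```

Each quadrant holds one hex digit, so the four digits are exactly the two
bytes a technician would need to identify the character.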

This allowed a user to screen-print defective text and have a support
technician decipher the text and better determine the cause.

This would work well for the Unicode BMP, but I suspect that most of the
nasty problems will be with characters in other planes.  Therefore we can
use 6 hex digits as follows:

                                XX
                                XX
                                XX
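A similar sketch for the six-digit form, assuming the digits show the
Unicode code point rather than UTF-8 or UTF-16 code units; hex_box_6 is a
hypothetical name.  With this layout the top row is the plane number, 00
through 10 hex, and the two lower rows are the position within the plane:

```python
def hex_box_6(cp: int) -> list[str]:
    """Lay out a Unicode code point as three rows of two hex digits.

    Hypothetical helper: assumes the glyph shows the code point itself.
    """
    # Every code point (0..0x10FFFF) fits in 6 hex digits.
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    digits = f"{cp:06X}"  # U+1F600 -> "01F600"
    # Top row: plane; middle and bottom rows: 16-bit offset within the plane.
    return [digits[i:i + 2] for i in range(0, 6, 2)]


for row in hex_box_6(0x1F600):
    print(row)
```

For example, hex_box_6(0x1F600) returns ["01", "F6", "00"].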

Even though the digits 0123456789ABCDEF can be rendered much smaller than a
lot of glyphs, especially if you know that the characters come from this
subset, you may have to set a minimum point size.  An alternative might be
to use an alternate set of hex digit glyphs designed to remain legible at a
smaller point size.

Another alternative is to encode the plane number differently, in a more
compact form, since there are only 17 planes.  You might then also reserve
an 18th state to indicate an invalid plane, in which case only the leading
bytes are shown.
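A sketch of those 18 states, assuming the input is a 32-bit value such as a
UTF-32 code unit; plane_state and INVALID_PLANE are hypothetical names:

```python
INVALID_PLANE = 17  # hypothetical 18th state: plane number out of range


def plane_state(value: int) -> int:
    """Map a 32-bit value to one of 18 states: planes 0-16, or INVALID_PLANE."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit value")
    plane = value >> 16  # the two high-order bytes
    return plane if plane <= 16 else INVALID_PLANE


print(plane_state(0x0041), plane_state(0x1F600),
      plane_state(0x10FFFF), plane_state(0x110000))
```

Because only 18 markers are ever needed, the plane can be drawn as a small
symbol instead of two more hex digits.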

For example:

A single bar with no dots below it is the BMP (plane 0).
The hex digits go below this:

                        __

                        XX
                        XX

A single bar with 1 dot in the rightmost of the four positions below the bar
is plane 1.

                        __
                         .
                        XX
                        XX

A single bar with 1 dot in the right middle of four positions is plane 2.

A single bar with 2 dots in the right middle and right most of four
positions is plane 3.

...  and so on (simple binary encoding)

A single bar with 4 dots in all four positions is plane 15.

A double bar is plane 16.

Two squares indicate an invalid plane; the hex digits are then the two
high-order bytes of the 32-bit value, in other words the plane number.
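Putting the scheme together, here is a sketch that describes the composite
glyph for a 32-bit value.  The marker names, the use of "." for a dot and
" " for an empty position, and the function name composite_glyph are all
illustrative assumptions; the bit order follows the examples above
(rightmost dot = 1, right-middle dot = 2):

```python
def composite_glyph(value: int) -> tuple[str, str, list[str]]:
    """Describe the compact missing-character glyph for a 32-bit value.

    Returns (marker, dots, hex_rows).  The marker names and dot characters
    are illustrative stand-ins for drawn shapes, not a real font API.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit value")
    plane = value >> 16      # the two high-order bytes
    offset = value & 0xFFFF  # the position within the plane
    if plane <= 15:
        # Planes 0-15: single bar, plane number as four binary dot positions.
        # Leftmost position is bit 3, rightmost is bit 0.
        dots = "".join("." if (plane >> bit) & 1 else " " for bit in (3, 2, 1, 0))
        marker, digits = "single bar", f"{offset:04X}"
    elif plane == 16:
        # Plane 16 gets its own marker instead of a fifth dot.
        marker, dots, digits = "double bar", "", f"{offset:04X}"
    else:
        # Invalid plane: the hex digits show the plane number itself.
        marker, dots, digits = "two squares", "", f"{plane:04X}"
    return marker, dots, [digits[:2], digits[2:]]


for cp in (0x0041, 0x1F600, 0x10FFFD, 0x00110000):
    print(hex(cp), composite_glyph(cp))
```

For example, composite_glyph(0x1F600) returns ("single bar", "   .",
["F6", "00"]): one dot in the rightmost position for plane 1, with the four
hex digits of the offset below it.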

Using this encoding will allow the four hex digits below the marker to be
slightly larger and more readable.  It will also make this a more
distinctive composite glyph.  It will require a bit more sophistication, but
it is still within the realm of what a user can communicate over the phone
to a knowledgeable support technician.  This character shape might also be
more compatible with most text, which often has to accommodate the
possibility of some Latin text.  It would be taller than it is wide, but not
as tall as the 6 hex digit grouping.


Carl W. Brown