On 7/17/2011 12:19 PM, Philippe Verdy wrote:
2011/7/17 Asmus Freytag<asm...@ix.netcom.com>:
On 7/17/2011 2:35 AM, Michael Everson wrote:
... invisible and stateful control characters are more expensive than
ordinary graphic symbols.
In this case, the expense is so much higher as to rule out such an idea from
the start.

A./

PS: this doesn't mean that adding graphic symbols is the foregone thing to
do, only that, if evidence points to the need to address this issue in
character encoding, then, using graphic symbols is the better way to go
about it.
Another alternative: instead of encoding separate symbols for each
control, we could as well encode symbols for each character visible in
those symbols.

E.g. to represent the glyph for the RLO control, we could encode three
characters, one for each of R, L, and O, as DOTTED SYMBOL FOR LATIN
CAPITAL LETTER R, DOTTED SYMBOL FOR LATIN CAPITAL LETTER L, DOTTED
SYMBOL FOR LATIN CAPITAL LETTER O. These three symbols would have a
representative glyph as the base letter from which they are derived,
within a dotted rectangle.

Then each of them would contextually adopt one of four glyph forms:
the full rectangle, or the rectangle with the left or right side
removed, or both sides removed. The selection would depend on the
neighboring characters.
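
For readers trying to picture the contextual selection described above, here is a minimal sketch in Python. It is not part of the proposal: the Private Use Area code points, the helper names, and the rule that the side facing a neighboring dotted symbol is the one removed are all assumptions made for illustration, and the text is assumed to be in left-to-right visual order.

```python
# Hypothetical stand-ins: none of these "dotted symbols" is encoded in Unicode.
# Private Use Area code points are used purely as placeholders.
DOTTED_BASE = 0xE000  # assumed placeholder for DOTTED SYMBOL FOR LATIN CAPITAL LETTER A


def to_dotted(letters):
    """Map A-Z to the hypothetical dotted-symbol placeholders."""
    return "".join(chr(DOTTED_BASE + ord(c) - ord("A")) for c in letters)


def is_dotted(ch):
    """True if ch is one of the 26 hypothetical dotted symbols."""
    return DOTTED_BASE <= ord(ch) < DOTTED_BASE + 26


def select_forms(text):
    """Pick one of the four glyph forms for each dotted symbol in text.

    Assumed rule: the side of the rectangle that faces an adjacent
    dotted symbol is dropped, so a run of symbols shares one box.
    Characters that are not dotted symbols get None.
    """
    forms = []
    for i, ch in enumerate(text):
        if not is_dotted(ch):
            forms.append(None)
            continue
        joins_left = i > 0 and is_dotted(text[i - 1])
        joins_right = i + 1 < len(text) and is_dotted(text[i + 1])
        if joins_left and joins_right:
            forms.append("both sides removed")
        elif joins_left:
            forms.append("left side removed")
        elif joins_right:
            forms.append("right side removed")
        else:
            forms.append("full rectangle")
    return forms


print(select_forms(to_dotted("RLO")))
```

Under these assumptions the three symbols for R, L, and O come out as "right side removed", "both sides removed", and "left side removed", so together they draw a single box around "RLO". Note that the result depends on what happens to sit next to each symbol, which is the kind of context sensitivity the scheme would require of every renderer.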

I'm baffled: what problem is this elaborate scheme trying to solve?

The problem was never in *how* to encode such symbols, but in *whether* they should be considered *characters* (and therefore need to be supported on the character level of the architecture). That point, whether there's a reasonable use case for them as characters, has not been settled, so the case for thinking about encoding solutions has not been established.

When people write about a line feed character, they use "LF" or "linefeed" or 000A (or U+000A or 0x0A etc.). They commonly don't use the "LF" symbol character, nor any other unencoded symbol.

I claim the same is true for ZWJ, RLO, PDF and all the other good characters.

The fact that Unicode uses dashed box placeholders in the code charts has not made them the generally accepted, universally understood *symbols* for these characters.

This is different from the "pictures for control codes" because at the time, these were widely supported in devices, and users of these devices (terminals) were familiar with the convention (staggered small letters) and many would recognize common control characters.
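
The contrast can be checked against the character database. The sketch below uses Python's standard `unicodedata` module; the helper `control_picture` is a name made up for this illustration. It relies on the layout of the Control Pictures block, where the picture for a C0 control sits at U+2400 plus the control's code point and the picture for DELETE is U+2421.

```python
import unicodedata


def control_picture(ch):
    """Return the Control Pictures character for a C0 control or DELETE.

    Returns None for anything else; format controls such as ZWJ or RLO
    have no counterpart in that block.
    """
    cp = ord(ch)
    if cp <= 0x1F:
        return chr(0x2400 + cp)
    if cp == 0x7F:
        return "\u2421"
    return None


# LF, ZWJ, RLO, PDF
for ch in "\n\u200d\u202e\u202c":
    picture = control_picture(ch)
    # C0 controls have no character name, so supply a fallback label.
    name = unicodedata.name(ch, "<control>")
    if picture is None:
        shown = "no encoded picture"
    else:
        shown = "U+%04X %s" % (ord(picture), unicodedata.name(picture))
    print("U+%04X %s: %s" % (ord(ch), name, shown))
```

Only the line feed has an encoded picture here (U+240A SYMBOL FOR LINE FEED). For ZWJ, RLO and PDF the practical way to refer to them in text remains the abbreviation, the character name, or the code point.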

So, let's keep a lid on devising ever more arcane and fragile encoding and pseudo-encoding options until there's consensus that this issue must be addressed on the character level.

A./
