Philippe Verdy vamped:

> > > For example I would not be shocked if a text using it was rendered with 
> > > a monospaced font, where the base line of the character cell shows
> > > multiple tiny dots, that create a contiguous dotted line when multiple
> > > U+2024 characters (one per display cell) are used to indent the text in 
> > > columns.
> > > 
> > > Of course with proportional fonts this character would display at least 
> > > (and preferably) a single dot. Any use of this character that assumes
> > > it is a symbol consisting in a single dot aligned on the baseline seems 
> > > to abuse the semantic of this character, which is not a punctuation,
> > > but really a styling character used instead of an "invisible" thin
> > > space.
> > 

And Jim Allan asked:

> > Where is this behavior indicated by Unicode specifications?
> > 
> > Such behavior appears to me to be a non-standard extension on Unicode, 
> > interpreting what Unicode classes as a General Punctuation character as 
> > instead a Formatting Character.

> > But I don't see how conforming applications could assume this semantic 
> > for the character when reading in plain text Unicode or writing plain 
> > text Unicode.
> > 
> > What then is U+2025 TWO DOT LEADER?

And then Philippe Verdy continued to improvise:

> For me this one is a punctuation, commonly used to designate 
> a separator between bounds of intervals like [0..1] (it is 
> generally surrounded by a thin space on both sides with strict 
> typography). It should not be used to create arbitrary lengths 
> of leaders.

What he is talking about here is generally represented by
the sequence <U+002E, U+002E>, in other words, just two
full stops, as in the example given "[0..1]". Typographical
rules then deal with any issues of spacing around or between
the dots.

> 
> The three dot leader is also a punctuation (normally not 
> prefixed by any space, but followed by a large space like 
> for the full dot). It should not be used to create arbitrary 
> lengths of leaders.

This is a reference to U+2026 HORIZONTAL ELLIPSIS, and Philippe
is correct that that should not be used to create arbitrary
leaders.

> The one-dot leader should have no other purpose than to be 
> used in sequences of arbitrary length. 

This statement is only very accidentally true. Explanation
below.

> The whole sequence of single-dots leaders like this forms a 
> single token with the semantic of a word separator, where the 
> number of displayed dots is not really relevant for the reader 
> of text whatever is rendering style or fonts.

But this is absolutely false, as Jim Allan suggested.
U+2024 ONE DOT LEADER is a graphic character, whose glyph
consists of a small baseline dot, and whose General Category
is Po (Other Punctuation). It cannot be used conformantly as
if it were a formatting control standing in for a rich text
representation of a leader object (e.g. in a generated
Table of Contents in a Word or FrameMaker document).
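
One way to check the General Category claim is to query the
Unicode Character Database, for instance through Python's
standard unicodedata module. This is only a minimal sketch; the
helper name describe() is purely illustrative, and the values
reported depend on the Unicode version bundled with the
interpreter.

```python
import unicodedata

def describe(ch):
    """Return (character name, General Category) as recorded in the UCD."""
    return unicodedata.name(ch), unicodedata.category(ch)

# The three dot-leader / ellipsis characters discussed in this thread.
for ch in "\u2024\u2025\u2026":
    name, category = describe(ch)
    print(f"U+{ord(ch):04X} {name}: {category}")
```

A true format control, such as U+200D ZERO WIDTH JOINER, would
report Cf here rather than Po.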


> I just think that this 1-dot leader is used as a way to transcode
> within a single string what was initially a tabulation decorated 
> by some markup system, 

False.

Now, here is the true story of U+2024.

It is a compatibility character, introduced for compatibility
with XCCS (Xerox Character Code Standard) 1980, where it
was mapped to the coded character 356B/242B (0xEEA2),
described as "Leader, one-dot on an en body".

Its use in XCCS would have been to create leaders manually,
by lining up a sequence of "one-dot on an en body" to create
a sufficiently long leader. Its rationale in Unicode would be
either to map to data created in XCCS or to lay out text
manually using a comparable mechanism, in cases where one wishes
to distinguish the "dots" thus used from U+002E FULL STOP.

U+2025 TWO DOT LEADER is also an XCCS compatibility character.
It corresponds to XCCS 356B/243B (0xEEA3) "Leader, two-dot
on an en body" *and* to 041B/105B (0x2145) "Leader, two-dot
on an em body". The difference in width was considered
a formatting distinction and was unified away in creating
the U+2025 encoded character, as preserving that distinction
in plain text was considered unnecessary by the Xerox
representative to the committee at the time.

U+2026 HORIZONTAL ELLIPSIS maps to the ellipsis seen in a
number of legacy character encodings, including the Macintosh
character sets, but also maps to an XCCS character: 041B/104B
(0x2144) "Leader, three-dot on an em body".
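
The XCCS codes above are given in two notations. Reading the
trailing "B" as an octal marker (an assumption, though one
consistent with the hex values quoted), the correspondence can
be sketched as follows. The function name xccs_to_hex() is
hypothetical.

```python
def xccs_to_hex(code):
    """Convert an XCCS code such as '356B/242B' (two octal bytes) to hex."""
    # Assumption: each half is an octal byte value with a trailing 'B'.
    high, low = (int(part.rstrip("B"), 8) for part in code.split("/"))
    return f"0x{high:02X}{low:02X}"

for code in ("356B/242B", "356B/243B", "041B/105B", "041B/104B"):
    print(code, "=", xccs_to_hex(code))
```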

All *three* of these characters should be considered
compatibility characters. Indeed, they formally *are*
"compatibility decomposable characters" (Chapter 3, Definition
D21), since they each have compatibility decompositions
to one or more U+002E FULL STOP characters.
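
The compatibility decompositions can be inspected directly,
again using Python's unicodedata module as one convenient view
of the UCD. This is a sketch, not a conformance test; the
LEADERS table is just an illustrative grouping.

```python
import unicodedata

# Character -> number of FULL STOPs expected in its decomposition.
LEADERS = {"\u2024": 1, "\u2025": 2, "\u2026": 3}

for ch, dots in LEADERS.items():
    # decomposition() returns the raw UCD mapping, tagged <compat>.
    mapping = unicodedata.decomposition(ch)
    # NFKC applies compatibility mappings; NFC leaves the character alone.
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X}: {mapping!r} -> {folded!r}")
    assert folded == "." * dots
```

Note that canonical normalization (NFC or NFD) does not touch
these characters; only the compatibility forms fold them to
FULL STOP.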

That last fact should be taken as a hint that for most
purposes, manual leaders should just be sequences of FULL STOP
characters (as you will see, for instance, in the plain text
representations of Internet Drafts or RFCs).
In any rich text format, leaders are styled formatting objects
(somewhat similar to tabulations, as Philippe suggested), but
that does *not* make U+2024 a format character (LEADER
PLACEHOLDER, or whatever). It is exactly what it claims to
be: a compatibility character, punctuation, with a single
baseline dot as its glyph.
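
As a rough illustration of a manual leader in plain text, here
is a sketch that pads a table-of-contents entry with ordinary
U+002E FULL STOP characters. The function leader_line() and its
width parameter are assumptions made up for this example; they
do not come from any specification.

```python
def leader_line(title, page, width=40):
    """Join title and page number with a leader of FULL STOP characters."""
    # Two columns are reserved for the spaces on either side of the leader.
    fill = width - len(title) - len(page) - 2
    # Keep at least three dots so that a leader is always visible.
    return f"{title} {'.' * max(fill, 3)} {page}"

print(leader_line("Introduction", "1"))
print(leader_line("Compatibility Characters", "12"))
```

In a monospaced rendering the page numbers line up at the right
edge, with no need for U+2024 at all.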

--Ken

