Philippe Verdy continued:

> What surprises me the most in the Unicode spec is that it
> both says that its purpose is to create arbitrary length
> of leaders

As in plain text, as can be seen in Table of Contents listings in
many RFCs, for example. (Which, however, use ASCII 0x2E for the
same purpose.)

> (you say that the spacing statement in the Xerox name was
> not considered important by Xerox, so how many leaders would
> be needed to fit a en space with the Unicode designation?).

If you mean how many leader *dots* would it take to fit an en
space, that would depend on the font, as does so much else in
Unicode. My guess would be that the correct answer is approximately
the same as the number of angels that can stand on the dot.

Very few characters in Unicode have any specified widths. That
is by design.

> Why then do you insist that it represents one dot ?

Because that was the intent of the Unicode Technical Committee
when it encoded the character, and is the clear intent of the
standard as currently specified.

> You also seem to insist on the "compatibility" decomposition
> which is normally removing an important semantic (else it
> would be canonical).

I'm simply restating the specification in the standard. Read
it yourself.

> All this seems like creating contradictions.
>
> Also it would be the only punctuation sign whose number of
> occurrences is not relevant

False. See the discussion of Tibetan justifying tseks in:

http://www.unicode.org/versions/Unicode4.0.0/ch09.pdf

> (in dotted lines used as leaders),

Or, for that matter, in plain text visual line separations
also created by stringing together ASCII punctuation:

**********************************************************

like that. Such legacy use of punctuation characters is no
different from legacy use of a sequence of periods to create
leader lines in plain text.

> as the final presentation of the text will need to compensate
> for font metrics differences in order to produce the correct
> effect (also because the size of the dots were removed from
> the Unicode designation.)

So? That is irrelevant to the question at hand.
People who do stuff like this, as in plain text RFCs, display
text in monospace fonts and don't expect dynamic reflowing of
text. People who do leader lines correctly for fine typography
do them with internal data abstractions, and those data
abstractions aren't based on interpreting U+2024 as a format
control character.

> I do not agree with your argument that says that it is like a
> full dot to be used in limited applications

You can disagree with my argument all you like. But if you
insist on coming on the unicode list and spouting nonsense
about particular characters in the standard, suggesting that
people implement them in ways that would be nonconformant
with the standard, then expect people to respond to the
nonsense.

> (if Unicode wanted to remove the spacing, it was to generalize
> its use as an abstract character, not to reinforce its mapping
> to an approximate full dot.)

That claim is arrant nonsense.

> I never heard about the Xerox CCS before, but there's a large
> legacy usage of the ellipsis as a single unbreakable character

Correct. And U+2026 is encoded precisely for that legacy practice.

> (and the two dots for the notation of interval bounds are also
> unbreakable).

True, but this kind of behavior falls automatically out of
most implementations' treatment of U+002E characters in
sequence. Check UAX #14, which discusses the line break
behavior of both the leader dot characters and U+002E FULL STOP.
U+002E is line break class IS, and since class IS prohibits a
break before it, a sequence of two periods in a row, as in
[0..1], has no break opportunity in the middle of the sequence.

> The single dot leader looks like a way to fill the gap,
> only because two-dot three-dots ellipsis did not allow,
> in most fonts and applications, to create a regular leader,
> using smaller dots than the one used for the regular full stop
> punctuation.

You are mixing up glyphs and characters here.
In "most fonts and applications" leader dots are *glyphs* used
to express a measured leader line, not characters at all.

> The fact that it was unified with XCCS (with some
> compromises accepted by Xerox) clearly demonstrates that
> the Xerox design was not the main focus:

In the case of encoding of the ONE DOT LEADER, you don't know
what you are talking about.

> - Who knows XCCS and use it ? Very few people.

Today, yes. But it was a key source of character repertoire
for Unicode 1.0, and choices made in the XCCS often guided
thinking about character/glyph distinctions for Unicode.

> - Who uses leaders ? Every publisher and author of long documents
> that do not want to see irregularly spaced leaders, or a dotted
> grid instead of a true dotted horizontal line.

This is irrelevant to the claims you have been making about
U+2024.

>
> Leaders are visual helpers for the eye of readers, they have
> absolutely no punctuation or symbolic semantic (unlike the
> two-dots symbol or the ellipsis). The fact that it was categorized
> as a punctuation is probably an initial error

It was not. The error is your assumption that the TWO DOT LEADER
was encoded to represent the convention of using <U+002E, U+002E>
to indicate a range.

> that can't be corrected and that comes from the classification
> of its approximative fallback "compatibility decomposition".
>
> So you seem to mix the very distinct concept of compatibility
> characters and compatibility decompositions:

I see...

[*looks around the office to see who else it was who wrote
that text in Chapter 2*]

...but I do appreciate the coals delivered to Newcastle. ;-)

> - compatibility characters are for the initial mapping from an
> important legacy encoding with full roundtrip, and the
> exact semantic is preserved in this mapping to Unicode. The usage
> of these Unicode codepoints is discouraged out of this legacy usage.
>
> - characters that have compatibility decompositions are intended
> as guides for acceptable fallback characters that will not create
> too confusive interpretation by readers, but the exact semantic
> is not preserved with their compatibility decomposition. Their
> usage is not discouraged but instead favored by Unicode which
> adds important semantics in the "composed" character.

I won't deconstruct this sentence by sentence. But use of
compatibility characters is not discouraged. *Some* of them
are deprecated; *some* of them are inappropriate for particular
uses; *some* of them are, in fact, required for other contexts.
It depends on what you are doing in your implementations.

Compatibility decompositions were *not* defined as guides for
acceptable fallback. They can be used as part of a fallback
conversion implementation, but fallback is a much more general
problem, and applies to characters that have no decompositions
and to characters with canonical decompositions, as well.

Finally, some compatibility decomposable characters are not
only discouraged, they may even be "strongly discouraged",
for one reason or another. See, for example, U+0F77 and U+0F79.

I'd advise more care in making unjustified generalizations
and then proclaiming them to the unicode list as if they
were expert opinions.

--Ken
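
The property claims discussed in this message can be checked against
the Unicode Character Database. What follows is a minimal sketch, not
part of the original message, assuming Python's standard unicodedata
module (its data version depends on the Python build). The helper names
describe and is_compat_decomposable are illustrative only.

```python
import unicodedata

# Code points mentioned in the message above.
CODE_POINTS = (0x2024, 0x2025, 0x2026, 0x0F77, 0x0F79)


def describe(cp):
    """Return (name, General_Category, decomposition field) for a code point."""
    ch = chr(cp)
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),
    )


def is_compat_decomposable(cp):
    """True if the decomposition field carries a <tag>.

    In UnicodeData.txt a tagged mapping is a compatibility decomposition;
    an untagged, non-empty mapping is a canonical one.
    """
    return unicodedata.decomposition(chr(cp)).startswith("<")


if __name__ == "__main__":
    for cp in CODE_POINTS:
        name, category, decomposition = describe(cp)
        print(f"U+{cp:04X} {name}: gc={category}, decomposition={decomposition!r}")

    leader = "\u2024"
    # Canonical normalization leaves the leader dot alone ...
    print(unicodedata.normalize("NFC", leader) == leader)
    # ... while compatibility normalization folds it to U+002E FULL STOP.
    print(unicodedata.normalize("NFKC", leader) == "\u002E")
```

Nothing in this data says whether a character is encouraged or
discouraged; it records only the mapping and its tag, which is
consistent with the point that a compatibility decomposition is
not in itself a usage recommendation.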