On 3/21/2013 4:22 PM, Philippe Verdy wrote:
2013/3/21 Richard Wordingham <richard.wording...@ntlworld.com>:
Further, the code chart glyphs for the ANO TELEIA and the MIDDLE DOT
differ, see attachment.  If they are canonically equivalent, and one
is a mandatory decomposition of the other, why do they have differing
glyphs?
Because the codepoints are usually associated with different fonts?
For a more striking example, compare the code chart glyphs for U+2F831,
U+2F832 and U+2F833, which are all canonically equivalent to U+537F.
This is another good example where a semantic variation selector would be useful.
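
(The canonical equivalences under discussion are easy to verify; here is a minimal sketch in Python 3, using only the standard unicodedata module:)

    import unicodedata

    # U+0387 GREEK ANO TELEIA has a singleton canonical decomposition to
    # U+00B7 MIDDLE DOT; singleton decompositions are excluded from
    # recomposition, so both NFD and NFC replace it with the MIDDLE DOT.
    assert unicodedata.normalize('NFD', '\u0387') == '\u00b7'
    assert unicodedata.normalize('NFC', '\u0387') == '\u00b7'

    # U+2F831..U+2F833 are CJK compatibility ideographs whose canonical
    # decomposition is the single ideograph U+537F.
    for cp in (0x2F831, 0x2F832, 0x2F833):
        assert unicodedata.normalize('NFC', chr(cp)) == '\u537f'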


Philippe, let's not go there.

"Semantic" selectors are pure pseudo-coding, because if the semantic differentiation is needed it is needed in plain text - and then it should be expressible in plain character codes.

If you need to annotate text with the results of semantic analysis as performed by a human reader, then you need either XML or some other format that can express that particular intent.
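
For instance, a minimal sketch of such out-of-band annotation, with a made-up element and attribute loosely modeled on TEI's <pc>:

    import xml.etree.ElementTree as ET

    # The plain text keeps the plain character code (U+00B7); the markup,
    # not the character stream, records the reader's semantic judgement.
    pc = ET.Element('pc', {'function': 'ano-teleia'})
    pc.text = '\u00b7'
    print(ET.tostring(pc, encoding='unicode'))
    # -> <pc function="ano-teleia">·</pc>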

Internal to your application you can design a lightweight markup format using "noncharacters", if you wish, but for portability of this kind of information you would be best off going with something widely supported.
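
One possible shape for that, purely as a sketch (the delimiters and helper names here are invented): bracket each internal annotation with a pair of noncharacters from the U+FDD0..U+FDEF range, and strip them before the text leaves the application.

    import re

    # U+FDD0 and U+FDD1 are noncharacters: guaranteed never to be assigned,
    # so they cannot collide with real text - but they must never be
    # emitted in interchange.
    OPEN, CLOSE = '\ufdd0', '\ufdd1'

    def annotate(ch, label):
        """Attach an internal, application-only annotation to a character."""
        return ch + OPEN + label + CLOSE

    def strip_annotations(s):
        """Remove all internal annotations before export."""
        return re.sub(OPEN + '[^' + CLOSE + ']*' + CLOSE, '', s)

    s = 'word' + annotate('\u00b7', 'ano-teleia')
    assert strip_annotations(s) == 'word\u00b7'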

The number of conventions that can apply to certain punctuation characters is truly staggering, and it seems unlikely that Unicode is the right place to
a) discover all of them or
b) standardize an expression for them.

The problem is that even if you could "encode" selectors for certain common cases, the scheme would not be extensible to capture other information that pre-processing (or user input) might have provided, and which might be useful to carry around in certain implementations. I'm thinking here that the full spectrum of natural-language analysis for word types might be as interesting as certain individual characters.

A./
