On 3/23/2013 4:55 AM, Michael Everson wrote:
> On 23 Mar 2013, at 01:01, Asmus Freytag <asm...@ix.netcom.com> wrote:
>
>> Let's get back to the interesting question:
>>
>> Is it possible to correctly process text that uses 00B7 for ANO TELEIA, or is
>> this fundamentally impossible? If so, under what scenario?
>
> It is possible to "process" text without Unicode at all, using sets and sets of
> 8-bit font-hack fonts. We all did it for years.



That is a bit of a non sequitur: whatever may have been done with 8-bit standards doesn't necessarily advance the discussion about how to do things in Unicode. It is also arguably not fully applicable, because the types of processing that could be done with those legacy sets exclude some important real-world scenarios that only Unicode enables.

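In Unicode, 00B7 and 0387 are canonically equivalent (U+0387 has a canonical decomposition to U+00B7), so making distinctions based on code point is not guaranteed to be portable: any normalization pass may replace 0387 with 00B7. That's why I singled out 00B7 (not 0xB7, but U+00B7).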
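To make the portability point concrete, here is a minimal sketch using Python's standard unicodedata module. The constant names and the sample string are mine, chosen only for illustration. It shows that U+0387 does not survive any of the four normalization forms.

```python
import unicodedata

ANO_TELEIA = "\u0387"  # GREEK ANO TELEIA
MIDDLE_DOT = "\u00b7"  # MIDDLE DOT

# The canonical (singleton) decomposition recorded for U+0387.
decomposition = unicodedata.decomposition(ANO_TELEIA)

# Apply every normalization form to the single character.
normalized = {
    form: unicodedata.normalize(form, ANO_TELEIA)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

# Illustrative sample: a Greek word followed by ANO TELEIA.
sample = "\u03ba\u03b1\u03bb\u03cc" + ANO_TELEIA
sample_nfc = unicodedata.normalize("NFC", sample)

print(decomposition)
for form, result in normalized.items():
    print(form, "U+%04X" % ord(result))
print(ANO_TELEIA in sample_nfc, MIDDLE_DOT in sample_nfc)
```

After normalization the two characters can no longer be told apart by code point, so any process that keys on 0387 must either run before normalization or rely on context instead.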

The question was, given that, "Is it possible to correctly process text that uses one and the same character code for ano teleia, middle dot, raised decimal point (and fourteen other uses), or is this fundamentally impossible? If so, under what scenario?"

I think handling raised decimal dot is not any more difficult than recognizing when period is a decimal point (there are some edge cases there that are challenging, but implementations have settled on using period, so that's a done deal).
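As a rough illustration of what context-based handling could look like, here is a sketch in Python. The function name, the labels, and the rules themselves are my own assumptions, not an established or recommended algorithm: a dot between two digits is taken as a decimal point, a dot after a Greek letter and before whitespace or end of text is taken as ano teleia, and everything else is left undecided.

```python
import re
import unicodedata

MIDDLE_DOT = "\u00b7"

def classify_middle_dots(text):
    """Return (index, label) for each U+00B7, judged by its neighbours only.

    Hypothetical heuristic for illustration; real text has edge cases
    that these three rules do not cover.
    """
    results = []
    for match in re.finditer(MIDDLE_DOT, text):
        i = match.start()
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if prev.isdigit() and nxt.isdigit():
            label = "decimal"
        elif (prev and "GREEK" in unicodedata.name(prev, "")
              and (nxt == "" or nxt.isspace())):
            label = "ano_teleia"
        else:
            label = "other"
        results.append((i, label))
    return results

print(classify_middle_dots("3\u00b714"))
print(classify_middle_dots("\u03ba\u03b1\u03bb\u03cc\u00b7 \u03bd\u03b1\u03b9"))
print(classify_middle_dots("a\u00b7b"))
```

The digit rule is the same kind of test already applied to period, which is the sense in which the raised decimal dot is no harder. Whether rules like the Greek one hold up across all the other uses is exactly the open question.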

I don't know about the fourteen other uses, but there's been a lot of griping about ano teleia. (That's why I singled out that one, even though I know most of the griping took place in a parallel discussion on another list.)

I think it would be useful to actually write down an overview of the recommended implementation approach for handling ALL the different uses for middle dot and to make sure that what is recommended is not only theoretically possible, but acceptable and accepted(!) as best practice by implementers, users, and font designers alike.

If such a document were to successfully cover all (widely-)known cases, it would make fine material for adding to the character description. If there are holes (things that can't be done - see the question) then it would form a basis on which UTC could make some decisions on how to improve the standard.

A./
