On 2/23/2012 2:44 PM, António Martins-Tuválkin wrote:
On 2012/2/23 Matt Ma <matt.ma.um...@gmail.com> wrote:

It is defined as
"33D7;SQUARE PH;So;0;L;<square> 0050 0048;;;;N;SQUARED PH;;;;"
in UnicodeData.txt, but it is shown as "pH" in code chart. Should it be
"0070 0048" or "PH"?
It should certainly be "pH", i.e., "<square>0070 0048</square>",
because that's the peculiar casing in widespread (universal, really)
use for this basic Chemistry concept (AFAIK it means "power of
hydrogen"). See <http://en.wikipedia.org/wiki/pH#History>.

While it is no surprise that the Unicode name spells "PH" in all caps,
I’m surprised that the decomposition mapping is wrongly set to
0050 0048 instead of 0070 0048.

The early fonts and code tables showed this in all caps.

Unfortunately, mappings are frozen - including mistakes.

This is one of the many reasons not to use NF"K"D or NF"K"C for transforming data. These transformations should be limited to dealing with identifiers, where practically all of the problematic characters are already disallowed.
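
To make the effect concrete, here is a minimal sketch using Python's standard unicodedata module (my choice of tool for illustration, not something from the thread). It reads the decomposition field for U+33D7 and applies the compatibility forms, which should yield the all-caps sequence from UnicodeData.txt rather than the "pH" shown in the code chart.

```python
import unicodedata

SQUARE_PH = "\u33d7"  # U+33D7 SQUARE PH, drawn as "pH" in the code chart

# Decomposition field as recorded in UnicodeData.txt
decomposition = unicodedata.decomposition(SQUARE_PH)

# The compatibility forms apply the frozen mapping
nfkd = unicodedata.normalize("NFKD", SQUARE_PH)
nfkc = unicodedata.normalize("NFKC", SQUARE_PH)

# The canonical form leaves the character alone, since the
# mapping is a compatibility (<square>) decomposition only
nfc = unicodedata.normalize("NFC", SQUARE_PH)

print(repr(decomposition), repr(nfkd), repr(nfkc), nfc == SQUARE_PH)
```

Any text that passes through NFKD or NFKC therefore loses the lowercase "p" for good, which is harmless in an identifier context but wrong for chemistry text.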

If your intent is to sort or search a document using "fuzzy" equivalences, then you are not required to limit yourself to the NF"K" C/D transformations in any way, because you would not be claiming to be "normalizing" the text in the sense of a Unicode Normalization Form.
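
As a rough sketch of such a "fuzzy" folding, again in Python: the override table and the function name fold_for_search below are hypothetical, invented here for illustration. The idea is to substitute your own preferred expansion for known-problematic characters first, then fall back on the compatibility mappings for everything else. The result is a private search key, not a Unicode Normalization Form.

```python
import unicodedata

# Hypothetical override table: characters whose frozen compatibility
# mapping is unhelpful for searching, with the expansion we prefer.
SEARCH_OVERRIDES = {
    "\u33d7": "pH",  # SQUARE PH: the frozen mapping gives "PH"
}

def fold_for_search(text: str) -> str:
    """Build a case-sensitive search key (not a normalization form)."""
    # Apply our own expansions before any standard mapping can run
    patched = "".join(SEARCH_OVERRIDES.get(ch, ch) for ch in text)
    # Fall back on compatibility decomposition for the rest
    return unicodedata.normalize("NFKD", patched)

key = fold_for_search("\u33d7 = 7")
```

The key is deliberately case-sensitive; folding case as well would erase the very "pH" versus "PH" distinction the override is there to preserve.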

A./

