Re: PUA (BMP) planned characters HTML tables

James Kass via Unicode Wed, 14 Aug 2019 02:10:29 -0700


On 2019-08-12 8:30 AM, Andrew West wrote:

This issue was discussed at WG2 in 2013
(https://www.unicode.org/L2/L2013/13128-latvian-marshal-adhoc.pdf),
when there was a recommendation to encode precomposed letters L and N
with cedilla *with no decomposition*, but that solution does not seem
to have been taken up by the UTC.

Group One dots their lowercase "i" letters with little flowers and Group Two dots theirs with little hearts. Group Two considers flowers unacceptable and Group One rejects hearts. Because of legacy character sets there's a precomposed character encoded called "LATIN LOWER CASE I WITH HEART", but it was misnamed and is normally drawn with a flower instead. Group Two tries to encode "LATIN LOWER CASE I" plus "COMBINING HEART" to get the thing to display properly. But because there's a decomposition involved, the font engine substitutes the glyph mapped to "LATIN LOWER CASE I WITH HEART" in the display for the string "LATIN LOWER CASE I" plus "COMBINING HEART". This thwarts Group Two because they still get the flower.
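The hearts and flowers are made up, but the mechanism can be seen with the real Latvian letter from the quoted discussion. The following is a minimal Python sketch using only the standard `unicodedata` module; the helper names `canonically_equivalent` and `describe` are mine, for illustration. It shows only that the precomposed character and the base-plus-combiner sequence are canonically equivalent. Whether a particular font engine then swaps in the precomposed glyph is outside what it can demonstrate.

```python
import unicodedata

# Real-world analogue of the hypothetical "i with heart":
# the Latvian letter from the quoted WG2 discussion.
PRECOMPOSED = "\u013C"   # LATIN SMALL LETTER L WITH CEDILLA
SEQUENCE = "l\u0327"     # LATIN SMALL LETTER L + COMBINING CEDILLA


def canonically_equivalent(a, b):
    """Return True if both strings are identical after canonical decomposition."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def describe(s):
    """List the code point and character name of every character in s."""
    return ["U+%04X %s" % (ord(c), unicodedata.name(c)) for c in s]


if __name__ == "__main__":
    print(describe(PRECOMPOSED))
    print(describe(SEQUENCE))
    # The canonical decomposition mapping stored for the precomposed character.
    print(unicodedata.decomposition(PRECOMPOSED))
    # Composing the sequence (NFC) yields the precomposed character again.
    print(unicodedata.normalize("NFC", SEQUENCE) == PRECOMPOSED)
    print(canonically_equivalent(PRECOMPOSED, SEQUENCE))
```

Because the two spellings are equivalent, any process that normalizes to NFC turns Group Two's carefully typed sequence back into the precomposed character, which is the substitution complained about above.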

The solution is to deprecate "LATIN LOWER CASE I WITH HEART". It's only in there because of legacy. Its presence guarantees round-tripping with legacy data but it isn't needed for modern data or display. Urge Groups One and Two to encode their data with the desired combiner and educate font engine developers about the deprecation. As the rendering engines get updated, the system substitution of the wrongly named precomposed glyph will go away.

This presumes that the premise of user communities feeling strongly about the unacceptable aspect of the variants is valid. Since it has been reported and nothing seems to be happening, perhaps the casual users aren't terribly concerned. It's also possible that the various user communities have already set up their systems to handle things acceptably by installing appropriate fonts.
