RE: Question Regarding UCD Draft Files and GraphemeBreakTest Discrepancy

Peter Constable via Unicode Fri, 21 Mar 2025 22:44:18 -0700

At UTC meeting #182, UTC decided to remove the Extended_Pictographic property 
from a number of code points that are assigned to non-emoji characters. See


https://www.unicode.org/cgi-bin/GetL2Ref.pl?182-C20

This goes back to feedback submitted on the Unicode 15.0 beta — see feedback 
from Charlotte Buff (time stamp Fri Jun 24 10:24:49 CDT 2022) in

https://www.unicode.org/review/pri453/

which led to an action item to investigate removing the property from non-emoji 
characters, the outcome of which was a recommendation to UTC 182 to do just 
that — see section 5.1 in

https://www.unicode.org/L2/L2025/25006-utc182-properties-recs.pdf

The deeper background behind this is that Extended_Pictographic was created as 
a code point property that could be assigned to code points that were likely to 
be assigned in the future to emoji characters so that the line breaking 
implementation in a product sold (say) today would be forward compatible with 
emoji assigned in the future. The concern was that some devices might not get 
frequent software updates but users might start using new emoji created some 
time after the device was released. As Charlotte Buff observed in her feedback,

"the Extended_Pictographic property has no use outside of emoji ZWJ sequences"


So, that's the background. The draft emoji-data.txt file for Unicode 17 has 
been updated to remove several code points from Extended_Pictographic in accordance 
with UTC decision 182-C20. It's possible that some test data that should have 
had a corresponding update was overlooked. If you think that's the case, please 
submit feedback for PRI #514

https://www.unicode.org/review/pri514/

which is the public review issue for the Unicode 17.0 alpha review. (See the 
contact form link on that page.)



Peter


-----Original Message-----
From: Unicode <[email protected]> On Behalf Of Naoto Sato via 
Unicode
Sent: Friday, March 21, 2025 2:25 PM
To: [email protected]
Subject: Question Regarding UCD Draft Files and GraphemeBreakTest Discrepancy

Hello,

I have a question regarding the draft version of the UCD files 
(https://www.unicode.org/Public/draft/ucd/). I’m not sure if this is the 
appropriate place for such inquiries, so please forgive me if it is not.

While testing the draft "emoji-data.txt"
(https://www.unicode.org/Public/draft/ucd/emoji/emoji-data.txt), I encountered 
a failing test case in GraphemeBreakTest:

÷ 2701 × 200D × 2701 ÷  #  ÷ [0.2] UPPER BLADE SCISSORS (ExtPict) × [9.0] ZERO 
WIDTH JOINER (ZWJ) × [11.0] UPPER BLADE SCISSORS (ExtPict) ÷ [0.3]

This test case assumes that U+2701 is classified as Extended_Pictographic. 
However, the latest emoji-data.txt does not include it, whereas version 16.0 
did. Additionally, the web version of the test
(https://www.unicode.org/Public/draft/ucd/auxiliary/GraphemeBreakTest.html#s23)
also indicates that U+2701 is an Extended_Pictographic, leading to an 
inconsistency.
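
A mismatch of this kind can be checked mechanically. The following is only a
minimal Python sketch: the function names (parse_property, stale_extpict) are
hypothetical, and the two data snippets are hand-written samples in the
emoji-data.txt format, not verbatim excerpts of either file. It collects the
Extended_Pictographic code points from the property data and reports any code
point that a test line's comment labels as (ExtPict) but that the data no
longer lists.

```python
import re


def parse_property(text, prop):
    """Return the set of code points assigned `prop` in emoji-data.txt-style text."""
    cps = set()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue
        first, _, last = fields[0].partition("..")  # "2700..2701" or "2702"
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        cps.update(range(lo, hi + 1))
    return cps


def stale_extpict(test_line, ext_pict):
    """Code points labeled (ExtPict) in the test comment but absent from ext_pict."""
    head, _, comment = test_line.partition("#")
    # Code points in the test data, in order, skipping the break marks.
    cps = [int(tok, 16) for tok in head.split()
           if re.fullmatch(r"[0-9A-Fa-f]{4,6}", tok)]
    # Property labels in the comment, in the same order as the code points.
    labels = re.findall(r"\(([^)]*)\)", comment)
    return sorted({cp for cp, label in zip(cps, labels)
                   if label == "ExtPict" and cp not in ext_pict})


# Hand-written samples in the emoji-data.txt format (assumed, for illustration).
SAMPLE_OLD = """
2700..2701    ; Extended_Pictographic# sample range
2702          ; Extended_Pictographic# sample single code point
"""
SAMPLE_DRAFT = """
2702          ; Extended_Pictographic# sample single code point
"""

# The failing test case quoted above, joined onto one line.
TEST_LINE = ("÷ 2701 × 200D × 2701 ÷  #  ÷ [0.2] UPPER BLADE SCISSORS (ExtPict) "
             "× [9.0] ZERO WIDTH JOINER (ZWJ) × [11.0] UPPER BLADE SCISSORS "
             "(ExtPict) ÷ [0.3]")

old_set = parse_property(SAMPLE_OLD, "Extended_Pictographic")
draft_set = parse_property(SAMPLE_DRAFT, "Extended_Pictographic")

print([f"U+{cp:04X}" for cp in stale_extpict(TEST_LINE, draft_set)])  # ['U+2701']
```

Against the older sample the same test line reports nothing, which matches the
behavior described above: the test data still assumes the earlier property
assignment.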

This discrepancy is causing our test to fail. Could you clarify whether this is 
an issue or an expected change?

Thanks,
Naoto
