http://www.unicode.org/L2/L2015/15107.htm
points indirectly to:

http://www.unicode.org/L2/L2015/15145r-add-regional-ind.pdf

which says:

> The proposal has two parts
>
> 1. Un-deprecate TAG characters E0020-E007E.

Hee hee. Hee hee.

> 2. Define a character as the “base” for a following sequence of
>    TAG characters that specifies a region or subregion to be
>    represented using a sequence of TAG characters. There are two
>    possibilities for the base character:
>
>    a. Preferred: Use the Unicode 7.0 character WAVING WHITE FLAG:
>       1F3F3;WAVING WHITE FLAG;So;0;ON;;;;;N;;;;;
>       The advantage is no new characters need be encoded.

"Add language to UTR #51 describing the mechanism given in 2A" means
that U+1F3F3 will be the tag introducer, basically the "flag emoji"
equivalent of U+E0001 LANGUAGE TAG.

I think I understand why the TAG/CANCEL TAG start-end mechanism, which
was invented for Plane 14 language tags, wasn't reused for flag emoji.
Adding U+E0002 FLAG TAG would have implied that the sequence ends with
CANCEL TAG. Flags don't have scope, and there is no need to indicate
the end of the sequence explicitly for scoping purposes, as there is
with tagged text.

I assume that existing text with U+1F3F3 followed by no tag characters
should continue to display the waving white flag glyph, whereas text
conforming to this new mechanism should suppress that glyph and show
the Scottish, Welsh, Delawarean, or Nordlending flag instead.

> Using the following notation -
>   B designates the chosen base character (U+1F3F3 or new U+1F1E5)
>   TL designates a TAG LATIN CAPITAL LETTER (A..Z)
>   TD designates a TAG DIGIT (ZERO..NINE)
>   TH designates TAG HYPHEN-MINUS
>
> - a well-formed sequence for designating flags for ISO 3166-1,
>   3166-2 or UN M49 codes would be
>
>   B ((TL{2} (TH (TL|TD){3})?) | (TD{3}))

Will the subdivision sequence always be exactly 3 characters long?
CLDR ticket #8423 seems to say that ISO 3166-2 code elements that are
only 1 or 2 characters long will be prefixed with "xx" or "x" to make
them all exactly 3.
Obviously some research will need to be done to ensure this doesn't
result in conflicts with existing code elements, and of course 3166-2
makes no promises that future assignments will deliberately avoid such
a conflict.

Will both mechanisms, old and new, be available for encoding national
flags? For example, for a French flag:

<1F1EB 1F1F7>
or
<1F3F3 E0046 E0052>

> In CLDR 28, LDML will define a unicode_subdivision_subtag which also
> provides validity criteria for the codes used for regional
> subdivisions (see CLDR ticket #8423). When representing regional
> subdivisions using ISO 3166-2 codes, only those codes that are valid
> for the LDML unicode_subdivision_subtag should be used.

I note that a preliminary file is already available at
http://unicode.org/repos/cldr/trunk/common/supplemental/subdivisions.xml .

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
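
As a rough illustration of the notation and the French flag example
above, here is a minimal Python sketch. It assumes option 2a (U+1F3F3
as the base character) and that each TAG character is the
corresponding ASCII character plus 0xE0000. The function names, and
the reading of the "x" padding as upper-cased left padding to three
characters, are my own assumptions and are not taken from the proposal
or from CLDR.

```python
import re

BASE = 0x1F3F3             # WAVING WHITE FLAG, the base under option 2a
TAG_OFFSET = 0xE0000       # TAG character = ASCII code point + 0xE0000
RI_A = 0x1F1E6             # REGIONAL INDICATOR SYMBOL LETTER A

TL = "[\U000E0041-\U000E005A]"   # TAG LATIN CAPITAL LETTER A..Z
TD = "[\U000E0030-\U000E0039]"   # TAG DIGIT ZERO..NINE
TH = "\U000E002D"                # TAG HYPHEN-MINUS

# B ((TL{2} (TH (TL|TD){3})?) | (TD{3}))
WELL_FORMED = re.compile(
    f"\U0001F3F3(?:{TL}{{2}}(?:{TH}(?:{TL}|{TD}){{3}})?|{TD}{{3}})"
)


def tag_sequence(code: str) -> str:
    """Encode an ASCII code such as 'FR' or 'GB-SCT' as base + TAG characters."""
    return chr(BASE) + "".join(chr(TAG_OFFSET + ord(c)) for c in code.upper())


def regional_indicators(alpha2: str) -> str:
    """Encode an ISO 3166-1 alpha-2 code as a regional indicator pair."""
    return "".join(chr(RI_A + ord(c) - ord("A")) for c in alpha2.upper())


def pad_subdivision(sub: str) -> str:
    """Left-pad a 1- or 2-character subdivision code with 'x' (assumed rule)."""
    return sub.rjust(3, "x")


def is_well_formed(text: str) -> bool:
    """True if the whole string matches the grammar quoted above."""
    return WELL_FORMED.fullmatch(text) is not None


def code_points(text: str) -> str:
    """Render a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(c):04X}" for c in text)


print(code_points(regional_indicators("FR")))   # 1F1EB 1F1F7
print(code_points(tag_sequence("FR")))          # 1F3F3 E0046 E0052
print(is_well_formed(tag_sequence("GB-SCT")))   # True
print(is_well_formed(tag_sequence("US-DE")))    # False: only 2 characters
print(is_well_formed(tag_sequence("US-" + pad_subdivision("DE"))))  # True
```

Note that a bare U+1F3F3 with no tag characters does not match the
grammar, which is consistent with it continuing to display as a plain
waving white flag.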