Isn't there also a new data file, in beta status, that lists the usage of characters shared by multiple scripts? If so, it should also cover the Arabic-Syriac number sign (actually an abbreviated ligature of the Arabic word for "year", with a subtending stroke that can extend below several digits appearing on its right, but encoded in text before sequences of Arabic-Indic, Arabo-Persian, or Syriac digits), which is currently being proposed as a format control in an Arabic block. The same goes for the "serpentine" format control proposed in the same Arabic block: it is another "sort of" diacritic that spans several digits of a Syriac or Arabic number encoded after it, and so would be encoded the same way, as a format control.
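To make the data file question concrete, here is a rough sketch of how such a file could be read, assuming it follows the usual UCD layout of "code point or range ; values # comment" (the provisional file is called ScriptExtensions in the quoted message below). The sample entries are made up for illustration and are not copied from the real file; the function names are mine.

```python
# Hypothetical sample in UCD-style layout; NOT real ScriptExtensions data.
SAMPLE = """\
# code point or range ; script codes # comment
0730 ; Arab Syrc # hypothetical single entry
0733..0736 ; Arab Syrc # hypothetical range entry
"""


def parse_line(line):
    """Parse 'XXXX[..YYYY] ; Val1 Val2 # comment' into (start, end, values)."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None  # blank or comment-only line
    rng, values = (field.strip() for field in data.split(";", 1))
    first, _, last = rng.partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start
    return start, end, values.split()


def scripts_for(code_point, text=SAMPLE):
    """Return the script codes listed for code_point, or [] if none."""
    for line in text.splitlines():
        entry = parse_line(line)
        if entry and entry[0] <= code_point <= entry[1]:
            return entry[2]
    return []


print(scripts_for(0x0734))
```

A character that is absent from the file simply gets an empty list here; a real implementation would fall back to its Script property instead.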
Anyway, I don't know how fonts can support those format controls, except for a limited number of digits, by splitting the glyph into several parts (a fixed leading glyph, a variable number of middle glyphs, and a trailing glyph) and using some complex contextual substitution rules to reorganize the list of glyphs. With OpenType alone, given that OpenType renderers offer very limited control over glyph order and no easy way to represent this transform, I fear that it will not work; but it could be possible with a SIL Graphite feature extension to TrueType or OpenType/TT (and a Graphite-enabled renderer) and/or an Apple AAT feature extension to OpenType/CFF (on Mac OS X only). This requires some research, because it could ultimately influence the actual encoding and the properties needed to get working support for these newly proposed characters. They are not really format controls but true characters with an uncommon layout that does not fit very well with the current Unicode character encoding model, which has the same caveat with the representation of double-width diacritics and which offers no layout support at all for Egyptian hieroglyphs. Shouldn't the Unicode character encoding model be updated to take those complex layouts into account better? (Let's remember how Hangul was encoded to avoid the issue, assigning a LOT of characters for the same logical characters and duplicating the consonants, or how the Han script keeps growing almost without limit, even though it has an evident structure for the complex layout of its grapheme clusters, based on simpler base sinograms...)
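To illustrate only the glyph-splitting idea above (leading part, one middle part per digit, trailing part), here is a toy sketch in Python. It does not model how OpenType, Graphite or AAT actually work; the glyph names are invented, and the sign is represented by a private-use code point as a stand-in, since the real characters are only proposed.

```python
SIGN = "\uE000"  # hypothetical stand-in (private-use) for the spanning sign


def shape(text):
    """Map text to a list of invented glyph names, stretching SIGN over digits."""
    glyphs = []
    i = 0
    while i < len(text):
        if text[i] == SIGN:
            # Collect the run of digits that follows the sign in the text.
            j = i + 1
            while j < len(text) and text[j].isdigit():
                j += 1
            glyphs.append("sign.lead")           # fixed leading part
            for digit in text[i + 1:j]:
                glyphs.append("sign.mid")        # one middle part per digit
                glyphs.append(f"digit.{ord(digit):04X}")
            glyphs.append("sign.trail")          # fixed trailing part
            i = j
        else:
            glyphs.append(f"char.{ord(text[i]):04X}")
            i += 1
    return glyphs


print(shape(SIGN + "\u0661\u0662\u0663"))
```

The point of the sketch is that the number of middle parts depends on how many digits follow, which is exactly what a fixed set of contextual substitution rules can only cover up to some limit.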
Also, some research on integrating into the ISO standard for OpenType some of the capabilities currently offered only in Graphite and/or AAT would be welcome: creating 3 font flavors and maintaining compatibility between documents rendered on several platforms should come to an end, even if this means deprecating some older features of OpenType, AAT and Graphite. Because of the existing differences, documents have to be tweaked in their encoding, and this is a problem for the stability of the Unicode encoding itself, across platforms and even across versions of the text renderers and layout engines on the same platform... If we continue like this, we'll get too many characters in Unicode that are maintained for backward compatibility but do not work effectively and are later replaced with new confusables... I do think that newly proposed characters SHOULD take into account the cross-platform compatibility of the proposed technical encoding, within a CLEAR and UNIVERSAL character model. The character encoding process at ISO and UTC is not enough; we now need more serious cooperation with implementers of font technologies, especially for scripts needing complex layouts (encoding with "format controls" will not solve any problem).

2011/8/13 Peter Constable <peter...@microsoft.com>:
> From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On
> Behalf Of mmarx
>
>> While I am at it:
>> should Unicode acknowledge that U+0730, U+0733, U+0736, U+073A
>> and U+073D are used with the ARAB script or does this lie outside its
>> competence and jurisdiction?
>
> Unicode has character properties declaring script associations for
> characters. Currently, all of those characters have a Script property of
> "Syriac". If characters are known to be used in multiple scripts, they
> generally should have a Script property of "Common"; there is also a
> provisional data file called ScriptExtensions that can be used to provide
> more information about Script = Common characters (e.g., the character is
> used for Syriac and Arabic scripts, but not generally _any_ script).
>
> Properties for the characters you mention could be changed. That will not
> happen unless a document is submitted to UTC that outlines the specific
> changes proposed and provides rationale for making those changes. The
> rationale should provide evidence in terms of existing usage scenarios and
> should also consider any potential issues that may arise for existing
> implementations (if applicable). The information should also be adequate for
> making changes in the Arabic and Syriac block descriptions in the text of the
> Standard so that implementers have a chance of learning about the scenarios
> and requirements; the proposal document could include draft text for
> insertion in the block descriptions.
>
> Submitting a doc to UTC is a basic requirement. The issue also needs to make
> it onto the agenda of a UTC meeting, and it helps to have a champion to make
> sure that happens and that can be available to discuss the issue with the
> UTC. These things are much easier if you are a member of the consortium (cost
> is as little as $35/yr for students).
>
> Peter