"Mark Davis ☕" <[email protected]> wrote: > This topic is not particularly relevant to Unicode. Could people please > carry on this discussion on a different list? There are internet groups > devoted to hexadecimal and other topics (eg the adoption of Shavian by the > United Nations) where communities of like-minded people can be found. > On Tue, Jun 8, 2010 at 09:22, Luke-Jr <[email protected]> wrote: > > > On Tuesday 08 June 2010 10:53:15 am John Dlugosz wrote: > > > Yes, when discussing values in hex, this is an English problem. What do > > I > > > call the useful higher powers and groups? What is the equivalent of > > > "thousands" or "millions" to refer to powers of 65536 or 4294967296? > > > > Seriously, these questions are all answered in the book... > > > > (written using "classical" hexadecimal digits) > > 0=Noll 1=An 2=De 3=Te 4=Go 5=Su > > 6=By > > 7=Ra 8=Me 9=Ni A=Ko b=Hu C=Vy > > d=La > > E=Po F=Fy 10=Ton 100=San 1000=Mill 1,0000=Bong > > 1,0000,0000=Tam 1,0000,0000,0000=Song 1,0000,0000,0000,0000=Tran > > 2,8d5b,7E0F=Detam, memill - lasan - suton - hubong, ramill-posanfy
This last message is certainly more on topic here: it discusses existing characters and their usage in some experimental (mostly written) language (I don't know exactly which one; it may just be the language used by the creator of this system), and the related localization issues (which could also interest CLDR localizers), even if they are used by a very small minority. It also helps in understanding issues that could arise with other, older numeric systems.

The 8 characters discussed here (for digits 8..15) are certainly good candidates for an encoding proposal, even if they will certainly not fit in the BMP (they could easily fit in the SMP, and their general category will certainly not be gc=Nd but gc=No). I have no opinion on whether the first 8 digits (for numeric values 0..7) should also be encoded again. There is also no problem in using characters with different gc values in the same numeric system: this is already the case in the common [0-9a-fA-F]* notation, which mixes gc=Nd, gc=Ll and gc=Lu, and in some Indic or African scripts that may have additional digits with gc=No for fractions of unity.

No extra character is needed for the three positional powers of 16 and the four positional powers of 16^4 used in the number names. This is no different from the powers of 1000 in the decimal positional system used in European languages, or the powers of 10000 used in some Asian languages, and it causes no problem here for naming the characters.

Note that the glyph used for one of those digits resembles the digit 9 (with which it is fully confusable), but it has a distinct numeric value; for this reason it should be encoded separately, because of its distinct abstract identity. However, I'm not sure which script they should be assigned to. For me it should be the same script property as the existing ASCII digits 0..9, with which they are used together in sequences of arbitrary order.
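
The point about mixed general categories in the usual [0-9a-fA-F] notation can be checked with Python's standard `unicodedata` module. This is only a small sketch: the helper name `categories` is hypothetical, and U+00BD (vulgar fraction one half) is used merely as a familiar example of a gc=No character, not as anything related to the Tonal digits.

```python
import unicodedata


def categories(text):
    """Return the set of General_Category values used by the characters in text."""
    return {unicodedata.category(ch) for ch in text}


hex_digits = "0123456789abcdefABCDEF"
print(sorted(categories(hex_digits)))      # ['Ll', 'Lu', 'Nd']

# A gc=No character carries a numeric value without being a decimal digit.
half = "\u00bd"
print(unicodedata.category(half))          # No
print(unicodedata.numeric(half))           # 0.5

# The letters used as hex digits have no numeric property at all.
print(unicodedata.numeric("A", None))      # None
```

So one written numeral can already mix characters of several categories, and the letters A..F act as digits only by convention, not through their character properties.
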
Maybe they could be encoded as generic hex digits, with the code positions U+1xxx0..U+1xxx7 left free and assigned only later, if similar hexadecimal or octal systems appear and can be unified with them because they share the same abstract properties (those too should be given gc=No and not gc=Nd, due to their specific usage). But this would be a "political decision": the glyph, even if its exact form is not mandatory, is still part of the character identity when there is limited possibility for variation and no possibility of swapping them, so other candidate systems could easily choose to reuse the glyphs of the existing ASCII digits 8..9 with their current values, which would conflict with the assignment of these 8 characters to the "Ton-al" system.

This discussion correctly describes what could be candidate names for the 8 candidate characters to encode as U+1xxx8..U+1xxxF, if this "Ton-al" system had to be supported (some ISO members may be interested in doing that for use in their public libraries, in their digitizing efforts). In fact this set is rather complete and well documented, so there is no real difficulty.

The fact that this system was not successful in its time does not mean it is of no interest (after all, other extinct scripts were encoded, but because there is an active community using them, at least for linguistic, archaeological or religious research). Here, though, encoding is not really needed to help understand an old civilization, since the system was created and explained in a modern language and culture that does not need it. But there may be interest in reproducing the books, publications and products displaying those 8 characters. Recent inventions have been encoded as well (notably currency symbols, and soon there will be emojis), so the age of these characters is not much of a factor in deciding whether to encode them.
Certainly there will not be broad support in fonts, whether new ones containing them or existing ones updated only to include them, given the very small usage. But small fonts could be created easily and rapidly, containing only the 8 common digits and the 8 supplementary digits, plus possibly some punctuation. Before that, it is easy to encode them in the PUA and register them in the CSUR prior to future adoption and encoding in the SMP. A font displaying them as PUA characters should remain named/tagged as "Beta" or "PUA" (this could be "Tonal Digits PUA") and be replaced later by a similar font (with glyphs renumbered to the SMP assignments, and a name matching the assigned block name), or by a font containing other standard subsets of numeric/maths characters, digits and symbols.

Philippe.

