I was really hoping this was a joke... it didn't hit me it was April 1... https://en.wikipedia.org/wiki/Plane_(Unicode)
The totals row of the table there gives 280,016 allocated code points and 136,755 assigned characters, so almost 50% used now. Though that table omits 655,360 code points (ten whole planes) as 'unassigned', so it's really only about 16% (1/6) used, and that's with only 4-byte UTF-8 or two-unit (surrogate pair) UTF-16... and all of that is only 20 (plus or minus a fraction of 1) bits? So a proposal for something roughly the 6th power of that size, when even just 1 more bit gives another million characters....

https://en.wikipedia.org/wiki/List_of_dictionaries_by_number_of_words

I guess if every word were encoded as a single code point... that wouldn't be enough; it seems there are about 7,716,121 words... so... 24 bits. Plus 1 to double it for good measure? *shrug*

On Mon, Apr 2, 2018 at 11:15 AM, William_J_G Overington via Unicode <unicode@unicode.org> wrote:

> Doug Ewell wrote:
>
> > Martin J. Dürst wrote:
>
> >> Please enjoy. Sorry for being late with forwarding, at least in some
> >> parts of the world.
>
> > Unfortunately, we know some folks will look past the humor and use this
> > as a springboard for the recurring theme "Yes, what *will* we do when
> > Unicode runs out of code points?"
>
> An interesting thing about the document is that it suggests a Unicode code
> point for an individual item of a particular type, what the document terms
> an imoji.
>
> This being beyond what Unicode encodes at present.
>
> I wondered if this could link in some ways to the Internet of Things.
>
> I had never heard of IPv6. Indeed I checked on the Internet to find
> whether that was real. So I have started reading and learning.
>
> It would, in fact, be quite straightforward to encode what the document
> terms 128-bit Unicode characters.
>
> For example, U+FFF8 could be used as a base character and then followed by
> a sequence of 32 tag characters, each of those 32 tag characters being from
> the range
>
> U+E0030 TAG DIGIT ZERO ..
> U+E0039 TAG DIGIT NINE, U+E0041 TAG LATIN CAPITAL LETTER A ..
> U+E0046 TAG LATIN CAPITAL LETTER F
>
> That is, a newly-defined character from the Specials and then 32 tag
> characters encoding a hexadecimal code point.
>
> Now, if that were called 128-bit Unicode then there could be problems of
> policy, but if it were given another name so that it sits upon a Unicode
> structure so as to provide an application platform that can be manipulated
> using Unicode tools, including existing Unicode interchange formats, and
> display formats for character glyphs, then maybe something useful can be
> produced.
>
> Thus using 128-bit binary numbers in a local computer system and using
> existing Unicode characters for interchange of information between computer
> systems, converting from the one format to the other depending upon the
> needs for local processing and for interchange of information.
>
> Of particular significance is the concept of encoding individual items
> each with its own code point.
>
> Could this be used to relate glyphs to the Internet of Things?
>
> Could things like International Standard Book Numbers be included, with a
> code point for each book edition?
>
> What about individual copies of a rare book?
>
> What about museum items?
>
> What about paintings and sculptures?
>
> Could this tie up with serial numbers used in GS1-128 Barcodes?
>
> Please note that the 128 in GS1-128 refers to the 128 characters of ASCII,
> not to 128-bits.
>
> I am wondering whether U+FFF8 plus 32 tag characters could be handled
> directly by a GSUB glyph substitution within an OpenType font.
>
> However, with such a large code space, there would need to be a way to
> access glyph information over the internet, maybe use of a one-glyph web
> font for each glyph would be possible in some way.
>
> William Overington
>
> Monday 2 April 2018
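
For what it's worth, the scheme quoted above (U+FFF8 followed by 32 tag characters spelling out a hexadecimal number) is easy enough to sketch. This is only my reading of the proposal, not anything standardized: U+FFF8 is an unassigned code point today, so using it as a base character is purely hypothetical, and the names encode_128 and decode_128 are made up for illustration. The only real fact relied on is that the tag digits and letters sit at 0xE0000 plus the ASCII value of the corresponding character.

```python
# Sketch of the quoted "U+FFF8 + 32 tag characters" idea.
# Assumption: U+FFF8 as the base character is hypothetical (it is unassigned).
BASE = "\uFFF8"
TAG_OFFSET = 0xE0000  # U+E0030..U+E0039 and U+E0041..U+E0046 mirror ASCII 0-9, A-F


def encode_128(value: int) -> str:
    """Encode a 128-bit integer as BASE plus 32 tag characters (hex digits)."""
    if not 0 <= value < 2**128:
        raise ValueError("value must fit in 128 bits")
    hex_digits = format(value, "032X")  # 32 uppercase hex digits, zero-padded
    return BASE + "".join(chr(TAG_OFFSET + ord(d)) for d in hex_digits)


def decode_128(text: str) -> int:
    """Reverse encode_128, rejecting anything that is not BASE + 32 tag hex digits."""
    if len(text) != 33 or text[0] != BASE:
        raise ValueError("expected the base character followed by 32 tag characters")
    digits = []
    for ch in text[1:]:
        ascii_value = ord(ch) - TAG_OFFSET
        is_digit = 0x30 <= ascii_value <= 0x39
        is_hex_letter = 0x41 <= ascii_value <= 0x46
        if not (is_digit or is_hex_letter):
            raise ValueError("not a tag hex digit: U+%04X" % ord(ch))
        digits.append(chr(ascii_value))
    return int("".join(digits), 16)


if __name__ == "__main__":
    sample = 0x20010DB8000000000000000000000001  # arbitrary 128-bit value
    encoded = encode_128(sample)
    print(len(encoded), "code points,", len(encoded.encode("utf-8")), "bytes in UTF-8")
    print(decode_128(encoded) == sample)
```

Each such sequence is 33 code points, and in UTF-8 that is 3 bytes for the base plus 4 bytes per tag character, so 131 bytes to carry 16 bytes of payload. That overhead is worth keeping in mind before using it as an interchange format.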