going beyond Unicode

Darren Duncan Sat, 13 Feb 2010 02:15:04 -0800

I have some food for thought, which I'm no

Perl 6 is defined in terms of Unicode in many respects, but in other respects it also seems to be agnostic to the character repertoire or encoding or abstraction level or whatever, where the latter is explicitly encoded as meta-data in Str values for example, allowing multiple representations of text in some manners of speaking.

Now, I believe that if Perl 6 truly wants to be all-encompassing, such that any other language can be expressed as a grammar of Perl 6, or that otherwise Perl 6 should be more flexible to handle anyone's text needs, it can't be restricted by Unicode.

You see, as is known or documented in many places, numerous aspects of Unicode are controversial, such that while it does a lot, there are a lot of needs it doesn't address, and there are various complexities.

For example, there was the controversy of Han unification, where glyphs with very similar appearance from multiple cultures were treated as being the same characters in Unicode, while many Asian people want to treat them as distinct.

This behavior is also in contrast to how for some scripts Unicode provides various redundant codepoints for the same glyphs.

And there are other examples of characters from various scripts which are missing or mis-organized in Unicode, according to some users of those scripts.

One consequence of this is that other character repertoires have been created or have not been dropped, and are in use as alternatives to Unicode, such as by some cultures opposed to the Han unification, so that they can properly express what they mean to say.

I propose that Perl 6 extend its existing support for multiple character abstractions, with its meta-tagged code strings, so that character repertoires that extend outside of Unicode are also supported.


Examples are Mojikyo, TRON, GB18030, and several others.
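
To make the "meta-tagged string" idea a little more concrete, here is a minimal sketch in Python rather than Perl 6. The names TaggedText and from_unicode are hypothetical, invented only for illustration; nothing here is taken from the Perl 6 spec, and the repertoire names are plain labels with no real code tables behind them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaggedText:
    """Hypothetical string value: code values plus the repertoire they belong to."""
    repertoire: str          # a label such as "Unicode" or "Mojikyo"; no tables are loaded
    codes: Tuple[int, ...]   # code values, meaningful only within that repertoire

    def __add__(self, other):
        if not isinstance(other, TaggedText):
            return NotImplemented
        # Refuse to silently mix repertoires: the same number may name
        # different characters in different repertoires.
        if other.repertoire != self.repertoire:
            raise ValueError(
                f"cannot concatenate {self.repertoire} text with {other.repertoire} text"
            )
        return TaggedText(self.repertoire, self.codes + other.codes)


def from_unicode(s: str) -> TaggedText:
    """Tag an ordinary Python string as Unicode code points."""
    return TaggedText("Unicode", tuple(ord(c) for c in s))
```

With a tag like this, two values holding the same numbers but different repertoire tags compare as unequal, and conversion between repertoires would have to be an explicit step.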

See http://www.jbrowse.com/text/unij.html for some information on the matter.

See also http://www.ruby-forum.com/topic/165927 where related matters were discussed for Ruby, but I found that after I wrote this message.

*The idea here is that by being more flexible in what is supported, it is easier for Perl 6 users to express what they actually want to say in their code or in data processed by it.*

A corollary to there being allowed alternatives to Unicode is that, if we wanted to, the Perl 6 spec could possibly be split more, with some aspects being considered more core and some less so, and the support for vast or complicated character sets like Unicode in general could be made more optional. The idea here being that it is possible to make more well-defined what a more minimal Perl 6 may consist of. I suggest plain ASCII be the minimum and everything more is optional. And pluggable.


Making the big complicated charsets optional and pluggable is good, I think.
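
As a rough illustration of "ASCII as the minimum, everything else pluggable", here is a small Python sketch. The registry and the function names (register_repertoire, decode, utf8_plugin) are all hypothetical and only meant to show the shape of the idea, not how a Perl 6 implementation would or should do it.

```python
from typing import Callable, Dict, Tuple

# A decoder turns raw bytes into a tuple of code values for one repertoire.
Decoder = Callable[[bytes], Tuple[int, ...]]


def _decode_ascii(data: bytes) -> Tuple[int, ...]:
    """The only decoder the minimal core ships with."""
    for b in data:
        if b > 0x7F:
            raise ValueError(f"byte 0x{b:02X} is outside ASCII")
    return tuple(data)


# Hypothetical registry: the core knows only ASCII until something is plugged in.
_REPERTOIRES: Dict[str, Decoder] = {"ASCII": _decode_ascii}


def register_repertoire(name: str, decoder: Decoder) -> None:
    """Plug in support for an additional repertoire under the given name."""
    _REPERTOIRES[name] = decoder


def decode(name: str, data: bytes) -> Tuple[int, ...]:
    """Decode bytes using a named repertoire, failing if it is not installed."""
    try:
        decoder = _REPERTOIRES[name]
    except KeyError:
        raise LookupError(f"repertoire {name!r} is not installed") from None
    return decoder(data)


def utf8_plugin(data: bytes) -> Tuple[int, ...]:
    """Example plug-in: Unicode code points read from UTF-8 bytes."""
    return tuple(ord(c) for c in data.decode("utf-8"))
```

A program that needs Unicode would call register_repertoire("Unicode/UTF-8", utf8_plugin) first; one that never does still has a well-defined ASCII-only core, and support for something like Mojikyo or TRON would be added the same way.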

-- Darren Duncan
