I have some food for thought, which I'm no

Perl 6 is defined in terms of Unicode in many respects, but in others it seems agnostic to the character repertoire, encoding, or abstraction level. The abstraction level, for example, is explicitly recorded as metadata in Str values, which in a sense allows multiple representations of text.

Now, I believe that if Perl 6 truly wants to be all-encompassing, such that any other language can be expressed as a grammar of Perl 6, or more generally that Perl 6 should be flexible enough to handle anyone's text needs, it can't be restricted to Unicode.

You see, as is known and documented in many places, numerous aspects of Unicode are controversial: while it does a lot, there are many needs it doesn't address, and it brings various complexities of its own.

For example, there was the controversy over Han unification, where glyphs of very similar appearance from multiple cultures were treated as the same characters in Unicode, while many Asian users want to treat them as distinct.

This contrasts with how, for some other scripts, Unicode provides multiple redundant codepoints for the same glyphs.

And there are other examples of characters from various scripts that are missing or poorly organized in Unicode, according to some users of those scripts.

One consequence is that other character repertoires have been created, or have been kept alive, and are used as alternatives to Unicode, for example by some cultures opposed to Han unification, so that they can properly express what they mean to say.

I propose that Perl 6 extend its existing support for multiple character abstractions, with its metadata-tagged strings, so that character repertoires extending beyond Unicode are also supported.

Examples include Mojikyo, TRON, GB18030, and several others.

See http://www.jbrowse.com/text/unij.html for some information on the matter.

See also http://www.ruby-forum.com/topic/165927, where related matters were discussed for Ruby; I only found that thread after writing this message.

*The idea here is that by being more flexible in what is supported, it is easier for Perl 6 users to express what they actually want to say in their code or in data processed by it.*

A corollary of allowing alternatives to Unicode is that, if we wanted, the Perl 6 spec could be split further, with some aspects considered more core than others, and support for vast or complicated character sets like Unicode could in general be made more optional. The idea is that this would make it better defined what a minimal Perl 6 consists of. I suggest that plain ASCII be the minimum and everything beyond it be optional. And pluggable.

Making the big complicated charsets optional and pluggable is good, I think.

-- Darren Duncan
