The encodings module appears to be special: CPython imports it on startup.  It 
appears to get imported even if you start up CPython with reading of site.py 
disabled.  Currently IronPython has no dependencies on the standard CPython 
library and we didn't want to add one just for this.  The mac_roman and 
mac_greek encodings are both defined in this module; it's just that CPython 
imports it for you, so it just works.
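To see the CPython behaviour being described, here is a short check, assuming a current CPython 3 interpreter. It confirms that the encodings package is already loaded without any explicit import, and that several spellings resolve to a single codec. The codec names are standard CPython names; the canonical names printed come from the codec modules themselves.

```python
import codecs
import sys

# CPython imports the 'encodings' package during interpreter startup,
# so it is already in sys.modules even when site.py is skipped with -S.
print("encodings" in sys.modules)

# Different spellings and aliases resolve to the same codec object;
# CodecInfo.name reports the codec's canonical name.
for name in ("gbk", "cp936", "GBK", "mac_roman", "macintosh"):
    print(name, "->", codecs.lookup(name).name)
```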

We probably also need to add some aliases for encodings that we support but 
that you can't reach by their usual name (e.g. gbk).  We have some hardcoded 
encodings already, but most of them are automatically translated from .NET 
names to Python names, and that doesn't always produce all the correct 
mappings.  We also map every code page to cp#, which explains why we support 
names like cp10000 but CPython doesn't.  I don't think we would special-case 
any code pages to skip this mapping either.
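The name resolution described above (a few hardcoded aliases plus a blanket cp# mapping) can be sketched as a small resolver. This is a minimal sketch under stated assumptions, not IronPython's code: EXPLICIT_ALIASES, normalize and resolve_code_page are hypothetical names, and the alias pairs are simply the ones reported in this thread, not a confirmed official mapping.

```python
import re

# Hypothetical alias table, for illustration only. The pairs come from
# this thread and are NOT IronPython's actual mapping.
EXPLICIT_ALIASES = {
    "gbk": "cp936",
    "mac_roman": "cp10000",
    "mac_greek": "cp10006",
}

# Generic fallback: any "cp<digits>" name maps straight to that code page.
_CP_NAME = re.compile(r"cp(\d+)")


def normalize(name):
    """Lower-case and treat '-' and ' ' like '_' (a rough approximation)."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def resolve_code_page(name):
    """Return the code page number for *name*, or None if it is unknown."""
    key = normalize(name)
    key = EXPLICIT_ALIASES.get(key, key)  # explicit aliases are tried first
    match = _CP_NAME.fullmatch(key)
    return int(match.group(1)) if match else None


for n in ("GBK", "mac-roman", "cp1252", "utf-8"):
    print(n, "->", resolve_code_page(n))
```

With this shape, adding a missing alias such as gbk is a one-line table entry, while the cp# fallback is what makes names like cp10000 resolve even though CPython has no such alias.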

The CPython test suite actually gives us a little trouble here because it 
doesn't import encodings.  I believe we have a slightly modified version that 
does the import and makes sure everything works.

Dumping this into a bug would be great.  I think the main thing to capture 
here is that we're missing some names & aliases in our standard encodings.




-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Machin
Sent: Saturday, October 07, 2006 7:54 PM
To: Discussion of IronPython
Subject: [IronPython] IronPython codec names not compatible with CPython


CPython recognises both 'gbk' and 'cp936' i.e. unicode('some string',
'gbk') does what you'd expect.
IronPython 1.0.1 recognises only 'cp936'.

CPython recognises 'mac_roman', 'mac_greek', etc.
IronPython doesn't.

After a [rare] flash of inspiration, I tried 'cp10000', 'cp10006', etc and 
IronPython recognises these, which CPython doesn't.

The "differences" document says: """
IronPython's _codecs module implementation is incomplete.  There are several 
replace_error/lookup_error handlers that IronPython does not implement.
"""
It is not apparent whether this is intended to mean that missing error handlers 
are the *only* known deficiency.
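For reference, CPython's error-handler registry is reached through codecs.register_error() and codecs.lookup_error(). A minimal sketch, assuming current CPython 3; the handler question_marks and the name "demo_question" are hypothetical, made up for illustration.

```python
import codecs


def question_marks(exc):
    """Hypothetical handler: replace each unencodable character with '?'."""
    if isinstance(exc, UnicodeEncodeError):
        # Return (replacement text, position at which to resume encoding).
        return ("?" * (exc.end - exc.start), exc.end)
    raise exc


# Register under an illustrative name, then use it like a built-in handler.
codecs.register_error("demo_question", question_marks)

print("h\u00e9llo".encode("ascii", "demo_question"))  # b'h?llo'

# Built-in handlers are registered under fixed names and can be fetched back.
print(codecs.lookup_error("replace") is codecs.replace_errors)  # True
```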

IronPython Bug #3214 mentions "import encodings" as fixing a LookupError. Well, 
you learn something new every day:
1. CPython permits one to import encodings, but it's not documented AFAICT, and 
it's *not* necessary in order to use 'gbk', 'mac_roman', etc.
2. After import encodings, IronPython recognises 'mac_roman' and 'mac_greek', 
but still not 'gbk'.
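Given point 2, one possible workaround is a lookup that imports encodings only when a name is not found. This is a sketch, assuming the IronPython 1.0.x behaviour reported here; lookup_codec is a hypothetical helper name. It would not help with gbk, which is still missing after the import, and on CPython the fallback branch simply re-raises for genuinely unknown names.

```python
import codecs


def lookup_codec(name):
    """Look up *name*, importing the 'encodings' package once on failure."""
    try:
        return codecs.lookup(name)
    except LookupError:
        # Assumption from this thread: on IronPython 1.0.x, importing
        # 'encodings' registers standard-library codecs such as mac_roman.
        # On CPython it is already imported, so the retry fails the same way.
        import encodings  # noqa: F401
        return codecs.lookup(name)


print(lookup_codec("mac_roman").name)
```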

How much of the above is bug and how much is feature? What is this mysterious 
encodings module anyway? Does this mean the CPython test suite doesn't cover 
the above cases? Are the equivalences (mac_roman, cp10000) etc. correct and 
official? Should I just dump all of the above into the IronPython Issue 
Tracker?

Cheers,
John
_______________________________________________
users mailing list
users@lists.ironpython.com
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
