On Thu, Apr 1, 2010 at 11:52 AM, Marius Gedminas <mar...@gedmin.as> wrote:
> I don't think I'll be able to work on it, but I think it's worth
> consideration: Unicode issues with Zope 2.12. I've seen these on at
> least three different Zope 2 sites built with a combination of TTW page
> templates, Python scripts and (sometimes) DTML documents: things like
> title attributes store their data as UTF-8 strings, while page templates
> insist on Unicode objects, resulting in errors all over the place.
>
> Those sites worked with Zope 2.9 and broke down after an upgrade to
> 2.12. That's a not very nice thing to do to your users...

They broke down after the move to Zope 2.10. We switched Zope 2 to the
zope.tal / zope.tales packages in place of Zope 2's own implementation.
As a result, TAL has used Unicode internally ever since.

There's the whole unicoderesolver story, which allows you to implement
an application-specific fallback. We decided back then that dealing
with this problem would be left to each application, as Zope 2 in
general has too little knowledge about your data, and nobody
volunteered to do any work on it ;)

Plone has implemented a specific fallback which automatically converts
all UTF-8 encoded strings to Unicode. In the Plone 3.x series it also
accepted strings in any other encoding and converted them via
unicode(text, 'utf-8', 'ignore'), logging such occurrences. In Plone 4
it throws an exception on any data that is neither Unicode nor valid
UTF-8. In Plone 5 we'll probably log warnings for UTF-8 encoded strings
and push the responsibility for converting to Unicode into the
application code.

If you have a rather large application with third-party plugins and
have to deal with the conversion from encoded strings to Unicode, I
think such a long-term upgrade story, with policy changes happening
around major releases, is the only way to go. If you have a purely
in-house application, you can do the data and code conversion as a
single project and get it over with.

That being said, I'd like to see someone tackle the problem of "id" /
URL segments as Unicode. They are currently restricted to ASCII, which
means we don't have a problem with arbitrarily encoded string data
there. But there are probably enough places that rely on them being
ASCII in some way.

Hanno
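
For illustration, the two Plone fallback policies described above could
be sketched roughly as follows. This is not Plone's or Zope's actual
code: the function names and the logger name are hypothetical, and the
sketch uses modern Python 3 (bytes and str) rather than the Python 2
unicode() call quoted above.

```python
import logging

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("unicode_fallback")


def to_text_lenient(value, encoding="utf-8"):
    """Plone 3.x-like policy: always return text, logging lossy conversions."""
    if isinstance(value, str):
        return value
    try:
        return value.decode(encoding)
    except UnicodeDecodeError:
        # Equivalent in spirit to unicode(text, 'utf-8', 'ignore'):
        # bytes that are not valid UTF-8 are silently dropped.
        logger.warning("Dropping undecodable bytes from %r", value)
        return value.decode(encoding, "ignore")


def to_text_strict(value, encoding="utf-8"):
    """Plone 4-like policy: accept only text or valid UTF-8 bytes."""
    if isinstance(value, str):
        return value
    # Raises UnicodeDecodeError for anything that is not valid UTF-8.
    return value.decode(encoding)
```

With the lenient policy a Latin-1 encoded byte string loses its
non-ASCII characters but the page still renders; with the strict policy
the same input fails immediately, which makes the bad data visible.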