[Zope-CMF] Re: [RFC] [Patch] GenericSetup and encodings
yuppie wrote: As you already mentioned, setting default-zpublisher-encoding to 'utf-8' doesn't really work. Just found that DT_Util.join_unicode has 'latin-1' hardcoded, so properties with other encodings are not supported by manage_propertiesForm. Given that, I don't think we have to support other default_zpublisher_encodings than 'latin-1'.

As AJ answered me (http://article.gmane.org/gmane.comp.web.zope.devel/11655), Unicode properties should use the u- types (ustring, utext). So the way to proceed could be:
* document that only string/text properties in the intersection of iso-8859-1 and default_encoding are supported
* ensure that the unicode types work (e.g., TarballExportContext.writeDataFile doesn't accept unicode text)
* change GenericSetup users (CMF, CPS) to use u* when needed (e.g. title, description).

Yep, sure :-)
Cheers, Yuppie
yves
___ Zope-CMF maillist - Zope-CMF@lists.zope.org http://mail.zope.org/mailman/listinfo/zope-cmf See http://collector.zope.org/CMF for bug reports and feature requests
[Zope-CMF] Re: [RFC] [Patch] GenericSetup and encodings
yuppie wrote: As you already mentioned setting default-zpublisher-encoding to 'utf-8' doesn't really work. Just found that DT_Util.join_unicode has 'latin-1' hardcoded, so properties with other encodings are not supported by manage_propertiesForm. I'm just about to send a mail to zope.devel about this :-) Given that I don't think we have to support other default_zpublisher_encodings than 'latin-1'. Cheers, Yuppie yves
[Zope-CMF] Re: [RFC] [Patch] GenericSetup and encodings
yuppie wrote: Yves Bastide wrote: converter is field2string, default_encoding is ZPublisher.HTTPRequest.default_encoding = 'iso-8859-15' (not to be mistaken for ZPublisher.Converters.default_encoding = 'iso-8859-15'). These default_encodings are set by Zope2.Startup.datatypes.default_zpublisher_encoding (i.e. zope.conf's default-zpublisher-encoding directive). So GenericSetup has to use ZPublisher.HTTPRequest.default_encoding as well. Right? I think so. Or ZPublisher.Converters.default_encoding, whose name may be more explicit (default_zpublisher_encoding sets {Converters,HTTPRequest,HTTPResponse}.default_encoding). If CMF is messing around with other encodings (like using the site's default_charset for the portal title) it has to override that. If using utf-8 was the wrong approach your test_utils patch has to be modified as well. It first needs to fail in the current setup ... Cheers, Yuppie Thanks, yves
[Zope-CMF] Re: [RFC] [Patch] GenericSetup and encodings
Replying to myself... Yves Bastide wrote: I know of at least one point, ZPublisher.Converters (field2string). However by the time a supposedly unicode string (say title:UTF-8:string) comes here, it's already iso8859. Will look deeper ...

Well, should have looked up in the call stack. ZPublisher.HTTPRequest.processInputs, lines 527sq (Zope trunk):

    527:     item = unicode(item,character_encoding)
    528:     if hasattr(converter,'convert_unicode'):
    529:         item = converter.convert_unicode(item)
    530:     else:
    531:         item = converter(item.encode(default_encoding))

converter is field2string, default_encoding is ZPublisher.HTTPRequest.default_encoding = 'iso-8859-15' (not to be mistaken for ZPublisher.Converters.default_encoding = 'iso-8859-15'). These default_encodings are set by Zope2.Startup.datatypes.default_zpublisher_encoding (i.e. zope.conf's default-zpublisher-encoding directive).

Of course, just setting default-zpublisher-encoding to utf-8 results in a garbled ZMI ...

yves
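The effect of lines 527-531 can be reproduced outside Zope. The following is only a Python 3 sketch (the quoted code is Python 2); `process_input` and its parameters are illustrative names, not part of ZPublisher. It assumes a converter without `convert_unicode`, so the value is decoded with the charset declared in the form field and re-encoded with the publisher default.

```python
# Python 3 sketch of the decode/re-encode step; not Zope code.
# `process_input` is a hypothetical helper named for this example only.

def process_input(raw: bytes, character_encoding: str, default_encoding: str) -> bytes:
    """Decode a form value with its declared charset, then re-encode it
    with the publisher-wide default encoding."""
    item = raw.decode(character_encoding)
    return item.encode(default_encoding)


# A title submitted as title:UTF-8:string
submitted = "Port\xe0l".encode("utf-8")

# What ends up stored when the default encoding is iso-8859-15
stored = process_input(submitted, "utf-8", "iso-8859-15")

print(submitted)  # UTF-8 bytes as sent by the browser
print(stored)     # single-byte iso-8859-15 form
```

This would explain why the value is "already iso8859" by the time field2string sees it: the transcoding happens one frame up, before the converter is called.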
[Zope-CMF] Re: [RFC] [Patch] GenericSetup and encodings
yuppie wrote: Hi! [...] With this applied, Portàl (u'Port\xe0l'), which becomes 'Port\xc3\xa0l', is displayed as PortÃ l ... Zope does input--output properties in utf-8, but stores them in iso8859. Sigh. I was afraid this would be complex :( That's why I only use ASCII in configuration data. Can you find out why it stores them in iso8859? Is this hardcoded or configurable somewhere? I know of at least one point, ZPublisher.Converters (field2string). However by the time a supposedly unicode string (say title:UTF-8:string) comes here, it's already iso8859. Will look deeper ...

[About getEncoding()] Don't know if third party products use it. I guess if CPS doesn't nobody does. It does, though I suspect incorrectly. Florent? AFAICS it could be deprecated at least for export contexts.

Well, I think I can wriggle out of most of my problems using translation. And I'll try and write UTF-8 unit tests if nobody beats me to it. That would be great. Hmm, by adding to an existing test suite, or creating a new one? In general the unit tests have a module / class structure similar to the tested units. E.g. tests for utils.PropertyManagerHelpers should be added to test_utils.PropertyManagerHelpersTests. But sometimes there are reasons to add a new test suite, e.g. if you need a different setup.
I did modify test_utils's properties suite (see attached patch), but it passes with GenericSetup current version :-) Cheers, Yuppie yves Index: GenericSetup/tests/test_utils.py === --- GenericSetup/tests/test_utils.py (revision 68520) +++ GenericSetup/tests/test_utils.py (working copy) @@ -24,7 +24,7 @@ from Products.GenericSetup.testing import DummySetupEnviron -_EMPTY_PROPERTY_EXPORT = """\ +_EMPTY_PROPERTY_EXPORT = u"""\ False @@ -34,6 +34,7 @@ 0 + 0.0 False -""" +""".encode('utf-8') -_NORMAL_PROPERTY_EXPORT = """\ +_NORMAL_PROPERTY_EXPORT = u"""\ True @@ -60,6 +61,7 @@ 1 Foo String + \u0080 Foo Text @@ -78,9 +80,9 @@ 3.1415 True -""" +""".encode('utf-8') -_FIXED_PROPERTY_EXPORT = """\ +_FIXED_PROPERTY_EXPORT = u"""\ True @@ -93,6 +95,7 @@ 1 Foo String + \u0080 Foo Text @@ -109,7 +112,7 @@ 3.1415 True -""" +""".encode('utf-8') _SPECIAL_IMPORT = """\ @@ -240,6 +243,7 @@ obj.manage_addProperty('foo_lines', '', 'lines') obj.manage_addProperty('foo_long', '0', 'long') obj.manage_addProperty('foo_string', '', 'string') +obj.manage_addProperty('foo_unicode_string', '', 'string') obj.manage_addProperty('foo_text', '', 'text') obj.manage_addProperty('foo_tokens', '', 'tokens') obj.manage_addProperty('foo_selection', 'foobarbaz', 'selection') @@ -264,6 +268,7 @@ obj._updateProperty('foo_lines', 'Foo\nLines') obj._updateProperty('foo_long', '1') obj._updateProperty('foo_string', 'Foo String') +obj._updateProperty('foo_unicode_string', u'\u0080'.encode('utf-8')) obj._updateProperty('foo_text', 'Foo\nText') obj._updateProperty( 'foo_tokens', ('Foo', 'Tokens') ) obj._updateProperty('foo_selection', 'Foo') @@ -303,6 +308,7 @@ self.assertEqual(getattr(obj, 'foo_lines', None), None) self.assertEqual(getattr(obj, 'foo_long', None), None) self.assertEqual(getattr(obj, 'foo_string', None), None) +self.assertEqual(getattr(obj, 'foo_unicode_string', None), None) self.assertEqual(getattr(obj, 'foo_text', None), None) self.assertEqual(getattr(obj, 'foo_tokens', None), None) 
self.assertEqual(getattr(obj, 'foo_selection', None), None)
[Zope-CMF] Re: [RFC] [Patch] GenericSetup and encodings
yuppie wrote: Hi! Yves Bastide wrote: yuppie wrote: 3.) GenericSetup is not tested with non-ASCII UTF-8 site settings. AFAIK import works, but not export. I consider this a bug. [...]

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 20: ordinal not in range(128)

This traceback just confirms that export does not work. Is import also broken? Differently: it may or may not raise ... And Zope treats properties as iso8859-15 anyway.

Fresh install of Zope trunk (after a long struggle; make instance now works but make install is broken?) and CMF trunk, with

~/src/CMF$ svn diff
Index: CMFDefault/profiles/default/properties.xml
===
--- CMFDefault/profiles/default/properties.xml (revision 68514)
+++ CMFDefault/profiles/default/properties.xml (working copy)
@@ -1,6 +1,6 @@
- Portal
+ Portàl
[EMAIL PROTECTED]

Fails when CMFDefault.factory.addConfiguredSite calls createSnapshot. Here's a minimal patch for GenericSetup not to raise on the previous case (Demonstration product. Not for sale.)

[EMAIL PROTECTED]:~/src/CMF$ svn diff GenericSetup/
Index: GenericSetup/context.py
===
--- GenericSetup/context.py (revision 68514)
+++ GenericSetup/context.py (working copy)
@@ -475,7 +475,7 @@
         if isinstance( body, unicode ):
             encoding = self.getEncoding()
             if encoding is None:
-                body = body.encode()
+                body = body.encode('UTF-8')
             else:
                 body = body.encode( encoding )
Index: GenericSetup/utils.py
===
--- GenericSetup/utils.py (revision 68514)
+++ GenericSetup/utils.py (working copy)
@@ -625,6 +625,8 @@
             else:
                 if prop_map.get('type') == 'boolean':
                     prop = str(bool(prop))
+                elif isinstance(prop, str):
+                    prop = prop.decode('UTF-8')
                 elif not isinstance(prop, basestring):
                     prop = str(prop)
                 child = self._doc.createTextNode(prop)
[EMAIL PROTECTED]:~/src/CMF$

With this applied, Portàl (u'Port\xe0l'), which becomes 'Port\xc3\xa0l', is displayed as PortÃ l ... Zope does input--output properties in utf-8, but stores them in iso8859. Sigh.

Thanks for setting me right.
What's the usefulness of getEncoding()? As you say, exported files don't need to be other than utf-8 encoded. I guess it just exists for historical reasons. Might it be removed, or default'ed to utf-8? Do people already rely on it? Well, I think I can wriggle out of most of my problems using translation. And I'll try and write UTF-8 unit tests if nobody beats me to it. That would be great. Hmm, by adding to an existing test suite, or creating a new one? Cheers, Yuppie Thanks, yves
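The display problem in this message (Portàl coming back with an extra "Ã") is the usual result of reading UTF-8 bytes as a single-byte charset. A minimal Python 3 sketch, assuming the page is rendered as iso-8859-15 as stated in the thread; the variable names are chosen for this example only:

```python
# Python 3 sketch of the mojibake: UTF-8 bytes read back as iso-8859-15.

title = "Port\xe0l"                          # u'Port\xe0l' in the thread

utf8_bytes = title.encode("utf-8")           # what the patched export produces
misread = utf8_bytes.decode("iso-8859-15")   # what a latin-style page shows

# The two-byte sequence for U+00E0 turns into two separate characters:
# U+00C3 (A with tilde) followed by U+00A0 (no-break space).
print(repr(utf8_bytes))
print(repr(misread))

# The damage is reversible as long as nothing re-encodes the string again.
repaired = misread.encode("iso-8859-15").decode("utf-8")
print(repaired == title)
```

So the patch stops the exception, but the property is now stored in one encoding and displayed in another, which is the mismatch discussed in the rest of the thread.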
[Zope-CMF] Re: zLOG -> logging
Florent Guillaume wrote: Yves Bastide wrote: And CPS 3.3 and 3.4 have been using Zope 2.9 since its inception, and it is the only recommended "stable" platform for them. Though they also flood the console with zLOG deprecation warnings ;-) (I incrementally patched the worst offenders on my local copy, but never sent the result, mainly because of the BLATHER/TRACE/DEBUG-to-debug impedance mismatch) Well you're better off patching zLOG then, to make it not send the warning :) Too easy :-) (And I just saw ZODB.loglevels.{TRACE,BLATHER}. /me goes hiding) Florent yves
[Zope-CMF] Re: zLOG -> logging
Florent Guillaume wrote: And CPS 3.3 and 3.4 have been using Zope 2.9 since its inception, and it is the only recommended "stable" platform for them. Though they also flood the console with zLOG deprecation warnings ;-) (I incrementally patched the worst offenders on my local copy, but never sent the result, mainly because of the BLATHER/TRACE/DEBUG-to-debug impedance mismatch) Florent yves
[Zope-CMF] Re: [RFC] [Patch] GenericSetup and encodings
yuppie wrote: Hi Yves! Yves Bastide wrote: GenericSetup has problems handling non-ASCII data.

1.) GenericSetup explicitly doesn't support non-UTF-8 XML in profiles. UTF-8 is the default encoding for XML and I can't see a need to support other XML encodings. As output, right? Agreed.

2.) GenericSetup explicitly doesn't support non-UTF-8 site settings. If someone provides a good patch this feature can be added. But with the problems you mention later ('default_charset', 'management_page_charset', and so on), how would you envision it?

3.) GenericSetup is not tested with non-ASCII UTF-8 site settings. AFAIK import works, but not export. I consider this a bug. Neither: CMF trunk, change portal_types/Document's title to 'Dôcument', export:

Traceback (innermost last):
  Module ZPublisher.Publish, line 115, in publish
  Module ZPublisher.mapply, line 88, in mapply
  Module ZPublisher.Publish, line 41, in call_object
  Module Products.GenericSetup.tool, line 471, in manage_exportAllSteps
  Module Products.GenericSetup.tool, line 272, in runAllExportSteps
  Module Products.GenericSetup.tool, line 736, in _doRunExportSteps
  Module Products.CMFCore.exportimport.typeinfo, line 198, in exportTypesTool
  Module Products.GenericSetup.utils, line 728, in exportObjects
  Module Products.GenericSetup.utils, line 722, in exportObjects
  Module Products.GenericSetup.utils, line 501, in _exportBody
  Module xml.dom.minidom, line 62, in toprettyxml
  Module StringIO, line 271, in getvalue
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 20: ordinal not in range(128)

It treats strings sometimes as ASCII, sometimes as UTF-8, yet it has access to two variables: its own ISetupContext.getEncoding() (whose use I didn't fully grok) and CMF's ISetupContext.getSite().getProperty('default_charset').

Sorry, but your assumptions are wrong: - The default setup tool creates export contexts without specifying the encoding, so ISetupContext.getEncoding() always returns None.
And even if it would be set it represents the encoding of the exported files, not the site encoding. - getSite().getProperty('default_charset') is CMF specific and should not be used in GenericSetup. - The adapters adapt ISetupEnviron, not ISetupContext. getEncoding() and getSite() are not always available. Thanks for setting me right. What's the usefulness of getEncoding()? As you say, exported files don't need to be other than utf-8 encoded. First of all we need unit tests that make sure UTF-8 works and I think this should be the default used by GenericSetup. Code that needs to know how to find the site encoding can't be generic. Yep. There is an additional problem: If tools use the default property edit page from OFS the properties might have a different encoding than 'default_charset' of the site. Since the default 'management_page_charset' is UTF-8 we have less trouble if we allow only UTF-8. D'oh! /manage is 8859-15, /manage_menu is -1 and manage_propertiesForm UTF-8. No wonder Firefox sometimes gets confused :-) Well, I think I can wriggle out of most of my problems using translation. And I'll try and write UTF-8 unit tests if nobody beats me to it. Thanks! Cheers, Yuppie yves
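The UnicodeDecodeError in the export traceback is what Python 2 raises when a byte string containing non-ASCII data is joined with unicode text: the bytes are implicitly decoded as ASCII. Python 3 has no implicit decoding, so the sketch below performs that decode explicitly to show the same failure. It assumes the title is stored as iso-8859-15 bytes, as suggested elsewhere in the thread; the reported position differs from the traceback because only the title is decoded here.

```python
# Python 3 sketch of the failure behind the export traceback.
# Assumption: the property holds iso-8859-15 encoded bytes.

raw_title = "D\xf4cument".encode("iso-8859-15")   # 'Dôcument' as stored bytes

failing_byte = None
message = ""
try:
    # Python 2 did this implicitly when mixing str and unicode.
    raw_title.decode("ascii")
except UnicodeDecodeError as exc:
    failing_byte = raw_title[exc.start]
    message = str(exc)

print(hex(failing_byte))
print(message)

# Decoding explicitly with the known storage encoding avoids the error.
text_title = raw_title.decode("iso-8859-15")
print(text_title)
```

This is why the exporter has to know, or fix by convention, the encoding of stored string properties before handing them to the DOM.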
[Zope-CMF] [RFC] [Patch] GenericSetup and encodings
Hi,

GenericSetup has problems handling non-ASCII data. It treats strings sometimes as ASCII, sometimes as UTF-8, yet it has access to two variables: its own ISetupContext.getEncoding() (whose use I didn't fully grok) and CMF's ISetupContext.getSite().getProperty('default_charset').

Attached is a patch using both of them and somewhat working in my setup. Can knowledgeable people comment on it before I enter a collector issue? (I'm using GS alongside CPS, which also needs some patching; yet basic things, such as exporting-importing an iso8859-15 Title in a CMF charset-default'ed to iso8859-15, should work)

Thanks!
Yves

Index: GenericSetup/utils.py
===
--- GenericSetup/utils.py (revision 68510)
+++ GenericSetup/utils.py (working copy)
@@ -498,7 +498,8 @@
         """Export the object as a file body.
         """
         self._doc.appendChild(self._exportNode())
-        return self._doc.toprettyxml(' ')
+        encoding = self.environ.getEncoding() or 'UTF-8'
+        return self._doc.toprettyxml(' ', encoding=encoding)
     def _importBody(self, body):
         """Import the object from the file body.
@@ -617,6 +618,7 @@
             node.setAttribute('name', prop_id)
             prop = self.context.getProperty(prop_id)
+            encoding = self.environ.getSite().getProperty('default_charset', '') or 'UTF-8'
             if isinstance(prop, (tuple, list)):
                 for value in prop:
                     child = self._doc.createElement('element')
@@ -625,8 +627,10 @@
             else:
                 if prop_map.get('type') == 'boolean':
                     prop = str(bool(prop))
+                elif isinstance(prop, str):
+                    prop = prop.decode(encoding)
                 elif not isinstance(prop, basestring):
-                    prop = str(prop)
+                    prop = unicode(prop)
                 child = self._doc.createTextNode(prop)
                 node.appendChild(child)
@@ -685,9 +689,10 @@
                 raise BadRequest('%s cannot be changed' % prop_id)
             elements = []
+            encoding = self.environ.getEncoding()
             for sub in child.childNodes:
                 if sub.nodeName == 'element':
-                    elements.append(sub.getAttribute('value').encode('utf-8'))
+                    elements.append(sub.getAttribute('value').encode(encoding))
             if elements or prop_map.get('type') == 'multiple selection':
                 prop_value = tuple(elements) or ()
@@ -696,7 +701,7 @@
             else:
                 # if we pass a *string* to _updateProperty, all other values
                 # are converted to the right type
-                prop_value = self._getNodeText(child).encode('utf-8')
+                prop_value = self._getNodeText(child).encode(encoding)
             if not self._convertToBoolean(child.getAttribute('purge') or 'True'):
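The first hunk of the patch relies on the `encoding` argument of `xml.dom.minidom`'s `toprettyxml`. The sketch below uses only the standard library under Python 3 (the patch itself targets Python 2) and builds a small, made-up `property` element to show what the argument changes; it is not GenericSetup code.

```python
# Python 3 sketch: toprettyxml with and without an explicit encoding.
from xml.dom.minidom import Document

doc = Document()
node = doc.createElement("property")          # element name chosen for this example
node.setAttribute("name", "title")
node.appendChild(doc.createTextNode("Port\xe0l"))
doc.appendChild(node)

# Without an encoding the result is text and the XML declaration names none.
as_text = doc.toprettyxml(" ")

# With an encoding the result is bytes and the declaration states the encoding.
as_bytes = doc.toprettyxml(" ", encoding="utf-8")

print(type(as_text).__name__)
print(type(as_bytes).__name__)
print(as_bytes.decode("utf-8"))
```

For this to work the DOM must contain text (unicode) nodes only, which is why the second hunk decodes byte-string properties before calling createTextNode.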