Re: [Zope-CMF] Charsets
On 29.12.2008 at 15:01, Charlie Clark wrote:

> The site should deliver all pages containing forms (if possible even all pages) with a single charset, let's call it the site charset. Then it uses this same charset to interpret form data.
>
> While I understand this, I'm a bit at a loss as to why this is happening. I'm using forms based on CMFDefault's formlib implementation. Charsets are set for the site and zpublisher but something else is probably missing.

Delving deeper into this, I think I understand things a little better. The accept-charset attribute on a form tag requires the browser to encode any form data in the specified encoding. Ideally this would make additional negotiation unnecessary, but this value isn't passed to the server as HTTP_ACCEPT_CHARSET, which is where the fun starts.

As has been noted previously (http://mail.zope.org/pipermail/zope3-dev/2004-June/011483.html), browsers don't all behave themselves when setting this header: IE 6 and 7 and Safari send an empty header, whereas Opera and Firefox usually send something like

    iso-8859-1, utf-8, utf-16, *;q=0.1

getPreferredCharsets() will return 'iso-8859-1' where HTTP_ACCEPT_CHARSET is empty. But this will cause problems if the browser is actually using UTF-8.

But the way the CMF uses getPreferredCharsets() isn't right either. In CMFDefault.utils:

    def getBrowserCharset(request):
        """Get charset preferred by the browser."""
        envadapter = IUserPreferredCharsets(request)
        charsets = envadapter.getPreferredCharsets() or ['utf-8']
        return charsets[0]

This will always be iso-8859-1 for Opera and Firefox because all charsets have the same quality, again even if UTF-8 encoding is specified.

I haven't been able to track down where the decoding of form data occurs for Zope 2 stuff, but I can identify the problem in zpublisher.browser.BrowserRequest:

    def _decode(self, text):
        """Try to decode the text using one of the available charsets."""
        if self.charsets is None:
            envadapter = IUserPreferredCharsets(self)
            self.charsets = envadapter.getPreferredCharsets() or ['utf-8']
        for charset in self.charsets:
            try:
                text = unicode(text, charset)
                break
            except UnicodeError:
                pass
        return text

Here the naive assumption is that if we can decode from a charset without an error, then we have the correct charset. Sometimes this goes unnoticed, but with characters like U+2013 and U+2014 (en dash and em dash) it will raise errors, as those code points are not in the Latin-1 charset but it has its own equivalents.

I would suggest that we work towards enforcing UTF-8 where possible, but at the very least add the accept-charset attribute to forms and use the portal's default_charset for this.

I'd very much appreciate your comments on this.

Charlie
-- 
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D-40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226

___
Zope-CMF maillist - Zope-CMF@lists.zope.org
http://mail.zope.org/mailman/listinfo/zope-cmf
See https://bugs.launchpad.net/zope-cmf/ for bug reports and feature requests
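To make the concern about _decode concrete, here is a minimal sketch in Python 3 (not the Python 2 code quoted above). The helper first_successful_decode is a hypothetical stand-in for the try-each-charset loop and is not taken from zope.publisher. It only illustrates that ISO-8859-1 accepts any byte sequence, so UTF-8 form data decoded with it "succeeds" while producing the wrong text.

```python
def first_successful_decode(data, charsets):
    """Hypothetical stand-in for the try-each-charset loop discussed above."""
    for charset in charsets:
        try:
            return data.decode(charset)
        except UnicodeError:
            continue
    return data  # nothing worked: hand back the raw bytes


original = "2008\u20132009"            # contains U+2013 (en dash)
submitted = original.encode("utf-8")   # bytes a UTF-8 browser would send

# ISO-8859-1 assigns a character to every byte value, so decoding never
# fails and the loop stops at the first, wrong, charset.
garbled = first_successful_decode(submitted, ["iso-8859-1", "utf-8"])
correct = first_successful_decode(submitted, ["utf-8", "iso-8859-1"])

print(garbled == original)   # False
print(correct == original)   # True
```

The order of the charset list decides the result here, which is why a list derived from a browser preference header is a fragile basis for decoding.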
Re: [Zope-CMF] Charsets
Charlie Clark wrote at 2009-1-18 15:49 +0100:
> ...
> I would suggest that we work towards enforcing UTF-8 where possible, but at the very least add the accept-charset attribute to forms and use the portal's default_charset for this. I'd very much appreciate your comments on this.

The Accept-Charset request header should *never* be used to guess a charset on the server side: Accept-Charset is a user preference which does not know anything about the charsets used by the server. If utf-8 were not treated with preference in the current code, the code base would see massive problems.

Only the server knows which charsets it is using -- and it should use a single one (with very few exceptions). There should be a configuration option that specifies this charset, and it should be used to decode form data.

-- 
Dieter
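A minimal sketch of this approach in Python 3, using only the standard library. SITE_CHARSET and decode_form are hypothetical names chosen for the illustration; they are not an existing Zope or CMF setting or API. The sketch assumes a URL-encoded form body and shows form data being decoded with one configured charset, without looking at Accept-Charset at all.

```python
from urllib.parse import parse_qsl

# Hypothetical configuration value: the one charset the site serves pages in.
SITE_CHARSET = "utf-8"


def decode_form(body, charset=SITE_CHARSET):
    """Decode a URL-encoded form body using only the configured charset.

    The Accept-Charset header is never consulted. Bytes that are invalid in
    the configured charset raise UnicodeDecodeError instead of being guessed.
    """
    pairs = parse_qsl(
        body.decode("ascii"),      # a URL-encoded body is plain ASCII
        keep_blank_values=True,
        encoding=charset,
        errors="strict",
    )
    return dict(pairs)


print(decode_form(b"title=2008%E2%80%932009"))
print(decode_form(b"city=D%FCsseldorf", charset="iso-8859-1"))
```

Using errors="strict" is a design choice for the sketch: a mismatch between the page charset and the submitted data surfaces as an error rather than as silently garbled text.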
Re: [Zope-CMF] Charsets
Hi Charlie!

Charlie Clark wrote:
> On 29.12.2008 at 15:01, Charlie Clark wrote:
>
> CMFDefault.utils
>
>     def getBrowserCharset(request):
>         """Get charset preferred by the browser."""
>         envadapter = IUserPreferredCharsets(request)
>         charsets = envadapter.getPreferredCharsets() or ['utf-8']
>         return charsets[0]
>
> This will always be iso-8859-1 for Opera and Firefox because all charsets have the same quality, again even if UTF-8 encoding is specified.

getBrowserCharset does almost the same as zope.publisher.http.getCharsetUsingRequest. And it is only used for encoding and decoding 'portal_status_message'. It is not relevant for the issue you noticed.

> I haven't been able to track down where the decoding of form data occurs for Zope 2 stuff, but I can identify the problem in zpublisher.browser.BrowserRequest

You mean zope.publisher.browser.BrowserRequest. The Zope 2 version is in Products.Five.browser.decode.

>     def _decode(self, text):
>         """Try to decode the text using one of the available charsets."""
>         if self.charsets is None:
>             envadapter = IUserPreferredCharsets(self)
>             self.charsets = envadapter.getPreferredCharsets() or ['utf-8']
>         for charset in self.charsets:
>             try:
>                 text = unicode(text, charset)
>                 break
>             except UnicodeError:
>                 pass
>         return text
>
> Here the naive assumption is that if we can decode from a charset without an error, then we have the correct charset. Sometimes this goes unnoticed, but with characters like U+2013 and U+2014 (en dash and em dash) it will raise errors, as those code points are not in the Latin-1 charset but it has its own equivalents.

AFAICS the fallback to other charsets is usually not required in Zope 3. If the publisher encodes responses using zope.publisher.http.getCharsetUsingRequest, the first charset will be the right one.

> I would suggest that we work towards enforcing UTF-8 where possible, but at the very least add the accept-charset attribute to forms and use the portal's default_charset for this. I'd very much appreciate your comments on this.

I can't see a need to implement this in a different way than Zope 3. So I propose to fix the encoding of forms sent to the browser.

Cheers,

Yuppie
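The point that the first charset is the right one, provided the response was encoded with the same selection logic, can be sketched as follows in Python 3. preferred_charsets is a simplified, hypothetical parser written for this illustration; it is not the zope.publisher implementation. Its utf-8-first rule is an assumption based on the remark earlier in the thread that utf-8 is treated with preference.

```python
def preferred_charsets(accept_charset):
    """Simplified, hypothetical Accept-Charset parser (illustration only).

    Orders charsets by q-value, keeping header order for ties, and puts
    utf-8 first whenever the header is empty or allows it (assumption).
    """
    entries = []
    for position, item in enumerate(accept_charset.split(",")):
        name, _, params = item.strip().lower().partition(";")
        name = name.strip()
        if not name:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                continue
        if quality > 0:
            entries.append((-quality, position, name))
    names = [name for _, _, name in sorted(entries)]
    if not names or "*" in names or "utf-8" in names:
        names = ["utf-8"] + [n for n in names if n not in ("utf-8", "*")]
    return names


header = "iso-8859-1, utf-8, utf-16, *;q=0.1"   # Firefox/Opera style value
charset = preferred_charsets(header)[0]

page_text = "Years 2008\u20132009"
response_body = page_text.encode(charset)        # how the form page is sent
# A browser normally submits form data in the charset of the page it came
# from, so decoding with the same selection recovers the text.
round_trip = response_body.decode(preferred_charsets(header)[0])
print(charset, round_trip == page_text)
```

The round trip only works because encoding and decoding use the same selection; if the page is sent in a different charset than the one picked for decoding, the first charset is no longer guaranteed to be correct.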