On 7/22/2011 3:25 PM, Remy Blank wrote:
I seem to be unable to extract l10n messages for JavaScript on current
trunk:

$ python setup.py extract_messages_js
...
extracting messages from trac/admin/templates/admin_plugins.html
Traceback (most recent call last):
  ...
line 462, in extract_javascript
     for token in tokenize(fileobj.read().decode(encoding)):
   File "/home/joe/src/trac/trunk/venv/lib/python2.6/encodings/utf_8.py", line 16, in decode
     return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 617: ordinal not in range(128)

Does anybody have an idea what this could be due to?

It looks like the fileobj.read() call above returns a snippet that is already a unicode string; in the case of the admin_plugins.html file, this snippet contains:

   a.text("\u2013")

as a replacement for:

   a.text("&ndash;")

I haven't looked at the exact mechanism in Babel, but at the very least it seems wrong to decode a unicode string, as this goes through an intermediate encode step using 'ascii' (see http://trac.edgewall.org/wiki/UnicodeEncodeError#decode for the gory details).
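
To make the intermediate step concrete, here is a minimal sketch. It is written for Python 3, where text strings have no decode() method, so the implicit 'ascii' encode that Python 2 performs is spelled out explicitly. The helper names py2_style_decode and try_decode are hypothetical and not part of Babel or Trac.

```python
def py2_style_decode(text, encoding="utf-8"):
    """Mimic Python 2's unicode.decode(): an implicit encode with
    'ascii' happens first, and only then the requested decode."""
    return text.encode("ascii").decode(encoding)


def try_decode(text):
    """Return the decoded text, or a short description of the failure."""
    try:
        return py2_style_decode(text)
    except UnicodeEncodeError as exc:
        return "UnicodeEncodeError: %s" % exc.reason


# Pure-ASCII text survives the round trip unchanged.
ascii_result = try_decode('a.text("-")')

# Text containing U+2013 fails in the encode step, before any
# UTF-8 decoding is attempted.
dash_result = try_decode('a.text("\u2013")')

print(ascii_result)
print(dash_result)
```

This is why the traceback reports a UnicodeEncodeError from inside a decode call: the failure comes from the hidden encode, not from the UTF-8 decode itself.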

I suggest the following fix for Babel:

--- ../../dependencies/babel-0.9.x/babel/messages/extract.py Wed Mar 17 09:51:01 2010
+++ C:\Dev\Python265\lib\site-packages\babel-0.9.5-py2.6.egg\babel\messages\extract.py Fri Jul 22 16:16:26 2011
@@ -455,11 +455,14 @@
     last_argument = None
     translator_comments = []
     concatenate_next = False
-    encoding = options.get('encoding', 'utf-8')
     last_token = None
     call_stack = -1

-    for token in tokenize(fileobj.read().decode(encoding)):
+    data = fileobj.read()
+    if isinstance(data, str):
+        encoding = options.get('encoding', 'utf-8')
+        data = data.decode(encoding)
+    for token in tokenize(data):
         if token.type == 'operator' and token.value == '(':
             if funcname:
                 message_lineno = token.lineno
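
The guard in the patch can also be sketched as a standalone helper. This is only an illustration under stated assumptions: it targets Python 3, where the Python 2 str type checked in the patch corresponds to bytes, and the name read_text is hypothetical rather than anything in Babel.

```python
import io


def read_text(fileobj, encoding="utf-8"):
    """Read from fileobj and return text, decoding only when needed."""
    data = fileobj.read()
    # Only byte strings need decoding; text is passed through untouched,
    # which avoids the implicit 'ascii' encode described above.
    if isinstance(data, bytes):
        data = data.decode(encoding)
    return data


snippet = 'a.text("\u2013")'

# A file object that yields raw UTF-8 bytes.
from_bytes = read_text(io.BytesIO(snippet.encode("utf-8")))

# A file object that already yields decoded text.
from_text = read_text(io.StringIO(snippet))

print(from_bytes == from_text)
```

Both kinds of file object end up producing the same text, so the tokenizer would receive identical input either way.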



On our side, it seems that if we preemptively use "\u2013" in the JavaScript instead of "&ndash;", we avoid the issue and things still work as before. Well, from a quick test and a look at the code, it seems that this part of the JS code is never used anyway, as td.find(".trac-toggler a") can't retrieve anything (there is no .trac-toggler below a <td> element in the template).

-- Christian
