Hi,

Searching the web, I couldn't find a solution for the following 
encoding problem with TurboGears/SQLAlchemy:

Environment: see below

The test data model is fairly simple:

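from sqlalchemy import Integer, Unicode
from sqlalchemy.ext.activemapper import ActiveMapper, column

class Project(ActiveMapper):
    "test model"
    class mapping:
        id = column(Integer, primary_key=True)
        title = column(Unicode(255))
        title_key = column(Unicode(30))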

Test data (edited via phpMyAdmin):
- title = "The german ö is an o-umlaut"
- title_key = 'testkey'

Running this from the tg-admin shell:
>>> from model import Project
>>> proj = Project.select_by(title_key="testkey")
(...)
Traceback (most recent call last):
  File "<console>", line 1, in ?
  File "build/bdist.linux-i686/egg/sqlalchemy/ext/assignmapper.py", line 7, in do
(...)
  File "build/bdist.linux-i686/egg/sqlalchemy/engine/base.py", line 632, in _get_col
  File "build/bdist.linux-i686/egg/sqlalchemy/types.py", line 190, in convert_result_value
  File "encodings/utf_8.py", line 16, in decode
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 11-14: invalid data

OK, why is this? Why is anything being decoded at all, when everything should be UTF-8?
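Why a decode at all: a column declared as `Unicode` is supposed to hand back unicode objects, so the raw byte strings coming from the DB-API driver have to be decoded somewhere, which is what the `convert_result_value` frame in the traceback is doing. Below is a minimal Python 3 sketch of that step, assuming the engine encoding defaults to UTF-8. The function is a hypothetical stand-in, not SQLAlchemy's actual code; `raw` is the byte string MySQLdb returns for this row in the console session that follows.

```python
# Hypothetical stand-in for a Unicode column's result conversion:
# byte strings from the driver are decoded with the engine encoding
# (assumed here to default to 'utf-8'); other values pass through.
def convert_result_value(value, encoding="utf-8"):
    """Decode a raw byte value from the driver into text."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text, or None


# The raw value MySQLdb returns for the test row.
raw = b"The german \xf6 is an o-umlaut"

failed_reason = None
try:
    convert_result_value(raw)
except UnicodeDecodeError as exc:
    failed_reason = exc.reason  # 'invalid start byte' on Python 3
print("decode failed:", failed_reason)
```

Decoding with the codec that actually matches the bytes (`convert_result_value(raw, "latin-1")`) succeeds, which already hints at where the mismatch lies.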

Let's access MySQL directly from a plain Python console:

>>> import MySQLdb
>>> con = MySQLdb.connect(host="127.0.0.1", port=...etc.)
>>> cur = con.cursor()
>>> sql = "select title from project where title_key = 'testkey'"
>>> cur.execute(sql)
1L
>>> t = cur.fetchall()[0][0]
>>> con.close()
>>> t
'The german \xf6 is an o-umlaut'
>>> t.decode("utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "encodings/utf_8.py", line 16, in decode
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 11-14: invalid data
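The codec error itself says where the data stops being valid UTF-8. A small Python 3 sketch that reports the offending byte range; `find_bad_utf8` is a hypothetical helper, not part of MySQLdb or SQLAlchemy. Python 3 reports only the single byte `0xf6` at position 11 (as an invalid start byte), whereas the Python 2 codec above reported the wider range 11-14.

```python
# Hypothetical helper: locate the first bytes that are not valid UTF-8.
def find_bad_utf8(data):
    """Return (start, end, offending_bytes), or None if data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.end, data[exc.start:exc.end]
    return None


raw = b"The german \xf6 is an o-umlaut"  # value fetched via MySQLdb
print(find_bad_utf8(raw))                 # (11, 12, b'\xf6')
print(find_bad_utf8("ö".encode("utf-8")))  # None: proper UTF-8 passes
```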

The decode failed. Note that t is a plain byte string, not a unicode object 
(even though the data is supposed to be UTF-8 encoded):

>>> isinstance(t, unicode)
False

This is what I expected instead:

>>> s = u'The german \xf6 is an o-umlaut'
>>> isinstance(s,unicode)
True
>>> print s
The german ö is an o-umlaut

But if t is not a unicode string like s, what is it?

>>> s.encode("latin1")
'The german \xf6 is an o-umlaut'
>>> s.encode("latin1") == t
True

So t is the Latin-1 encoding of the unicode string, not its UTF-8 
encoding. Which component encodes the correct unicode string as Latin-1 
instead of UTF-8?
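The difference between the two encodings is visible byte by byte: in UTF-8 the umlaut takes two bytes, while the single byte `\xf6` that came back is its Latin-1 form. A Python 3 sketch of the comparison (in Python 3, `str` plays the role of Python 2's `unicode`, and `bytes` that of Python 2's `str`):

```python
s = "The german \u00f6 is an o-umlaut"  # text, like u'...' in Python 2

latin1 = s.encode("latin-1")  # 'ö' becomes one byte:  0xf6
utf8 = s.encode("utf-8")      # 'ö' becomes two bytes: 0xc3 0xb6

print(latin1)  # b'The german \xf6 is an o-umlaut'
print(utf8)    # b'The german \xc3\xb6 is an o-umlaut'

# The bytes MySQLdb returned match the Latin-1 form, not the UTF-8 one,
# and decode back to the original text with the Latin-1 codec.
t = b"The german \xf6 is an o-umlaut"
assert t == latin1 and t != utf8
assert t.decode("latin-1") == s
```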

Is this a MySQLdb bug?

Am I missing something?

My environment:
Linux Ubuntu 6.10
TurboGears 1.0b1
SQLAlchemy 0.3.0 (release) using ActiveMapper
MySQL 5.0.24a-Debian
- started by root: /usr/bin/mysqld_safe --character-set-server=utf8 &

mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+

Any hints are welcome :-)

TIA,
Stefan

-- 
Start here: www.meretz.de