Hi,
after googling around the web I still haven't found a solution for the following
encoding problem with TurboGears/SQLAlchemy:
Environment: see below
The test data model is fairly simple:
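from sqlalchemy import Integer, Unicode
from sqlalchemy.ext.activemapper import ActiveMapper, column

class Project(ActiveMapper):
    "test model"
    class mapping:
        id = column(Integer, primary_key=True)
        title = column(Unicode(255))
        title_key = column(Unicode(30))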
Test data (edited via phpMyAdmin):
- title = 'The german ö is an o-umlaut'
- title_key = 'testkey'
Running this from the tg-admin shell:
>>> from model import Project
>>> proj = Project.select_by(title_key="testkey")
(...)
Traceback (most recent call last):
  File "<console>", line 1, in ?
  File "build/bdist.linux-i686/egg/sqlalchemy/ext/assignmapper.py", line 7, in do
(...)
  File "build/bdist.linux-i686/egg/sqlalchemy/engine/base.py", line 632, in _get_col
  File "build/bdist.linux-i686/egg/sqlalchemy/types.py", line 190, in convert_result_value
  File "encodings/utf_8.py", line 16, in decode
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 11-14: invalid data
OK, why is this? Why is anything being decoded at all, when everything is supposed to be utf8?
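The last frames of the traceback show SQLAlchemy's Unicode type decoding the value it got from the driver with the utf8 codec. That step can be reproduced without a database. The sketch below uses Python 3 syntax (bytes and str stand in for Python 2's str and unicode) and assumes the driver returned the single latin-1 byte 0xf6 for "ö"; `decode_result_value` is a hypothetical, simplified stand-in, not SQLAlchemy's actual code.

```python
# Assumption: these bytes mimic what the MySQL driver handed back for the
# title column, i.e. a single latin-1 byte (0xf6) for "ö".
raw_title = b"The german \xf6 is an o-umlaut"


def decode_result_value(value, encoding="utf-8"):
    """Hypothetical, simplified stand-in for the result conversion done by a
    Unicode column type: decode driver bytes with the configured encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


error = None
try:
    decode_result_value(raw_title)
except UnicodeDecodeError as exc:
    # 0xf6 can never start a valid utf-8 sequence, so decoding fails there.
    error = exc
    print("utf-8 decode failed at byte offset", exc.start)
```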
Let's access MySQL directly from a plain Python console:
>>> import MySQLdb
>>> con = MySQLdb.connect(host="127.0.0.1", port=...etc.)
>>> cur = con.cursor()
>>> sql = "select title from project where title_key = 'testkey'"
>>> cur.execute(sql)
1L
>>> t = cur.fetchall()[0][0]
>>> con.close()
>>> t
'The german \xf6 is an o-umlaut'
>>> t.decode("utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "encodings/utf_8.py", line 16, in decode
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 11-14: invalid data
The decode failed because the string is not a "utf8-string" (although it is
supposed to be utf8-encoded):
>>> isinstance(t, unicode)
False
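The isinstance check only tells byte strings apart from unicode text; it cannot tell which encoding a byte string is in. A small sketch in Python 3 syntax (str is text, bytes corresponds to Python 2's str) that encodes the test title two ways:

```python
# Text object (Python 2: a unicode string).
title = "The german \u00f6 is an o-umlaut"

utf8_bytes = title.encode("utf-8")      # "ö" becomes two bytes: c3 b6
latin1_bytes = title.encode("latin-1")  # "ö" becomes one byte: f6

for name, raw in (("utf-8", utf8_bytes), ("latin-1", latin1_bytes)):
    # Both results are plain byte strings, so a type check alone cannot
    # reveal which encoding produced them; only the byte values differ.
    print(name, type(raw).__name__, len(raw), raw[11:13])
```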
It should look like this:
>>> s = u'The german \xf6 is an o-umlaut'
>>> isinstance(s,unicode)
True
>>> print s
The german ö is an o-umlaut
But what is t, if not a utf8-string like s?
>>> s.encode("latin1")
'The german \xf6 is an o-umlaut'
>>> s.encode("latin1") == t
True
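Since latin-1 maps every byte to some character, decoding with it never fails, so a quick diagnostic is to try utf-8 first and fall back to latin-1 only when that fails. A minimal sketch in Python 3 syntax; `guess_encoding` is a hypothetical helper, and a latin-1 result only means "not valid utf-8", not proof of how the data was originally encoded.

```python
def guess_encoding(raw, candidates=("utf-8", "latin-1")):
    """Hypothetical diagnostic: return (codec, text) for the first candidate
    codec that decodes raw without error. latin-1 accepts any byte sequence,
    so keep it last as a fallback."""
    for codec in candidates:
        try:
            return codec, raw.decode(codec)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate codecs could decode the value")


# Stand-in for the value fetched through MySQLdb in the session above.
t = b"The german \xf6 is an o-umlaut"
codec, text = guess_encoding(t)
print(codec, text)
```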
So t is a latin-1 encoded "utf8-string". Which step encodes the already
correct unicode string wrongly, using latin-1 instead of utf8?
Is this a MySQLdb bug?
Am I missing something?
My environment:
Linux Ubuntu 6.10
TurboGears 1.0b1
SQLAlchemy 0.3.0 (release) using ActiveMapper
MySQL 5.0.24a-Debian
- started by root: /usr/bin/mysqld_safe --character-set-server=utf8 &
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
Any hints are welcome:-)
TIA,
Stefan
--
Start here: www.meretz.de
You received this message because you are subscribed to the Google Groups
"TurboGears" group.
For more options, visit this group at
http://groups.google.com/group/turbogears?hl=en