Re: [Sqlalchemy-users] question about convert_unicode

Vasily Sulatskov Wed, 19 Apr 2006 03:48:02 -0700

Hello Qvx, 

As far as I understand, your proposal will always convert String objects to
regular python strings when receiving data from the database. But the documentation
says:


convert_unicode=False : if set to True, all String/character based types will 
convert Unicode values to raw byte values going into the database, and all 
raw byte values to Python Unicode coming out in result sets. This is an 
engine-wide method to provide unicode across the board. For unicode 
conversion on a column-by-column level, use the Unicode column type instead.

I understand that paragraph this way: if you set convert_unicode to True, then
all strings will get converted to unicode on their way out of the database, and
Unicode columns are supposed to be used for "column-by-column level" unicode
control.
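To make that reading concrete, here is a minimal sketch of the documented engine-wide behaviour. It is written in Python 3 terms, where `str` stands in for Python 2 `unicode` and `bytes` for a raw byte string. `FakeEngine` and the two free functions are hypothetical stand-ins that mirror the logic of `String.convert_bind_param` / `convert_result_value`. They are not SQLAlchemy API.

```python
from dataclasses import dataclass
from typing import Optional, Union

Value = Optional[Union[str, bytes]]


@dataclass
class FakeEngine:
    # Hypothetical stand-in: only the two engine attributes that the
    # String type consults are modelled.
    convert_unicode: bool = False
    encoding: str = "utf-8"


def convert_bind_param(value: Value, engine: FakeEngine) -> Value:
    # Into the database: only text is encoded, and only if convert_unicode is on.
    if not engine.convert_unicode or value is None or not isinstance(value, str):
        return value
    return value.encode(engine.encoding)


def convert_result_value(value: Value, engine: FakeEngine) -> Value:
    # Out of the database: raw bytes are decoded, but only if convert_unicode is on.
    if not engine.convert_unicode or value is None or isinstance(value, str):
        return value
    return value.decode(engine.encoding)


engine = FakeEngine(convert_unicode=True)
assert convert_bind_param("caf\u00e9", engine) == b"caf\xc3\xa9"
assert convert_result_value(b"caf\xc3\xa9", engine) == "caf\u00e9"
# A raw byte string bound as a parameter passes through untouched,
# even with convert_unicode=True.
assert convert_bind_param(b"caf\xe9", engine) == b"caf\xe9"
```

The last assertion shows the gap that the client_encoding proposals in this thread try to close: plain byte strings reach the database without any conversion.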

The patch that I suggested was meant to work the way the documentation says, but your
patch changes that behaviour.

Perhaps the documentation should be changed to describe a different behaviour.

There's another aspect of this problem: the table autoload option. I don't know
whether it will generate Unicode type columns or String type columns. If it
generates Unicode type columns, then everything is fine. If not, that will be
bad, because you will always get string objects from autoloaded tables.

But again, I am not sure about table autoloading; I just feel some uncertainty
in this area.

So perhaps we can discuss this problem and find a solution that will satisfy all
users of sqlalchemy.

> I didn't look at your patch. I just gave a few general observations.
>
> I'm not sure that I would set 'ascii' as the default value. I would set it to
> None (meaning "avoid using it" or "inherit value from encoding param").
>
> I guess that flag called "client_encoding" could make things work more
> explicitly in SA if you *must* use plain strings instead of unicode. But
> after looking at types.py I'm not sure that String class is correct, and
> adding client_encoding into the mix makes it even more obscure. Although,
> it has a potential of actually making it better.
>
> My observations of types.py by looking at code:
>
> Unicode:
>   - good
>   - unicode on client side (bind params and column values),
>   - explicit conversion to encoded string when talking to engine
>
> String:
>   - strange beast
>   - it can be unicode as well as string on client side (bind params and
> column values) depending on convert_unicode param
>   - it uses both unicode and strings when talking to engine depending on
> convert_unicode param
>   - or, in other words: pass unchanged data (be it unicode or string) if
> there is no convert_unicode param
>
> Your additions could make it into a better thing if done differently:
>
> String:
>   - string on client side (bind params and column values), no unicode in
> sight
>   - talk to database in expected encoding
>   - use encoding / client_encoding pair to do conversions between client /
> db side
>   - remove convert_unicode param (If you want to use unicode there is
> Unicode class)
>
> I'm not sure what else would break, or what other use case I'm breaking with
> this proposal, but the current String (with or without your additions)
> leaves a bad taste in my mouth.
>
> I would do it like this (not tested):
>
> Index: lib/sqlalchemy/types.py
> ===================================================================
> --- lib/sqlalchemy/types.py    (revision 1294)
> +++ lib/sqlalchemy/types.py    (working copy)
> @@ -96,15 +96,24 @@
>      def get_constructor_args(self):
>          return {'length':self.length}
>      def convert_bind_param(self, value, engine):
> -        if not engine.convert_unicode or value is None or not isinstance(value, unicode):
> +        if value is None:
> +            return None
> +        elif isinstance(value, unicode):
> +            return value.encode(engine.encoding)
> +            # or even raise exception (but I wouldn't go that far)
> +        elif engine.client_encoding != engine.encoding:
> +            return unicode(value, engine.client_encoding).encode(engine.encoding)
> +        else:
>              return value
> +    def convert_result_value(self, value, engine):
> +        if value is None:
> +            return None
> +        elif isinstance(value, unicode):
> +            return value.encode(engine.client_encoding)
> +        elif engine.client_encoding != engine.encoding:
> +            return unicode(value, engine.encoding).encode(engine.client_encoding)
>          else:
> -            return value.encode(engine.encoding)
> -    def convert_result_value(self, value, engine):
> -        if not engine.convert_unicode or value is None or isinstance(value, unicode):
>              return value
> -        else:
> -            return value.decode(engine.encoding)
>      def adapt_args(self):
>          if self.length is None:
>              return TEXT()
> Index: lib/sqlalchemy/engine.py
> ===================================================================
> --- lib/sqlalchemy/engine.py    (revision 1294)
> +++ lib/sqlalchemy/engine.py    (working copy)
> @@ -227,7 +227,7 @@
>      SQLEngines are constructed via the create_engine() function inside this package.
>      """
>
> -    def __init__(self, pool=None, echo=False, logger=None, default_ordering=False, echo_pool=False, echo_uow=False, convert_unicode=False, encoding='utf-8', **params):
> +    def __init__(self, pool=None, echo=False, logger=None, default_ordering=False, echo_pool=False, echo_uow=False, encoding='utf-8', client_encoding=None, **params):
>          """constructs a new SQLEngine.   SQLEngines should be constructed via the create_engine()
>          function which will construct the appropriate subclass of SQLEngine."""
>          # get a handle on the connection pool via the connect arguments
> @@ -246,8 +246,8 @@
>          self.default_ordering=default_ordering
>          self.echo = echo
>          self.echo_uow = echo_uow
> -        self.convert_unicode = convert_unicode
>          self.encoding = encoding
> +        self.client_encoding = client_encoding or encoding
>          self.context = util.ThreadLocal()
>          self._ischema = None
>          self._figure_paramstyle()
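The patch above is untested Python 2 code. The sketch below restates the same transcoding rules in Python 3 terms, with `bytes` as the client-side "plain string" and `str` as unicode. `ProposedEngine` is a hypothetical stand-in that holds `encoding` and `client_encoding` as the engine.py hunk does. This is an illustration of the proposal, not SQLAlchemy code.

```python
from dataclasses import dataclass
from typing import Optional, Union

Value = Optional[Union[str, bytes]]


@dataclass
class ProposedEngine:
    # Hypothetical engine carrying the two parameters from the engine.py hunk.
    encoding: str = "utf-8"
    client_encoding: Optional[str] = None

    def __post_init__(self) -> None:
        # Same defaulting rule as `client_encoding or encoding` in the patch.
        self.client_encoding = self.client_encoding or self.encoding


def convert_bind_param(value: Value, engine: ProposedEngine) -> Value:
    if value is None:
        return None
    if isinstance(value, str):
        # Text is encoded straight to the database encoding.
        return value.encode(engine.encoding)
    if engine.client_encoding != engine.encoding:
        # Client bytes -> text -> database bytes.
        return value.decode(engine.client_encoding).encode(engine.encoding)
    return value


def convert_result_value(value: Value, engine: ProposedEngine) -> Value:
    if value is None:
        return None
    if isinstance(value, str):
        # If the driver already returned text, hand bytes back to the client.
        return value.encode(engine.client_encoding)
    if engine.client_encoding != engine.encoding:
        # Database bytes -> text -> client bytes.
        return value.decode(engine.encoding).encode(engine.client_encoding)
    return value


engine = ProposedEngine(encoding="utf-8", client_encoding="cp1251")
text = "\u041f\u0440\u0438\u0432\u0435\u0442"  # Cyrillic sample word
stored = convert_bind_param(text.encode("cp1251"), engine)
assert stored == text.encode("utf-8")
assert convert_result_value(stored, engine) == text.encode("cp1251")
```

The result side always returns bytes to the client, which is what "no unicode in sight" means for the proposed String type.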
>
>
> Kind regards,
> Tvrtko
>
> On 4/19/06, Vasily Sulatskov <[EMAIL PROTECTED]> wrote:
> > Hello Qvx,
> >
> > Well, perhaps you are right. But let's then define what the "right way"
> > is.
> >
> > The second version of the patch that I submitted included a default value
> > of "ascii" for the new engine parameter "client_encoding". It works in the
> > following way: if the user specifies convert_unicode=True and doesn't
> > specify client_encoding, it will be ascii, and the new types.String will
> > try to convert regular strings to unicode using the specified
> > client_encoding. If it is unable to convert to unicode, it will raise an
> > exception during construction of the unicode object.
> >
> > That guarantees that any string going to the database will get converted to
> > the proper encoding. But I don't say that it's the best or even the "right way".
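The second patch itself is not quoted in this thread. The following is therefore only a sketch of the behaviour described above, assuming convert_unicode=True and an engine `encoding` of utf-8. `string_bind_param` is a hypothetical helper, not the patch.

```python
from typing import Optional, Union


def string_bind_param(value: Optional[Union[str, bytes]],
                      encoding: str = "utf-8",
                      client_encoding: str = "ascii") -> Optional[bytes]:
    # Hypothetical helper modelling the described rule (convert_unicode=True):
    # every non-None value leaves as bytes in the database `encoding`.
    if value is None:
        return None
    if isinstance(value, bytes):
        # Plain strings are decoded with client_encoding first; with the
        # strict default error handler this raises UnicodeDecodeError.
        value = value.decode(client_encoding)
    return value.encode(encoding)


word = "\u0422\u0435\u043a\u0441\u0442"  # Cyrillic sample word
cp1251_bytes = word.encode("cp1251")

# With the right client_encoding the bytes are transcoded to utf-8.
assert string_bind_param(cp1251_bytes, client_encoding="cp1251") == word.encode("utf-8")

# With the "ascii" default, non-ascii bytes are rejected instead of
# being written to the database in an unknown encoding.
try:
    string_bind_param(cp1251_bytes)
except UnicodeDecodeError:
    print("rejected")
```

The ascii default trades convenience for safety. A user with cp1251 strings must name the encoding explicitly, but bytes in an undeclared encoding can never reach the database silently.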
> >
> > I also think that the more strictly you enforce unicode usage the better,
> > but unfortunately there are many places in python where regular strings
> > are used (like the str() function, etc.), so for some time we have to live
> > with regular strings.
> >
> > How do you think it should work in sqlalchemy?
> >
> > > I'm also the unfortunate one who has to use encodings other than ascii. I'm
> > > sure that your patch helps, but I'm not sure that this is the "right way".
> > >
> > > The thing that I learned from dealing with unicode and string encodings
> > > is: always use unicode. What I mean is, when you write your source:
> > > * make all your data (variables, literals) as unicode
> > > * put the -*- coding: -*- directive so that the interpreter knows how to
> > > convert your u"" strings
> > >
> > > Those two rules lead to the following:
> > >
> > > # -*- coding: cp1251 -*-
> > >
> > > import sqlalchemy
> > >
> > > # note that there is no convert_unicode flag, but there is an encoding flag
> > > db = sqlalchemy.create_engine('sqlite://', encoding='cp1251')
> > >
> > > # note a change in type of "name" column from String to Unicode
> > > companies = sqlalchemy.Table('companies', db,
> > >    sqlalchemy.Column('company_id', sqlalchemy.Integer, primary_key=True),
> > >    sqlalchemy.Column('name', sqlalchemy.Unicode(50)))
> > >
> > > # ....
> > >
> > > # OK, unicode (the Russian literal reads "Some text in cp1251 encoding")
> > > Company(name=u'Какой-то текст в кодировке cp1251')
> > >
> > > # Avoid plain strings
> > > Company(name='Some text in ascii')
> > >
> > >
> > > This becomes a necessity if you have, for example, more than one database
> > > driver using different encodings. You get back unicode strings which you can
> > > combine and copy from one database to another without worrying.
> > >
> > > db1 = sqlalchemy.create_engine('mysql://', encoding='latin2')
> > > db2 = sqlalchemy.create_engine('oracle://', encoding='windows-1250')
> > >
> > > ob1 = db1_mapper.select(...)
> > > ob2 = db2_mapper.select(...)
> > >
> > > ob1.name = ob1.name + ob2.name # All unicode, no problems
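The same point can be shown without the mapper layer in a minimal Python 3 sketch where raw column values are plain `bytes`. The helpers `from_db`/`to_db` and the sample values are hypothetical. They were chosen so that the windows-1250 bytes are misread when decoded as latin2.

```python
def from_db(raw: bytes, db_encoding: str) -> str:
    # Decode a raw column value using the encoding of the database it came from.
    return raw.decode(db_encoding)


def to_db(text: str, db_encoding: str) -> bytes:
    # Encode text for the database it is being written to.
    return text.encode(db_encoding)


# Hypothetical raw values, as a latin2 MySQL and a windows-1250 Oracle
# connection might hand them back.
raw1 = "\u0141\u00f3d\u017a".encode("latin2")   # Polish sample word
raw2 = "\u0160koda".encode("windows-1250")       # Czech sample word

combined = from_db(raw1, "latin2") + from_db(raw2, "windows-1250")
assert combined == "\u0141\u00f3d\u017a\u0160koda"

# Concatenating the raw bytes instead mixes two encodings in one value.
assert (raw1 + raw2).decode("latin2") != combined

# The combined text can be written to either database in its own encoding.
assert to_db(combined, "windows-1250").decode("windows-1250") == combined
```

Decoding at the boundary and keeping text everywhere else is what lets values from differently-encoded databases mix safely.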
> > >
> > > On 4/17/06, Vasily Sulatskov <[EMAIL PROTECTED]> wrote:
> > > > Hello Michael,
> > > >
> > > > I know there's a database engine parameter "encoding". It tells
> > > > sqlalchemy in which encoding Unicode objects should be saved to the
> > > > database.
> > > >
> > > > I suggest adding another encoding, let's say "client_encoding", which
> > > > will be used when convert_unicode is True and the user assigns a string
> > > > object to an object attribute. Currently, even if convert_unicode is set
> > > > to True, strings go to the database as-is, bypassing conversion to
> > > > unicode.
> > > >
> > > > This option will allow assigning strings in national/platform-specific
> > > > encodings, like cp1251, straight to object attributes, and they will be
> > > > properly converted to the database encoding (engine.encoding).
> > > >
> > > >
> > > > See, the encoding on the client machine may be different from the
> > > > encoding in the database. You can see the changes that I suggest in the
> > > > attached diff.
> > > >
> > > > The suggested changes can make life a little easier for users in
> > > > multilingual/multi-encoding environments without affecting any other
> > > > users of SQLAlchemy.
> > > >
> > > > MB> On Apr 17, 2006, at 5:47 AM, Vasily Sulatskov wrote:
> > > > >> In my opinion that's a bug, and that behaviour should be changed to
> > > > >> something like this:
> > > > >> 1. If the object is unicode, then convert it to the engine-specified
> > > > >> encoding (like utf8), as happens now
> > > > >> 2. If it's a string, then convert it to unicode using another specified
> > > > >> encoding (it should be added to the engine parameters). This encoding
> > > > >> specifies the client-side encoding. It's often handy to have different
> > > > >> encodings in the database and on client machines (at least for people
> > > > >> with "alternate languages" :-)
> > > >
> > > > MB> there already is an encoding parameter for the engine.
> > > >
> > > > MB> http://www.sqlalchemy.org/docs/dbengine.myt#database_options
> > > >
> > > > MB> does that solve your problem ?
> > > >
> > > > --
> > > > Best regards,
> > > > Vasily                            mailto:[EMAIL PROTECTED]


-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid0944&bid$1720&dat1642
_______________________________________________
Sqlalchemy-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/sqlalchemy-users
