Re: [sqlalchemy] Copying instances from old to new schema DB

Michael Bayer Thu, 09 Sep 2010 09:54:17 -0700

On Sep 9, 2010, at 11:31 AM, chaouche yacine wrote:

> Thank you Michael, that's for the catch-the-error part. How about the 
> set-relations-right part, if I decided to go with my fix-as-you go recipe ? 
> do you have any idea ? the dictionary approach seems good, but I still am 
> curious about how to set relations generically on models. I think I'll use it 
> somewhere else in my code.


You don't need to set any relations.   If your new B has a fully populated 
primary key, you should just be using session.merge() for the whole thing.    
This can work with your "integrity error" scheme, or more simply with the 
"select first" schemes below:


for row in csv:
        new_object = make_an_object_from_csv_row(row)
        new_object = session.merge(new_object)
session.commit()


if new_object does *not* have the correct primary key, then:

for row in csv:
        new_object = make_an_object_from_csv_row(row)
        existing_object = 
session.query(cls).filter(cls.the_unique_column==new_object.the_unique_column).first()
        if existing_object is not None:
                new_object.id = existing_object.id
        new_object = session.merge(new_object)
session.commit()


if you don't like the many indvidual SELECT statements, then

# load a dictionary of (unique attr, primary key)
lookup = dict(
        session.query(cls.the_unique_column, cls.id)
)
for row in csv:
        new_object = make_an_object_from_csv_row(row)
        if new_object.the_unique_column in lookup:
            new_object.id = lookup[new_object.the_unique_column]
        new_object = session.merge(new_object)
session.commit()

or if most of your rows are replacement rows, quicker to preload everything:


# load a dictionary of (unique attr, instance)
lookup = dict(
        session.query(cls.the_unique_column, cls)
)
for row in csv:
        new_object = make_an_object_from_csv_row(row)
        if new_object.the_unique_column in lookup:
            new_object.id = lookup[new_object.the_unique_column].id
        new_object = session.merge(new_object)
session.commit()





> 
> By the way, I think the pseudo should have been : 
> 
> instance = next_instance_from_csv()
> try:
>    session.commit()
> except IntegrityError,e :
>    session.rollback()
>    if e.orig == UniqueConstraintError : 
>        original_instance = ModelClass.get(instance.id)
>        for relation_name in instance.get_relation_names() :
>            # Is this correct ?
>            path_to_attribute= "mapper."+ relation +".inverse" # I don't know 
> where to find this ?
>            instance.set_attribute(path_to_attribute,original_instance)
>        session.commit()
> 
> class BaseModel (DeclarativeBase):
>    ...
>    def set_attribute(self,path,value):
>        nodes        = path.explode(".")
>        current_node = self
>        for next_node in nodes :
>            current_node = getattr(current_node,next_node,None)
>            if not current_node:
>                raise InvalidAttribute
>        set_attribute(current_node,value) # sqla's set_attribute, for whatever 
> reason... 
> 
> 
> Y.Chaouche
> 
> 
> --- On Thu, 9/9/10, Michael Bayer <mike...@zzzcomputing.com> wrote:
> 
>> From: Michael Bayer <mike...@zzzcomputing.com>
>> Subject: Re: [sqlalchemy] Copying instances from old to new schema DB
>> To: sqlalchemy@googlegroups.com
>> Date: Thursday, September 9, 2010, 5:59 AM
>> 
>> On Sep 9, 2010, at 6:31 AM, chaouche yacine wrote:
>> 
>>> Hello list,
>>> 
>>> My schema has changed, and now I want to retrieve my
>> old data (of the old schema) to the new database (with the
>> new schema) from csv files (I export the tables of the old
>> database to csv files and then load those csv files to the
>> new database with some column mapping and some gap filling
>> for new columns)
>>> 
>>> The problem is that my new schema have a
>> UniqueConstraint for some tables (like : the name column of
>> the city table should be unique within a country). This
>> constraint was not present in the old schema and data is
>> corrupt (two cities with the same name in the same country).
>> So when I try to insert them in the new database, I have
>> IntegrityErrors. 
>>> 
>>> The solution I thought of was :
>>> * Catch the IntegrityError exception 
>>> * If it's a problem on a UniqueConstraint, then the
>> exception was raised because I tried to insert instance B
>> that has the same "key" as instance A that was inserted
>> before.
>>> * So for all children of B (B's relations), set their
>> parent to A. For example, for all citizens of B, set their
>> city to A, beause A and B ought to be the same. 
>>> * Then, safely ignore B and move on to the next
>> instance.
>>> 
>>> Here's what has been done so far (that works, I just
>> use psuedo code for "illustration" purpose. If necessary,
>> you can look at the actual attached source files): 
>>> 
>>> line       =
>> csvloader.next_row()
>>> ModelClass = get_current_model()
>>> instance   =
>> ModelClass.create_instance(**(to_dict(line)))
>>> session.add(instance) 
>>> 
>>> I wish I could do something like this : 
>>> try:    
>>>     session.commit()
>>> except IntegrityError,e : 
>>>     session.rollback()
>>>     errror = get_error()
>>>     if type_of(error) ==
>> UniqueConstraintError :
>>>         original_instance =
>> ModelClass.get(instance.id)
>>>         for relation in
>> instance.get_relations() : 
>>>             # Is this
>> correct ? 
>>>        
>> instance.relation.inverse = original_instance
>>>         session.commit()
>>> 
>>> My questions are : how to write get_error, type_of,
>> where to get UniqueContraintError, how to write
>> get_relations, how to set the inverse of a relation (is
>> instance.realtion.inverse the right thing to set ?) and is
>> this approach correct ?
>> 
>> There's no portable way to detect "unique constraint"
>> errors across DBAPIs, you'd have to catch the specific error
>> that you've observed your DBAPI emits as well as the
>> message/codes contained within it, and code against
>> that.   SQLAlchemy wraps DBAPI exceptions in
>> its own same-named wrapper and the object emitted by the
>> DBAPI is available under the ".orig" member.
>> 
>> One improvement to the recipe would be if you used
>> savepoints, if supported by your database, so that when you
>> issue rollback() inside a savepoint block, the whole
>> transaction isn't rolled back and you can maintain the whole
>> operation in one transaction.   The session
>> can start a savepoint using begin_nested().
>> 
>> When I do csv loads like these however I usually load the
>> full list of identifiers to be checked into memory ahead of
>> time.  If the csv is less than 100K rows and represents
>> the full table of data, I just load the whole thing into a
>> dictionary, formatted according to whatever "these are the
>> fields that are unique" I'm dealing with, and consult the
>> dictionary as I go along.    Otherwise, I load
>> individual blocks of data in as I scan through portions of
>> the csv (like, give me all the records where id in (a, b, c,
>> d, e, f, g, ...) for this block).   There's
>> nothing "wrong" with the "catch-the-error" approach except
>> that I like an entire operation to be in a single
>> transaction, and its usually simpler and faster just to work
>> with a Python dictionary.
>> 
>> 
>> 
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> You received this message because you are subscribed
>> to the Google Groups "sqlalchemy" group.
>>> To post to this group, send email to sqlalch...@googlegroups.com.
>>> To unsubscribe from this group, send email to
>> sqlalchemy+unsubscr...@googlegroups.com.
>>> For more options, visit this group at 
>>> http://groups.google.com/group/sqlalchemy?hl=en.
>>> 
>>> <loadCSV.py><db.py>
>> 
>> -- 
>> You received this message because you are subscribed to the
>> Google Groups "sqlalchemy" group.
>> To post to this group, send email to sqlalch...@googlegroups.com.
>> To unsubscribe from this group, send email to 
>> sqlalchemy+unsubscr...@googlegroups.com.
>> For more options, visit this group at 
>> http://groups.google.com/group/sqlalchemy?hl=en.
>> 
>> 
> 
> 
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To post to this group, send email to sqlalch...@googlegroups.com.
> To unsubscribe from this group, send email to 
> sqlalchemy+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/sqlalchemy?hl=en.
> 

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Copying instances from old to new schema DB

Reply via email to