The two instances don't have the same ID; uniqueness is tested on two other columns.
> session.query(cls).filter(cls.the_unique_column==new_object.the_unique_column).first()

Is there a way to get the tuple (usually a pair) of columns that are involved in a unique constraint? If there is such a way, then the line of code above could be generically applied to all models, which would be sweet :) Something like:

    for row in csv:
        new_object = make_an_object_from_csv_row(row)
        existing_object = get_existing_object(new_object)
        if existing_object is not None:
            new_object.id = existing_object.id
        new_object = session.merge(new_object)
    session.commit()

    def get_existing_object(new_object):
        unique_constraints = [constraint for constraint in cls.get_constraints()
                              if constraint.get_type() == UniqueConstraint]
        u_c_columns = [constraint.columns for constraint in unique_constraints]
        query = session.query(cls)
        for column_tuple in u_c_columns:
            for column in column_tuple:
                query = query.filter(cls.get_attr(column) == new_object.get_attr(column))
        return query.first()

--- On Thu, 9/9/10, Michael Bayer <mike...@zzzcomputing.com> wrote:

> From: Michael Bayer <mike...@zzzcomputing.com>
> Subject: Re: [sqlalchemy] Copying instances from old to new schema DB
> To: sqlalchemy@googlegroups.com
> Date: Thursday, September 9, 2010, 9:54 AM
>
> On Sep 9, 2010, at 11:31 AM, chaouche yacine wrote:
>
> > Thank you Michael, that covers the catch-the-error part. How about the set-relations-right part, if I decided to go with my fix-as-you-go recipe? Do you have any idea? The dictionary approach seems good, but I am still curious about how to set relations generically on models. I think I'll use it somewhere else in my code.
>
> You don't need to set any relations. If your new B has a fully populated primary key, you should just be using session.merge() for the whole thing.
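For the unique-columns question above: the tuples can at least be read generically at the SQL level. Here is a minimal sketch using the stdlib sqlite3 module and its index metadata pragmas, rather than SQLAlchemy (which exposes the same information on Table.constraints); the city table is made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE city (
        id INTEGER PRIMARY KEY,
        name TEXT,
        country TEXT,
        UNIQUE (name, country)
    )
""")

def unique_column_sets(conn, table):
    """Return one tuple of column names per UNIQUE constraint on
    `table`, read from SQLite's index metadata (a UNIQUE constraint
    is backed by a unique index)."""
    sets = []
    for _, index_name, is_unique, *_ in conn.execute(
            f"PRAGMA index_list({table})"):
        if is_unique:
            cols = tuple(row[2] for row in conn.execute(
                f"PRAGMA index_info({index_name})"))
            sets.append(cols)
    return sets

print(unique_column_sets(conn, "city"))  # [('name', 'country')]
```

With that list in hand, the generic get_existing_object() sketched above can filter on each column of each tuple.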
> This can work with your "integrity error" scheme, or more simply with the "select first" schemes below:
>
>     for row in csv:
>         new_object = make_an_object_from_csv_row(row)
>         new_object = session.merge(new_object)
>     session.commit()
>
> If new_object does *not* have the correct primary key, then:
>
>     for row in csv:
>         new_object = make_an_object_from_csv_row(row)
>         existing_object = session.query(cls).filter(
>             cls.the_unique_column == new_object.the_unique_column).first()
>         if existing_object is not None:
>             new_object.id = existing_object.id
>         new_object = session.merge(new_object)
>     session.commit()
>
> If you don't like the many individual SELECT statements, then:
>
>     # load a dictionary of (unique attr, primary key)
>     lookup = dict(
>         session.query(cls.the_unique_column, cls.id)
>     )
>     for row in csv:
>         new_object = make_an_object_from_csv_row(row)
>         if new_object.the_unique_column in lookup:
>             new_object.id = lookup[new_object.the_unique_column]
>         new_object = session.merge(new_object)
>     session.commit()
>
> Or, if most of your rows are replacement rows, it is quicker to preload everything:
>
>     # load a dictionary of (unique attr, instance)
>     lookup = dict(
>         session.query(cls.the_unique_column, cls)
>     )
>     for row in csv:
>         new_object = make_an_object_from_csv_row(row)
>         if new_object.the_unique_column in lookup:
>             new_object.id = lookup[new_object.the_unique_column].id
>         new_object = session.merge(new_object)
>     session.commit()
>
> > By the way, I think the pseudo should have been:
> >
> >     instance = next_instance_from_csv()
> >     try:
> >         session.commit()
> >     except IntegrityError, e:
> >         session.rollback()
> >         if e.orig == UniqueConstraintError:
> >             original_instance = ModelClass.get(instance.id)
> >             for relation_name in instance.get_relation_names():
> >                 # Is this correct?
> >                 path_to_attribute = "mapper." + relation_name + ".inverse"  # I don't know where to find this?
> >                 instance.set_attribute(path_to_attribute, original_instance)
> >             session.commit()
> >
> >     class BaseModel(DeclarativeBase):
> >         ...
> >         def set_attribute(self, path, value):
> >             nodes = path.split(".")
> >             current_node = self
> >             for next_node in nodes:
> >                 current_node = getattr(current_node, next_node, None)
> >                 if not current_node:
> >                     raise InvalidAttribute
> >             set_attribute(current_node, value)  # sqla's set_attribute, for whatever reason...
> >
> > Y.Chaouche
> >
> > --- On Thu, 9/9/10, Michael Bayer <mike...@zzzcomputing.com> wrote:
> >
>> From: Michael Bayer <mike...@zzzcomputing.com>
>> Subject: Re: [sqlalchemy] Copying instances from old to new schema DB
>> To: sqlalchemy@googlegroups.com
>> Date: Thursday, September 9, 2010, 5:59 AM
>>
>> On Sep 9, 2010, at 6:31 AM, chaouche yacine wrote:
>>
>>> Hello list,
>>>
>>> My schema has changed, and now I want to load my old data (from the old schema) into the new database (with the new schema) from csv files. (I export the tables of the old database to csv files and then load those csv files into the new database, with some column mapping and some gap filling for new columns.)
>>>
>>> The problem is that my new schema has a UniqueConstraint for some tables (e.g. the name column of the city table should be unique within a country). This constraint was not present in the old schema, and the data is corrupt (two cities with the same name in the same country). So when I try to insert them into the new database, I get IntegrityErrors.
>>>
>>> The solution I thought of was:
>>> * Catch the IntegrityError exception.
>>> * If it's a problem with a UniqueConstraint, then the exception was raised because I tried to insert instance B that has the same "key" as instance A, which was inserted before.
>>> * So for all children of B (B's relations), set their parent to A.
>>> For example, for all citizens of B, set their city to A, because A and B ought to be the same.
>>> * Then, safely ignore B and move on to the next instance.
>>>
>>> Here's what has been done so far (it works; I just use pseudo code for illustration purposes. If necessary, you can look at the actual attached source files):
>>>
>>>     line = csvloader.next_row()
>>>     ModelClass = get_current_model()
>>>     instance = ModelClass.create_instance(**(to_dict(line)))
>>>     session.add(instance)
>>>
>>> I wish I could do something like this:
>>>
>>>     try:
>>>         session.commit()
>>>     except IntegrityError, e:
>>>         session.rollback()
>>>         error = get_error()
>>>         if type_of(error) == UniqueConstraintError:
>>>             original_instance = ModelClass.get(instance.id)
>>>             for relation in instance.get_relations():
>>>                 # Is this correct?
>>>                 instance.relation.inverse = original_instance
>>>             session.commit()
>>>
>>> My questions are: how to write get_error and type_of, where to get UniqueConstraintError, how to write get_relations, how to set the inverse of a relation (is instance.relation.inverse the right thing to set?), and is this approach correct?
>>
>> There's no portable way to detect "unique constraint" errors across DBAPIs; you'd have to catch the specific error that you've observed your DBAPI emits, as well as the message/codes contained within it, and code against that. SQLAlchemy wraps DBAPI exceptions in its own same-named wrapper, and the object emitted by the DBAPI is available under the ".orig" member.
>>
>> One improvement to the recipe would be to use savepoints, if supported by your database, so that when you issue rollback() inside a savepoint block, the whole transaction isn't rolled back and you can maintain the whole operation in one transaction.
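The savepoint behavior described here can be sketched outside the ORM with the stdlib sqlite3 module; in SQLAlchemy the Session-level equivalent is begin_nested(). The city table and rows below are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

rows = [("Algiers",), ("Oran",), ("Algiers",)]   # last row violates UNIQUE
for (name,) in rows:
    conn.execute("SAVEPOINT row_sp")             # ~ session.begin_nested()
    try:
        conn.execute("INSERT INTO city (name) VALUES (?)", (name,))
    except sqlite3.IntegrityError:
        # only work since the savepoint is undone; earlier inserts survive
        conn.execute("ROLLBACK TO row_sp")
    conn.execute("RELEASE row_sp")
conn.commit()                                    # one enclosing transaction

print([n for (n,) in conn.execute("SELECT name FROM city ORDER BY name")])
# ['Algiers', 'Oran']
```

The whole load stays inside a single outer transaction; each bad row costs only a savepoint rollback.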
>> The session can start a savepoint using begin_nested().
>>
>> When I do csv loads like these, however, I usually load the full list of identifiers to be checked into memory ahead of time. If the csv is less than 100K rows and represents the full table of data, I just load the whole thing into a dictionary, keyed according to whatever "these are the fields that are unique" I'm dealing with, and consult the dictionary as I go along. Otherwise, I load individual blocks of data as I scan through portions of the csv (like: give me all the records where id in (a, b, c, d, e, f, g, ...) for this block). There's nothing "wrong" with the "catch-the-error" approach, except that I like an entire operation to be in a single transaction, and it's usually simpler and faster just to work with a Python dictionary.
>>
>>> <loadCSV.py><db.py>

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
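For reference, the dictionary-lookup scheme from the thread can be sketched end-to-end with the stdlib sqlite3 module instead of the ORM; the table and rows are made up, and INSERT OR REPLACE stands in for what session.merge() does once the primary key has been matched up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.executemany("INSERT INTO city (name) VALUES (?)",
                 [("Algiers",), ("Oran",)])

# preload a dictionary of (unique attr -> primary key), as in the recipe
lookup = dict(conn.execute("SELECT name, id FROM city"))

csv_rows = [("Algiers",), ("Annaba",)]   # one replacement row, one new row
for (name,) in csv_rows:
    if name in lookup:
        # row already exists: reuse its primary key so it is replaced,
        # not duplicated (the role session.merge() plays in the ORM)
        conn.execute("INSERT OR REPLACE INTO city (id, name) VALUES (?, ?)",
                     (lookup[name], name))
    else:
        conn.execute("INSERT INTO city (name) VALUES (?)", (name,))
conn.commit()

print(sorted(n for (n,) in conn.execute("SELECT name FROM city")))
# ['Algiers', 'Annaba', 'Oran']
```

One preload query replaces a SELECT per csv row, and the whole load runs in a single transaction, which is exactly the trade-off argued for above.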