On Sep 9, 2010, at 6:31 AM, chaouche yacine wrote:

> Hello list,
> 
> My schema has changed, and now I want to retrieve my old data (of the old 
> schema) to the new database (with the new schema) from csv files (I export 
> the tables of the old database to csv files and then load those csv files to 
> the new database with some column mapping and some gap filling for new 
> columns)
> 
> The problem is that my new schema have a UniqueConstraint for some tables 
> (like : the name column of the city table should be unique within a country). 
> This constraint was not present in the old schema and data is corrupt (two 
> cities with the same name in the same country). So when I try to insert them 
> in the new database, I have IntegrityErrors. 
> 
> The solution I thought of was :
> * Catch the IntegrityError exception 
> * If it's a problem on a UniqueConstraint, then the exception was raised 
> because I tried to insert instance B that has the same "key" as instance A 
> that was inserted before.
> * So for all children of B (B's relations), set their parent to A. For 
> example, for all citizens of B, set their city to A, beause A and B ought to 
> be the same. 
> * Then, safely ignore B and move on to the next instance.
> 
> Here's what has been done so far (that works, I just use psuedo code for 
> "illustration" purpose. If necessary, you can look at the actual attached 
> source files): 
> 
> line       = csvloader.next_row()
> ModelClass = get_current_model()
> instance   = ModelClass.create_instance(**(to_dict(line)))
> session.add(instance) 
> 
> I wish I could do something like this : 
> try:    
>    session.commit()
> except IntegrityError,e : 
>    session.rollback()
>    errror = get_error()
>    if type_of(error) == UniqueConstraintError :
>        original_instance = ModelClass.get(instance.id)
>        for relation in instance.get_relations() : 
>            # Is this correct ? 
>           instance.relation.inverse = original_instance
>        session.commit()
> 
> My questions are : how to write get_error, type_of, where to get 
> UniqueContraintError, how to write get_relations, how to set the inverse of a 
> relation (is instance.realtion.inverse the right thing to set ?) and is this 
> approach correct ?

There's no portable way to detect "unique constraint" errors across DBAPIs, 
you'd have to catch the specific error that you've observed your DBAPI emits as 
well as the message/codes contained within it, and code against that.   
SQLAlchemy wraps DBAPI exceptions in its own same-named wrapper and the object 
emitted by the DBAPI is available under the ".orig" member.

One improvement to the recipe would be if you used savepoints, if supported by 
your database, so that when you issue rollback() inside a savepoint block, the 
whole transaction isn't rolled back and you can maintain the whole operation in 
one transaction.   The session can start a savepoint using begin_nested().

When I do csv loads like these however I usually load the full list of 
identifiers to be checked into memory ahead of time.  If the csv is less than 
100K rows and represents the full table of data, I just load the whole thing 
into a dictionary, formatted according to whatever "these are the fields that 
are unique" I'm dealing with, and consult the dictionary as I go along.    
Otherwise, I load individual blocks of data in as I scan through portions of 
the csv (like, give me all the records where id in (a, b, c, d, e, f, g, ...) 
for this block).   There's nothing "wrong" with the "catch-the-error" approach 
except that I like an entire operation to be in a single transaction, and its 
usually simpler and faster just to work with a Python dictionary.



> 
> 
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To post to this group, send email to sqlalch...@googlegroups.com.
> To unsubscribe from this group, send email to 
> sqlalchemy+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/sqlalchemy?hl=en.
> 
> <loadCSV.py><db.py>

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to