[sqlalchemy] Re: Reentrant data population script using sql alchemy

Michael Bayer Sun, 09 Aug 2009 08:40:55 -0700


On Aug 8, 2009, at 10:58 PM, gizli wrote:


>
> Hi all,
>
> Forgive me if this is a very basic question but I could not find the
> answer anywhere.
>
> I am trying to write a data population script to setup a few tables
> and insert some data in them. I need to make this script reentrant..
> The problem is that currently the script looks like this:
>
> get a session
> use metadata.create_all() to create all the tables
> call add(obj) on all objects that need to be inserted
> flush()
> commit()
>
> create_all api is reentrant..  However when I add() existing objects,
> the script fails when flush() happens with integrity error. I thought
> of wrapping the add() function and doing a flush after each add, catch
> the integrity exception and pass. However that requires me to create a
> new session object after each flush (since the transaction is rolled
> back if a failure happens)..  Is there another way of doing this in
> sqlalchemy w/o creating a session object for every single add() call?
>
> One elegant way would be to be able to query the seed db for the obj..
> if it exists, pass.. my script does not know about the structure of
> the objects being inserted, therefore i cannot write a generic query
> based on its attributes.. it would be nice if one could do something
> like:
>
> try:
>   session.query(obj)
> except NotFound:
>   session.add(obj)
>
> does such a thing exist in sql alchemy? if not, what do you suggest I
> do? Please bear the following in mind:
>
> 1. I cannot remove everything in the tables at start time since there
> is more than one script and there is only one seed db.
> 2. The solution has to be generic and not require me to know what kind
> of attributes/primary_keys obj has
> 3. I would like to do this using one session instance if possible

"insert if not exists" is implemented in two ways:

1. use nested transactions so that you can ROLLBACK just the attempt  
to INSERT something, without the whole session being rolled back.      
This is accomplished using the begin_nested() method on Session.

2. check the database for the value first using SELECT.

Almost always, I opt for #2.    Depending on the scale of the  
operation, there's two ways of doing it:

        1. issue a SELECT for each individual elment
        2. preload the full span of objects to save on SELECTs.

The easiest way to do auto "insert or update" given an object that  
you've loaded from a file is to use merge():

        myobj = session.merge(myobj)

which will do the SELECT for you.  If the object is already loaded, no  
SELECT is emitted.

So it follows then that you can SELECT all the rows of the table ahead  
of time, keep them referenced in a collection:

        all_objects = session.query(MyStuff).all()

then you can merge() each element you pull from the file as needed,  
and no additional SQL should be emitted to fetch any of it.





--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---

[sqlalchemy] Re: Reentrant data population script using sql alchemy

Reply via email to