One thing to note is that deepcopy() is not going to work. It will copy 
SQLAlchemy's own accounting information on the object (its internal 
instance-state tracking) as well, and generally cause confusion.
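If you do need a copy of an object (e.g. for a delete-then-reinsert attempt), a safer route is to build a fresh instance from just the mapped column values, leaving SQLAlchemy's bookkeeping behind. A minimal sketch, not from the thread: `clean_copy` and the explicit column list are hypothetical names; in real code you'd get the columns from the mapping, e.g. `MyThing.__table__.columns.keys()`:

```python
def clean_copy(obj, column_names):
    # Build a new instance from only the named column attributes,
    # skipping SQLAlchemy's internal state (e.g. _sa_instance_state),
    # which is what deepcopy() would drag along.
    values = {name: getattr(obj, name) for name in column_names}
    return type(obj)(**values)


# Stand-in class for demonstration; a real mapped class works the same
# way as long as its __init__ accepts the column values as keywords.
class Thing(object):
    def __init__(self, id=None, data=None):
        self.id = id
        self.data = data


original = Thing(id=1, data='x')
original._sa_instance_state = object()  # fake SQLAlchemy bookkeeping
copy = clean_copy(original, ['id', 'data'])
```

The copy carries the column values but none of the session-tracking state, so it can be add()-ed to a session as a brand-new object.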

The easiest way to insert a lot of data while detecting dupes efficiently is to 
sort the data, then chunk through it, and for each chunk, pre-load from the 
database all those records which reside within the key range of that chunk of 
pending data. You put those pre-loaded records in a dictionary keyed on 
primary key and check it for each incoming record.

A simple system I use very often is this (it isn't chunked, but could be):

# load all existing rows up front, as a dictionary of {primary key: object}
recs = dict(session.query(MyThing.id, MyThing))

for i, newrec in enumerate(my_incoming_stuff):
    if newrec['id'] in recs:
        # dupe - update the already-loaded object in place
        rec = recs[newrec['id']]
        rec.data = newrec['data']
    else:
        # not present - create a new object and add it
        rec = MyThing(id=newrec['id'], data=newrec['data'])
        session.add(rec)

    # flush periodically so pending INSERTs don't pile up in memory
    if i % 1000 == 0:
        session.flush()

session.commit()
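The chunked variant of the same idea can be sketched in plain Python. This is a sketch only: the database is stubbed as a dict keyed on primary key, and `upsert_chunked`, `existing_rows` and `chunk_size` are made-up names. With SQLAlchemy, the "pre-load" step would instead be a query filtered on the chunk's key range, e.g. `session.query(MyThing).filter(MyThing.id.between(lo, hi))`:

```python
def upsert_chunked(existing_rows, incoming, chunk_size=1000):
    # sort incoming data so each chunk covers a contiguous key range
    incoming = sorted(incoming, key=lambda r: r['id'])
    inserted = updated = 0
    for start in range(0, len(incoming), chunk_size):
        chunk = incoming[start:start + chunk_size]
        lo, hi = chunk[0]['id'], chunk[-1]['id']
        # pre-load only the existing records within this chunk's key
        # range; with SQLAlchemy this would be a between() query
        preloaded = {
            k: v for k, v in existing_rows.items() if lo <= k <= hi
        }
        for rec in chunk:
            if rec['id'] in preloaded:
                # dupe - update the pre-loaded record
                existing_rows[rec['id']] = rec['data']
                updated += 1
            else:
                # new record - would be session.add() in real code
                existing_rows[rec['id']] = rec['data']
                inserted += 1
    return inserted, updated
```

Because the data is sorted, each chunk pre-loads only the rows it can possibly collide with, so memory stays bounded no matter how large the incoming set is.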


On Mar 18, 2013, at 1:54 AM, alonn <alonis...@gmail.com> wrote:

> Like all CRUD goes, I need to write some data to a table. When I write new 
> data to the table, everything works like a charm. The problem starts when I 
> need to write data already existing in the table (actually updating some data 
> with the same primary key).
> The data just doesn't seem to be written to the table! I started with trying 
> to update the data with session.merge(), but later tried a more brute-force 
> approach: querying for the same primary key in the table, deleting it, and 
> then adding and flushing the changed objects.
> Somewhere, if the basic add-and-flush fails, the rest doesn't work. I'll be 
> glad for a clue here.
> 
> The code:
> 
>     def flush(obj_Instance, id):
>         """
>         taking care of the sqlalchemy flushing
>         params:
>                 Instance: an object Instance to flush into
>                 id: the unique object instance id
>         """
> 
>         DBSession2.add(obj_Instance)
> 
>         try:
> 
>             try:
>                 DBSession2.flush()
>                 print ("flushed:", str(obj_Instance))
>             except (FlushError, IntegrityError) as err:
>                 DBSession2.rollback()
>                 if ('conflicts with persistent instance' in str(err)) or \
>                         ('Duplicate key was ignored' in str(err)):
>                     transaction.begin()
>                     #my original slick take, but after it failed to update
>                     #correctly I changed to a more brute force approach:
>                     DBSession2.merge(obj_Instance)
>                     #DBSession2.flush()  #to save the merge
>                     #from here on trying to brute force it
>                     #saving for further reference - another try
>                     newInstance = deepcopy(obj_Instance)
>                     print ("deleting: %s" % id)
>                     DBSession2.query(type(obj_Instance)).filter_by(ids=id).delete()
>                     #at this point, I was so desperate for this to work
>                     #I littered the code with flush commands.
>                     DBSession2.flush()
>                     DBSession2.add(newInstance)
>                     DBSession2.flush()
>                     return
>                 else:
>                     #the same-key problem isn't the source of conflicts
>                     raise
> 
>         except Exception as err:
>             #supposed to find out the error type and message
>             #the code doesn't get here, only in real exceptions it was
>             #planned to catch, 3 rows in 10,000 uploaded to the db
>             #TODO: make this less general and more specific
>             print(str(err))
>             write_log(num=id,
>                       msg="some sql or sqlalchemy error use num %s as id "
>                           "identifier with object: %s" % (id, obj_Instance.name),
>                       timestamp=datetime.now(), errtype="sql error",
>                       log=str(err))
>             DBSession2.rollback()
>             transaction.begin()
> 
> Maybe this try/fail/rollback, then merge-or-delete/insert-new pattern is 
> wrong (though I think it's Pythonic - try and ask forgiveness - but that 
> would be for Mike to judge).
> 
> Using SQLAlchemy 0.7.3 against MSSQL 2005 with pyodbc 2.1.11 and TG 2.1 (the 
> transaction manager comes with TG and I think is based on transaction).
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>  
>  
