I'm having a few issues with unique constraints and bulk inserts. The 
software I'm writing takes data from an external source (a lot of it, 
anywhere from 1,000 rows per minute to 100-200k+), crunches it down into 
its hierarchy and saves it to the DB, to be aggregated in the background. 
The function handling the initial DB save is designed to work with about 
20-50k rows at a time; very little modification takes place, the data is 
pretty much just grabbed and thrown into the table. Obviously, the volume 
being saved pretty much rules out using the ORM for this particular table, 
but a number of other tables do benefit from the ORM. Hence, the small 
stuff uses the ORM and the big stuff uses the Core.
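
Roughly, the split looks like this (a simplified sketch only - the model 
and variable names here are made up, not the real code):

    # Small, per-object work goes through the ORM session:
    db.add(SomeSmallThing(name='example'))  # hypothetical ORM model
    db.commit()

    # The big blocks bypass the ORM and go through the Core as a single
    # executemany(), passing a list of plain dicts:
    engine.execute(Entry.__table__.insert(), entry_rows)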

The main problem I'm having is with the initial save. The data comes in 
unordered and sometimes contains duplicates, so there's a UniqueConstraint 
on Entry over (sub, division, created). Unfortunately, this hampers the 
bulk insert: if there's a duplicate, the entire insert is rolled back, and 
hence the entries aren't available to be referenced by the segments later. 
Wrapping it in a try/except obviously skips the whole block as well. Both 
Entry and Segment have the same problem, since there are often duplicate 
segments too. Given the amount of data being pushed through, I assume it's 
impractical to insert the rows individually - while there are only 100-200 
entries per block, there are usually 20-50k segments.

Is there any way to force the engine to skip over duplicates rather than 
rolling back the whole transaction on the exception? Code's below. Using 
Postgres, with psycopg2 as the driver.


        # Bulk-insert the entry rows in one Core executemany().
        engine.execute(Entry.__table__.insert(), entries)

        segment_list = []
        for sub, entry in entries.items():
            segments = entry.pop('segments')

            # Look up the just-inserted Entry to get its primary key for
            # the segments that reference it.
            e = db.query(Entry)\
                .filter(Entry.sub==entry['sub'])\
                .filter(Entry.division==entry['division'])\
                .filter(Entry.created==entry['created']).first()

            for segment in segments:
                segment['entry_id'] = e.id
                segment_list.append(segment)

        # Bulk-insert all segments in a single Core executemany() as well.
        engine.execute(Segment.__table__.insert(), segment_list)
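
For what it's worth, this is the kind of thing I'm hoping is possible - a 
rough sketch assuming Postgres 9.5+ ON CONFLICT and the postgresql 
dialect's insert() with on_conflict_do_nothing(), neither of which I've 
confirmed fits my versions:

    # Sketch only: skip duplicate rows instead of aborting the whole
    # executemany(). Assumes Postgres 9.5+ and a SQLAlchemy version with
    # the postgresql dialect's on_conflict_do_nothing() support.
    from sqlalchemy.dialects.postgresql import insert as pg_insert

    stmt = pg_insert(Entry.__table__).on_conflict_do_nothing(
        index_elements=['sub', 'division', 'created'])

    # Same rows as the plain insert above; duplicates would simply be
    # dropped by Postgres rather than raising IntegrityError.
    engine.execute(stmt, entries)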

In addition, is there some way to pre-fetch data? Rather than querying for 
each Entry, it'd be nice to pre-load all the entries up front and save a 
couple of hundred queries.
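
Something along these lines is what I had in mind - just a sketch, and I'm 
not sure tuple_(...).in_(...) is the right tool for a composite key here:

    # Sketch: fetch all matching entries in one query and build a lookup
    # keyed on the unique-constraint columns, instead of one query per entry.
    from sqlalchemy import tuple_

    keys = [(e['sub'], e['division'], e['created']) for e in entries.values()]
    rows = db.query(Entry)\
        .filter(tuple_(Entry.sub, Entry.division, Entry.created).in_(keys))\
        .all()
    entry_ids = {(r.sub, r.division, r.created): r.id for r in rows}

    # The segment loop could then use entry_ids[...] rather than issuing
    # a query for each entry.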

Thanks!
