On Nov 13, 2013, at 11:52 AM, Claudio Freire <klaussfre...@gmail.com> wrote:
> On Wed, Nov 13, 2013 at 1:45 PM, Michael Bayer <mike...@zzzcomputing.com> > wrote: >> >> Am Mittwoch, 6. November 2013 21:58:53 UTC+1 schrieb Michael Bayer: >>> >>> I wrote a full post regarding this topic on stackoverflow at >>> http://stackoverflow.com/questions/11769366/why-is-sqlalchemy-insert-with-sqlite-25-times-slower-than-using-sqlite3-directly/11769768#11769768 >>> . If you start with this, I can answer more specific questions. >> >> >> The article was very helpful, thanks. I still want to figure out the best >> balance between convenience and speed for my use case. Do the following make >> sense and is possible? >> >> I work only with Postgresql and I'm sure that all involved objects have a >> unique id column which is called 'id'. So before doing a session.commit(), >> I could check how many objects are in my session. As I'm just bulk >> inserting, I know that all of them are new and don't have their id set yet. >> Now I ask the database for that number of new ids, iterate over the objects >> in my session and set the ids. Internally all ids would come from a single >> sequence, so I don't have to care about object types and so on. Afterwards >> SqlAlchemy should be aware that ids have already been set, so no generated >> ids have to be returned and the session.commit() should be much simpler and >> faster. >> >> Sounds like a still quite simple, but hopefully much faster solution. Do you >> agree? >> >> >> >> sure that should be fine, if you can pre-calc your PKs. It just won’t work >> under any kind of concurrency, as in such a situation there could be >> interleaved INSERTs from different processes. > > > Postgresql sequences already handle that kind of concurrency scenario. how exactly, if two transactions T1 and T2 both pull a number from a sequence, T1 gets 40, T2 gets 41, then we pre-calc 10 inserts that have not yet occurred for each, T1 has 40-50, T2 has 41-51, the number is totally wrong for both - in reality it would be some random distribution of 40-60 between T1 and T2. No ?
signature.asc
Description: Message signed with OpenPGP using GPGMail