My initial thought is that INSERTs shouldn't be locking the whole table, at 
least not throughout a whole transaction.  There are some MySQL hints that 
can help with this.  If you're on MyISAM, take a look at 
http://dev.mysql.com/doc/refman/5.0/en/concurrent-inserts.html , possibly using 
the "concurrent_insert" setting, as well as some of the INSERT hints specific 
to MySQL:  http://dev.mysql.com/doc/refman/5.5/en/insert.html .  InnoDB 
doesn't lock the whole table unless you do so explicitly.
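
As a rough sketch of how one of those MySQL-specific hints can be passed 
through SQLAlchemy via Insert.prefix_with() - the engine URL, table, and 
column names below are just placeholders:

    from sqlalchemy import (create_engine, MetaData, Table, Column,
                            Integer, String)

    # hypothetical connection URL, table, and columns
    engine = create_engine("mysql://user:pass@localhost/mydb")
    metadata = MetaData()
    results = Table("results", metadata,
                    Column("id", Integer, primary_key=True),
                    Column("value", String(50)))

    # prefix_with() injects a MySQL hint right after the INSERT keyword,
    # rendering e.g. "INSERT LOW_PRIORITY INTO results (value) VALUES (%s)"
    stmt = results.insert().prefix_with("LOW_PRIORITY")

    conn = engine.connect()
    conn.execute(stmt, {"value": "some data"})
    conn.close()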

SQLAlchemy is just a tool that helps you emit SQL to the database.  You'd need 
to figure out what scheme you want to use to improve your INSERT throughput 
first, then apply that strategy through SQLAlchemy's API.   It might be as 
easy as switching to InnoDB; I'm not sure.
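
One concrete strategy that tends to help regardless of storage engine is to 
batch many rows into a single execute() call, which uses the DBAPI's 
executemany() under the hood.  Continuing with the engine and "results" table 
from the sketch above:

    # a list of dicts triggers the DBAPI's executemany(); MySQLdb, for
    # one, rewrites that into a single multi-row INSERT, which is much
    # faster than one INSERT (and one lock acquisition) per row
    rows = [{"value": "row %d" % i} for i in range(100000)]

    conn = engine.connect()
    trans = conn.begin()
    conn.execute(results.insert(), rows)
    trans.commit()
    conn.close()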

If you're looking to serialize the inserts after the fact, you can use a 
message queue for that, such as a Celery process.    This might be the way to 
go if you get large bursts of data with quiet periods, and it's OK for the 
MySQL database to receive all the data "eventually".
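
A minimal sketch of that approach, assuming a recent Celery with a Redis 
broker (the broker URL and task below are hypothetical, and the engine and 
"results" table are the ones from the earlier sketches):

    from celery import Celery

    app = Celery("inserts", broker="redis://localhost:6379/0")

    @app.task
    def insert_rows(rows):
        # one worker (or a small pool) drains the queue serially, so the
        # cluster jobs never contend for the table lock themselves
        conn = engine.connect()
        trans = conn.begin()
        conn.execute(results.insert(), rows)
        trans.commit()
        conn.close()

    # in each cluster job, instead of writing to MySQL directly:
    # insert_rows.delay([{"value": "abc"}, {"value": "def"}])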


On May 23, 2012, at 4:56 PM, Jeff wrote:

> More data:
> A typical not-quite-worst-but-in-the-class-of-worst case scenario is
> half a million rows per insert. Absolute worst case scenarios could be
> 10 times that. So that insert will take a while.
> 
> Would there be any logic to breaking up all the inserts into one row
> per insert? Would that allow independent jobs to meaningfully
> interleave their inserts? Or does the overhead of SQLAlchemy's session
> adds and commits make that impractical?
> 
> One uglier route that has been suggested to me is to first write all
> the rows to independent local files, then later merge and insert them.
> Does SQLAlchemy have functionality that might help in this direction?
> 
> On May 23, 4:19 pm, Jeff <jeffalst...@gmail.com> wrote:
>> Hello,
>> 
>> I have hundreds of independent jobs on a cluster all writing entries
>> to the same MySQL database table. Every time one job INSERTs, it locks
>> the table, and the other jobs have to queue up for their turn. So at
>> that point, the massively parallel cluster has turned into a massively
>> serial cluster :-( What routes are available in SQLAlchemy to improve
>> this situation?
>> 
>> Thanks!
>> Jeff
