If you can partition the rows numerically, this is trivially easily to
implement using redis as the orchestrator.

For example if you have integer PKs, you might have a loop like:

    offset = 0
    while offset < tablesize:
        for row in query[offset:batchsize]:
            migrate(row)
        commit()
        offset += batchsize

With redis orchestrating, you use a key in redis and INCRBY to reliably
distribute batches to an arbitrary number of workers on an arbitrary number
of hosts.

   while True:
       offset = redis.incrby('migration-offset', batchsize)
       rows = query[offset:batchsize]
       if not rows:
           break
       for row in rows:
           migrate(row)
       commit()

INCRBY is atomic and returns the adjusted value, so every invocation of
this script that calls into redis and INCRBYs by, say, 1000, has its own
chunk of 1000 to work on. For a starting value of -1000 and four
invocations, you'd see 0, 1000, 2000 and 3000.

I'll typically do this on one invocation, see that it's running well and
that I chose a performant batch size, and then spin up additional workers
on more cores until the migration hits the overall throughput required.



On Wed, Oct 14, 2015 at 9:32 AM, Jonathan Vanasco <jonat...@findmeon.com>
wrote:

> I have to run a script on 2MM objects to update the database.  Not really
> a schema migration, more like changing the internal data representation in
> the fields.
>
> There's a bit of post-processing and bottlenecks involved, so doing
> everything one-at-a-time will take a few days.
>
> I'd like to split this out into 5-10 'task runners' that are each
> responsible for a a section of the database (ie, every 5th record).  That
> should considerably drop the runtime.
>
> I thought I had seen a recipe for this somewhere, but checked and couldn't
> find anything.  That leads me to question if this is a good idea or not.
> Anyone have thoughts/pointers?
>
> --
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Reply via email to