Robert Citek wrote:
> Are there some white papers or examples of how to do updates in
> parallel using sqlite?
>
> I have a large dataset in sqlite that I need to process outside of
> sqlite and then update the sqlite database.  The process looks
> something like this:
>
> sqlite3 -separator $'\t' sample.db 'select rowid, item from foo;' |
> while read rowid item ; do
>   status=$(long_running_process "${item}" )
>   sqlite3 sample.db "update foo set status=${status} where rowid=${rowid} ;"
> done
>
> Because long_running_process takes a long time, I could speed up the
> overall time by running more than one long_running_process at the same
> time.  One way to do this would be to segment the data and run a
> separate process on each segment.  For the update each process would
> collect the status data "outside" of the sample.db, e.g. in a separate
> database.  When all the processes have finished, the parent process
> would attach the separate databases and update the original database.
> When all is done, the parent process would clean up the ancillary
> databases.
>   

I could be misunderstanding your requirements, but this sounds a little 
like MapReduce:

http://labs.google.com/papers/mapreduce.html

The only point I'd question is your assertion that you could speed up the 
overall time by running more than one long-running process at the same 
time.  You *might* be able to do so, up to the limit of the number of 
cores in the machine, or by distributing the load over many machines.  
However, to me a long-running process implies something that is consuming 
large amounts of CPU time, and if the work is CPU-bound, running more 
processes than you have processors could actually increase the total 
elapsed time because of context-switching and swapping overhead.
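
To make that concrete: one way to cap the parallelism at the number of 
cores is to split the (rowid, item) pairs into one chunk per core and run 
one worker per chunk, each writing into its own scratch database so the 
workers never write to sample.db at all.  A rough sketch, assuming GNU 
coreutils (nproc, split -n) and a hypothetical worker.sh; none of the file 
or table names below come from the original post:

# Parent: dump the work list, split it into one chunk per core, and run
# one background worker per chunk against its own scratch database.
ncpu=$(nproc)
sqlite3 -separator $'\t' sample.db 'select rowid, item from foo;' > work.tsv
split -n l/"$ncpu" work.tsv chunk.

i=0
for chunk in chunk.*; do
  i=$((i + 1))
  ./worker.sh "$chunk" "part$i.db" &
done
wait

#!/bin/bash
# worker.sh (hypothetical): reads "rowid<TAB>item" lines from $1 and
# writes (foo_rowid, status) rows into its own scratch database $2.
chunk=$1
partdb=$2
sqlite3 "$partdb" 'create table if not exists results (foo_rowid integer, status text);'
while IFS=$'\t' read -r rowid item; do
  status=$(long_running_process "$item")
  # same quoting assumption as the original loop: status contains no single quotes
  sqlite3 "$partdb" "insert into results values (${rowid}, '${status}');"
done < "$chunk"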
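
Once every worker has finished, the attach-and-merge step Robert describes 
might then look roughly like this, reusing the hypothetical part*.db and 
results names from the sketch above:

# Parent: fold each scratch database back into sample.db, then clean up.
for partdb in part*.db; do
  sqlite3 sample.db <<SQL
ATTACH DATABASE '${partdb}' AS part;
UPDATE foo
   SET status = (SELECT status FROM part.results
                  WHERE part.results.foo_rowid = foo.rowid)
 WHERE rowid IN (SELECT foo_rowid FROM part.results);
DETACH DATABASE part;
SQL
  rm -f "$partdb"
done
rm -f work.tsv chunk.*

Since each worker only ever writes to its own scratch database, sample.db 
sees exactly one writer at a time, which sidesteps SQLite's database-level 
write locking.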

FWIW,


John