Robert Citek wrote:
> Are there some white papers or examples of how to do updates in
> parallel using sqlite?
>
> I have a large dataset in sqlite that I need to process outside of
> sqlite and then update the sqlite database. The process looks
> something like this:
>
>   sqlite3 -separator $'\t' sample.db 'select rowid, item from foo;' |
>   while read rowid item ; do
>     status=$(long_running_process "${item}")
>     sqlite3 sample.db "update foo set status=${status} where rowid=${rowid} ;"
>   done
>
> Because long_running_process takes a long time, I could speed up the
> overall time by running more than one long_running_process at the same
> time. One way to do this would be to segment the data and run a
> separate process on each segment. For the update, each process would
> collect the status data "outside" of sample.db, e.g. in a separate
> database. When all the processes have finished, the parent process
> would attach the separate databases and update the original database.
> When all is done, the parent process would clean up the ancillary
> databases.
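The segment-and-merge scheme quoted above can be sketched in Python with the stdlib sqlite3 and multiprocessing modules. This is only an illustration under assumptions not in the original: long_running_process is stood in by a trivial placeholder, the scratch-database naming and the worker/parallel_update functions are hypothetical, and only the table and column names (foo, rowid, item, status) come from the quoted script.

```python
import os
import sqlite3
from multiprocessing import Pool


def long_running_process(item):
    # Hypothetical stand-in for the expensive external computation.
    return len(item)


def worker(args):
    """Process one segment of rows, writing results to a scratch database."""
    seg_id, rows = args
    scratch = f"status_{seg_id}.db"          # hypothetical naming scheme
    if os.path.exists(scratch):
        os.remove(scratch)
    con = sqlite3.connect(scratch)
    con.execute("create table results (rid integer primary key, status integer)")
    con.executemany(
        "insert into results values (?, ?)",
        [(rowid, long_running_process(item)) for rowid, item in rows],
    )
    con.commit()
    con.close()
    return scratch


def parallel_update(db_path, nseg=4):
    con = sqlite3.connect(db_path)
    rows = con.execute("select rowid, item from foo").fetchall()
    # Segment the rows round-robin across nseg worker processes.
    segments = [(i, rows[i::nseg]) for i in range(nseg)]
    with Pool(nseg) as pool:
        scratch_dbs = pool.map(worker, segments)
    # Parent process: attach each scratch database and fold its results
    # back into the original, then clean up the ancillary database.
    for scratch in scratch_dbs:
        con.execute("attach database ? as aux", (scratch,))
        con.execute(
            "update foo set status ="
            " (select status from aux.results where aux.results.rid = foo.rowid)"
            " where foo.rowid in (select rid from aux.results)"
        )
        con.commit()
        con.execute("detach database aux")
        os.remove(scratch)
    con.close()
```

Because each worker writes only to its own scratch file, no two processes ever contend for a write lock on sample.db; the original database is touched only in the final single-process merge.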
I could be misunderstanding your requirements, but this sounds a little
like Map Reduce:

  http://labs.google.com/papers/mapreduce.html

The only point I'd question is your assertion that you could speed up
the overall time by running more than one long-running process at the
same time. You *might* be able to do so up to the limit of the cores in
the machine, or by distributing the load over many machines; however,
the implication to me of a long-running process is something that is
consuming large amounts of CPU time. It is possible that running
multiple processes per processor could actually increase the total
amount of time due to process-swap overhead.

FWIW,
John

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
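The caveat about CPU-bound work is commonly addressed by capping the worker count at the number of cores, which Python's os.cpu_count() reports. A minimal sketch, assuming a hypothetical CPU-bound stand-in for the real computation:

```python
import os
from multiprocessing import Pool


def long_running_process(n):
    # Hypothetical CPU-bound stand-in for the real computation.
    return sum(i * i for i in range(n)) % 97


items = list(range(1000, 1020))
# Cap the pool at the core count: running more CPU-bound processes than
# there are cores tends to add scheduling overhead rather than throughput.
with Pool(processes=os.cpu_count()) as pool:
    results = pool.map(long_running_process, items)
```

For I/O-bound work the picture differs, since workers blocked on I/O leave cores free; the core-count cap is a rule of thumb for the CPU-bound case John describes.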