Hello,

It is not about aesthetics: grouping all the updates into a single transaction is what speeds up the processing. Using multiple threads or multiple processes is not efficient here, because in the end only a single process can write to the database (a single file).
Grouping all the updates into a single transaction is the only way to speed up your program.

Cheers,
Sylvain

On Sun, Jan 31, 2010 at 1:15 AM, Robert Citek <robert.ci...@gmail.com> wrote:
> Sure. This script can use a lot of aesthetic improvement, but it
> highlights processing the data in a single process.
>
> The question would be, how to modify the script to process the data
> with parallel processes?
>
> Regards,
> - Robert
>
> On Sat, Jan 30, 2010 at 4:36 AM, Sylvain Pointeau
> <sylvain.point...@gmail.com> wrote:
> > A good approach would be to generate one file with all the
> > statements. You then run sqlite with this file, with the statements
> > surrounded by begin/commit transaction:
> >
> > echo "begin transaction" >> update.sql
> >
> > sqlite3 -separator $'\t' sample.db 'select rowid, item from foo;' |
> > while read rowid item ; do
> >   status=$(long_running_process "${item}" )
> >   echo "update foo set status=${status} where rowid=${rowid} ;" >> update.sql
> > done
> >
> > echo "commit transaction" >> update.sql
> >
> > sqlite3 sample.db < update.sql
> >
> > Best regards,
> > Sylvain
> >
> > On Sat, Jan 30, 2010 at 12:04 AM, Robert Citek
> > <robert.ci...@gmail.com> wrote:
> >
> >> Are there some white papers or examples of how to do updates in
> >> parallel using sqlite?
> >>
> >> I have a large dataset in sqlite that I need to process outside of
> >> sqlite and then update the sqlite database. The process looks
> >> something like this:
> >>
> >> sqlite3 -separator $'\t' sample.db 'select rowid, item from foo;' |
> >> while read rowid item ; do
> >>   status=$(long_running_process "${item}" )
> >>   sqlite3 sample.db "update foo set status=${status} where rowid=${rowid} ;"
> >> done
> >>
> >> Because long_running_process takes a long time, I could speed up the
> >> overall time by running more than one long_running_process at the same
> >> time. One way to do this would be to segment the data and run a
> >> separate process on each segment.
> >> For the update, each process would
> >> collect the status data "outside" of sample.db, e.g. in a separate
> >> database. When all the processes have finished, the parent process
> >> would attach the separate databases and update the original database.
> >> When all is done, the parent process would clean up the ancillary
> >> databases.
> >>
> >> I was just wondering if there are other ways to do this that I may be
> >> overlooking.
> >>
> >> Thanks in advance for pointers to any references.
> >>
> >> Regards,
> >> - Robert
> >> _______________________________________________
> >> sqlite-users mailing list
> >> sqlite-users@sqlite.org
> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
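For the archives, here is a minimal bash sketch of Robert's "separate databases, then attach" idea: each worker processes one shard of foo into its own database file, and the parent then attaches the partial databases and applies all updates in a single transaction. The shard rule (rowid % 2), the part1.db/part2.db names, the results table, and the stand-in long_running_process are illustrative assumptions, not something from the thread.

```shell
#!/usr/bin/env bash
set -e

rm -f sample.db part1.db part2.db
sqlite3 sample.db "create table foo (item text, status integer);
                   insert into foo (item) values ('aa'), ('b'), ('ccc'), ('d');"

# Stand-in for the real long-running work: status = length of the item.
long_running_process() { printf '%s' "${#1}"; }

# Each worker reads its shard of rows and writes results to its own database,
# so the parallel writers never contend for sample.db.
worker() {
  local shard=$1 db=$2
  sqlite3 "$db" 'create table results (rowid integer primary key, status integer);'
  sqlite3 -separator $'\t' sample.db \
    "select rowid, item from foo where rowid % 2 = $shard;" |
  while IFS=$'\t' read -r rowid item ; do
    status=$(long_running_process "$item")
    sqlite3 "$db" "insert into results values ($rowid, $status);"
  done
}

worker 0 part1.db &   # two shards processed in parallel
worker 1 part2.db &
wait

# Parent: attach the partial databases and apply every update in one
# transaction against the original database.
sqlite3 sample.db <<'SQL'
attach 'part1.db' as p1;
attach 'part2.db' as p2;
begin;
update foo set status = (select status from p1.results r where r.rowid = foo.rowid)
  where rowid in (select rowid from p1.results);
update foo set status = (select status from p2.results r where r.rowid = foo.rowid)
  where rowid in (select rowid from p2.results);
commit;
SQL

sqlite3 sample.db 'select item, status from foo order by rowid;'
rm -f part1.db part2.db   # clean up the ancillary databases
```

With the toy long_running_process above this prints aa|2, b|1, ccc|3, d|1, one row per line. The point of the structure is that only the parent ever writes to sample.db, and it does so in one transaction, which keeps Sylvain's advice intact while the slow work runs in parallel.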