Hello,

It is not about aesthetics: grouping all the updates into a single transaction is what speeds up the processing. Using multiple threads or multiple processes is not efficient here, because in the end only a single process can write to the database (a single file).
Grouping all the updates into a single transaction is the only way to speed up your program.

Cheers,
Sylvain

On Sun, Jan 31, 2010 at 1:15 AM, Robert Citek <robert.ci...@gmail.com> wrote:
> Sure. This script can use a lot of aesthetic improvement, but it
> highlights processing the data in a single process.
>
> The question would be, how to modify the script to process the data
> with parallel processes?
>
> Regards,
> - Robert
>
> On Sat, Jan 30, 2010 at 4:36 AM, Sylvain Pointeau
> <sylvain.point...@gmail.com> wrote:
> > A good approach would be to generate one file with all the
> > statements. You then run sqlite with this file, with the statements
> > surrounded by begin/commit transaction:
> >
> > echo "begin transaction" >> update.sql
> >
> > sqlite3 -separator $'\t' sample.db 'select rowid, item from foo;' |
> > while read rowid item ; do
> >   status=$(long_running_process "${item}" )
> >   echo "update foo set status=${status} where rowid=${rowid} ;" >> update.sql
> > done
> >
> > echo "commit transaction" >> update.sql
> >
> > sqlite3 sample.db < update.sql
> >
> > Best regards,
> > Sylvain
> >
> > On Sat, Jan 30, 2010 at 12:04 AM, Robert Citek
> > <robert.ci...@gmail.com> wrote:
> >
> >> Are there some white papers or examples of how to do updates in
> >> parallel using sqlite?
> >>
> >> I have a large dataset in sqlite that I need to process outside of
> >> sqlite and then update the sqlite database. The process looks
> >> something like this:
> >>
> >> sqlite3 -separator $'\t' sample.db 'select rowid, item from foo;' |
> >> while read rowid item ; do
> >>   status=$(long_running_process "${item}" )
> >>   sqlite3 sample.db "update foo set status=${status} where rowid=${rowid} ;"
> >> done
> >>
> >> Because long_running_process takes a long time, I could speed up the
> >> overall time by running more than one long_running_process at the same
> >> time. One way to do this would be to segment the data and run a
> >> separate process on each segment.
> >> For the update, each process would
> >> collect the status data "outside" of sample.db, e.g. in a separate
> >> database. When all the processes have finished, the parent process
> >> would attach the separate databases and update the original database.
> >> When all is done, the parent process would clean up the ancillary
> >> databases.
> >>
> >> I was just wondering if there are other ways to do this that I may be
> >> overlooking.
> >>
> >> Thanks in advance for pointers to any references.
> >>
> >> Regards,
> >> - Robert
> >> _______________________________________________
> >> sqlite-users mailing list
> >> sqlite-users@sqlite.org
> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
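For the archives, here is a minimal bash sketch of Robert's "separate databases, then attach" idea: each worker processes one shard of foo into its own database file, and the parent then attaches the partial databases and applies all updates in a single transaction. The shard rule (rowid % 2), the part1.db/part2.db names, the results table, and the stand-in long_running_process are illustrative assumptions, not something from the thread.

```shell
#!/usr/bin/env bash
set -e

rm -f sample.db part1.db part2.db
sqlite3 sample.db "create table foo (item text, status integer);
                   insert into foo (item) values ('aa'), ('b'), ('ccc'), ('d');"

# Stand-in for the real long-running work: status = length of the item.
long_running_process() { printf '%s' "${#1}"; }

# Each worker reads its shard of rows and writes results to its own database,
# so the parallel writers never contend for sample.db.
worker() {
  local shard=$1 db=$2
  sqlite3 "$db" 'create table results (rowid integer primary key, status integer);'
  sqlite3 -separator $'\t' sample.db \
    "select rowid, item from foo where rowid % 2 = $shard;" |
  while IFS=$'\t' read -r rowid item ; do
    status=$(long_running_process "$item")
    sqlite3 "$db" "insert into results values ($rowid, $status);"
  done
}

worker 0 part1.db &   # two shards processed in parallel
worker 1 part2.db &
wait

# Parent: attach the partial databases and apply every update in one
# transaction against the original database.
sqlite3 sample.db <<'SQL'
attach 'part1.db' as p1;
attach 'part2.db' as p2;
begin;
update foo set status = (select status from p1.results r where r.rowid = foo.rowid)
  where rowid in (select rowid from p1.results);
update foo set status = (select status from p2.results r where r.rowid = foo.rowid)
  where rowid in (select rowid from p2.results);
commit;
SQL

sqlite3 sample.db 'select item, status from foo order by rowid;'
rm -f part1.db part2.db   # clean up the ancillary databases
```

With the toy long_running_process above this prints aa|2, b|1, ccc|3, d|1, one row per line. The point of the structure is that only the parent ever writes to sample.db, and it does so in one transaction, which keeps Sylvain's advice intact while the slow work runs in parallel.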