Re: [sqlite] Improving Bulk Insert Speed (C/C++)

Richard Hipp Wed, 02 Apr 2014 09:27:13 -0700

On Wed, Apr 2, 2014 at 12:02 PM, Kevin Xu <accol...@gmail.com> wrote:


> Richard Hipp wrote:
>
> > So you are creating a 27.5 GB database in under a half hour?
> >
> > What is your database page size?
>
> Yes, to be exact, a 34.24GB database from (34GB+34GB) FASTQ files in that
> time. I did not use any pragmas for switching SQLite page size, so it should
> be the default (probably 4096?).
>
> > You might try programming a virtual table to read your FASTQ files, then
> > transfer content from the virtual table into your real table using:
> >
> >     INSERT INTO realtable SELECT * FROM virtualtable;
> >
> > Information on virtual tables is at http://www.sqlite.org/vtab.html and
> > you can find example code in some of the extensions at
> > http://www.sqlite.org/src/tree?ci=trunk&name=ext/misc and in some of the
> > test modules at http://www.sqlite.org/src/tree?ci=trunk&name=src&re=test_
>
> It might be possible, but virtual tables live in memory


No, not necessarily.  Many of the examples cited above live in memory.  But
it is easy to construct a virtual table that reads a disk file one record
at a time and only holds a single record in memory at a time.  You do not
need a massive virtual memory space.

Implementation sketch:

(1) The xBestIndex method always returns the same answer - a full table
scan from beginning to end.

(2) The xFilter method rewinds the FASTQ file back to the beginning and
loads the first record.

(3) The xNext method advances to the next record of the FASTQ file,
discarding the previous record, and xEof reports when the end of the file
has been reached.

(4) The xColumn and xRowid methods return information about the single
FASTQ record currently held in memory.




> (I am developing on 16GB), while my source files are over 30GB each (I
> originally read the entire file into a string in a single operation, but
> that didn't work with the 30GB files, which is why I switched to
> boost::mmap), so I don't think it would work on my platform.
>
> However, since the application is intended to be used on a computing
> cluster (a generic node in our cluster would have 72GB of RAM, but I could
> request 384GB "fat" nodes), it could work. That said, I was working with 2
> faculty members with conflicting views on the matter (the biochemist wants
> everything to work on low-powered systems; the computer scientist is sure
> that everything would work faster if I got the algorithm correct).
>
> Kevin.
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org