Thanks! Good ideas. SSD is still a little exotic price-wise, but closing
and defragging is easy to try.

Come to think of it, your suggestion to break the process into batches is
probably the way to go. By placing each table in its own DB, not only are
the files smaller and easier to manage, but the vacuum becomes a
single-table vacuum instead of a multi-table vacuum - no need to redo
tables that are already streamlined. Very cool.
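
Roughly what I'm picturing per batch - file, table and column names are
made up, and I haven't tried any of this yet:

    -- one single-table DB per batch: load it, clean it, then vacuum just that file
    -- (from the sqlite3 shell: sqlite3 claims_2011.db)
    CREATE TABLE claims (id INTEGER, state TEXT, amount REAL);
    .mode csv
    .import claims_2011.csv claims
    DELETE FROM claims WHERE state IS NULL;  -- whatever cleanup that batch needs
    VACUUM;                                  -- rewrites only this one table's file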

In which case, how about not combining them at all, and just attaching?

In other words, is the performance of one large DB that includes all the
data tables plus the "work" tables roughly equivalent to a "work only" DB
that attaches, as needed, to multiple external DBs, each containing one
table? With some databases, cross-database communication is much slower
than keeping everything local. Is there substantial overhead to using
multiple ATTACH statements in SQLite?

Thanks,

Udi

On Thu, Feb 2, 2012 at 3:51 PM, Simon Slavin <slav...@bigfraud.org> wrote:

>
> On 2 Feb 2012, at 11:31pm, Udi Karni wrote:
>
> > Given how clever and compelling SQLite is, I am testing how it scales to
> > tables in the 100 GB / 200 million row range. This is for a strictly
> > "read only" application, but first the tables must be populated in a
> > one-time process. As is often the case with Big Data, the data is a
> > little dirty, so the process involves importing, selecting, counting,
> > inspecting, updating some rows, deleting some rows, selecting, counting,
> > inspecting, etc. until clean.
> >
> > Placing the SQLite database on a traditional C: drive, IO was too slow.
> > At 15 MB/sec, reading a 50 GB table would take an hour. So I moved it to
> > an external RAID array, where I ran across an interesting find. IO wasn't
> > that much faster - until I vacuumed the database, which increased IO 10X
> > to 150 MB/sec, with the same CPU utilization.
> >
> > This is good news for the final implementation of this read-only
> > database, but still a dilemma at the data load phase. After a ".vacuum",
> > issuing a single DML statement against a table - even a DELETE which
> > deletes no rows at all - causes IO to drop back down to 15 MB/sec on the
> > table I'm selecting / DMLing, which makes the data loading / cleansing
> > phase very long.
>
> Nice description of your situation and requirements.  Makes it easier to
> answer your questions.
>
> > So I have 2 questions -
> >
> > (1) Why would simple DML cause such an extreme slowdown as compared with
> > "post vacuum" speeds?
>
> Spins.  Reading one area of the database file means waiting for rotations
> of the disk.  If the data you need is fragmented you end up wasting a lot
> of time.  You can't speed it up because you are just sitting there waiting
> for the disk to turn to the right place.  Instead of using a RAID, try
> using a non-rotational medium like solid-state storage.
>
> > (2) Any knobs to turn to try and maintain the higher speeds post-DML,
> > without resorting to ".vacuum"?
>
> There are a ton of things which will help but probably not enough to make
> it worthwhile for a one-shot job.  You can presort your data into batches.
>  Searching an index which is already in order is faster.  You can close the
> database, use your OS' facilities to defragment the file, then reopen the
> database.  (This helps a lot under Windows which is very sensitive to
> fragmentation, somewhat less for other OSen.)  You can do your initial
> insertions into smaller databases then merge them.
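> For the merge step the usual shape is roughly this - names are only
> illustrative:
>
>   ATTACH DATABASE 'batch1.db' AS b1;
>   INSERT INTO main.big_table SELECT * FROM b1.big_table;
>   DETACH DATABASE b1;
>   -- repeat for each batch file, then vacuum the merged database once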
>
> But the way you've described your situation suggests to me that this isn't
> worth doing.  You have a one-time-only activity.  It's probably gonna take
> you longer to do the programming than you'll save.
>
> Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
