The app seems to use between 60-80% CPU while it is running (from Activity 
Monitor) while disk use (using sudo iotop -P on OSX) seem to suggest about 
20%-30% I/O use. (spikes to over 30million% every 10s or so, might be when 
sqlite decides to page out)

Clearly my computations on the data are not complex enough to max out the 
processor (the bulk inserts, at least, is single threaded) while iotop results 
suggests I/O isn’t maxed out either, which is why I suspect performance 
improvements are still possible.

On the other hand, it is difficult for anyone to suggest more ideas without 
seeing the actual source code (professor wants it private for now), so I will 
try coding up the virtual table, and pure binary file dump (as a baseline) to 
see if I discover any more issues.

Kevin Xu


On Apr 2, 2014, at 1:28 PM, Keith Medcalf <kmedc...@dessus.com> wrote:

> On Tue, 1 Apr 2014 20:58:14 -0700
> Kevin Xu <accol...@gmail.com> wrote:
> 
>> I am trying to see if it is possible to achieve even higher
>> throughput - my professor was adamant that he managed 
>> to insert 3 million rows in 7 seconds (which works out to 
>> over 420K inserts per second) though he could not find 
>> his code that did it or records of his insertions.
> 
>> When I profiled the application (using Instruments), and after
>> inverting the call tree, the time spent within the program is 
>> broken down as follows:
> 
>> 2170ms 15.1% - sqlite3VdbeExec -> sqlite3_step -> insert function
>> 2142ms 14.9% - pwrite -> unixWrite -> pager_write/pager_write_pagelist
>> 1925ms 14.9% - std::string::insert -> boost::spirit
>> 539ms    3.7% - pack (my compression function)
> 
> A question that nobody seems to have bothered to ask, and that will
> entirely direct your solution search:
> 
> Is the limit I/O or CPU?
> 
> That is, which has hit 100% usage, CPU or I/O?  If is is I/O then you
> might want to look for consumers of I/O.  The best way to make I/O go
> faster is not to do it.
> 
> If the limit is CPU, then you need to devise a way to consume less CPU
> (or get a faster CPU) or to get more processors and parallelize your
> processing.
> 
> Secondly, what is the headroom you have available?  For example if you
> are using 100% CPU and 97% I/O, then you are damn close to balanced and
> the additional couple of K per second you can gain in I/O is unlikely
> to make any significant difference, and will likely end up being the
> bottleneck even if you spend a couple of million dollars on CPU and
> MEMORY (to drive CPU usage down to 0.0001% but peg I/O to 100%).
> 
> If the problem is CPU, then the first place to attack is boost:spirit
> (or use multiprocessing) -- after checking to ensure that you I/O
> system is properly offloaded from the main CPU and not consuming spin
> cycles while doing I/O, of course.
> 
> If neither I/O nor CPU is pegged at 100% then you have a crappy
> scheduler or simply insufficient overlap between I/O and compute, and
> you need to fix this first.
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to