Jay A. Kreibich wrote:

>> >Are you using a 32 bit or 64 bit process?
>> 
>> 32, but that does not matter for the problem.
>
>  When you give examples like "PRAGMA cache_size = 10000000;", or 10M
>  pages which would take something on the order of 15 GB of memory to
>  service with the default 1K page size, a few of us start to wonder.

Good point. But I believe you misunderstood my intention. I was not interested 
in absolute numbers but in relative ones, regardless of a 32 or 64 bit system. 
All I need is a figure relative to the amount of RAM available when the 
application starts. The aim is to use as much RAM as possible, but never more 
than is reasonably available. That last bit is the difficult one.
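
To illustrate, a minimal sketch of what I have in mind, assuming avail_bytes 
has been obtained from some platform-specific query (GlobalMemoryStatusEx() on 
Windows, sysconf() on Linux); the 50% fraction and the function name are just 
placeholders:

    #include <stdio.h>
    #include <sqlite3.h>

    /* Size the page cache relative to the RAM free at startup,
     * never as an absolute number. */
    static int set_relative_cache(sqlite3 *db, sqlite3_int64 avail_bytes)
    {
        const sqlite3_int64 page_size = 1024;   /* default 1 KB pages */
        sqlite3_int64 pages = (avail_bytes / 2) / page_size;
        char sql[64];
        snprintf(sql, sizeof sql, "PRAGMA cache_size=%lld;",
                 (long long)pages);
        return sqlite3_exec(db, sql, 0, 0, 0);
    }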

>> >I am curious why you think memory is the bottleneck anyway!
>> 
>> It has often been pointed out on this list that inserts into
>> indexed tables (regular or FTS) run faster with a high page cache.
>> My own tests second this. A few hundred MB more or less can make a
>> difference of more than 100%.
>
>  Given that the default page cache is 2000 pages, or on the order of
>  3MB, it seems that you're hitting some serious limits.  If hundreds
>  of megabytes (!) is giving you a return on the order of 2x, then there
>  is no magic bullet-- you aren't going to find a setting that suddenly
>  gives you a 10x speedup.  You're hitting diminishing returns in a
>  serious kind of way.

Some numbers: a 32-bit app, inserting 6 million records into 2 tables each 
(text, FTS, and blob). The database finally grows to over 6 GB in size. As the 
last step, a simple index is created on one text field.

With the default cache size of 2000 pages (1 KB page size), this takes a full 
day or more. Raising the page cache to some 180000 pages uses about 270 MB of 
memory but brings the time down to less than one hour.
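
For reference, the setup behind those numbers amounts to just two statements 
at open time (cache_size counts pages, not bytes; the ~270 MB includes 
per-page bookkeeping overhead on top of the raw 180000 KB):

    /* Issue right after sqlite3_open(), before any tables exist:
     * page_size only takes effect for a freshly created database. */
    sqlite3_exec(db, "PRAGMA page_size=1024;", 0, 0, 0);
    sqlite3_exec(db, "PRAGMA cache_size=180000;", 0, 0, 0);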

My testing shows that inserts involving lots of random disk seeks (indexes, 
FTS) benefit hugely from a large cache size, for the simple reason that it 
reduces disk I/O.

>  Personally, I'd pick a number, like half your RAM size or ~1.5GB*
>  (whichever is smaller), set the page cache, and be done with it.

That's what I ended up doing. In addition, I regularly check 
sqlite3_memory_used() and reduce the page cache whenever memory use exceeds 
the limit. Then I call sqlite3_release_memory() repeatedly until memory usage 
has dropped sufficiently.
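
In code the check amounts to something like this sketch (the limit and the 
1 MB step are my own choices; note that sqlite3_release_memory() is a no-op 
unless SQLite is built with SQLITE_ENABLE_MEMORY_MANAGEMENT):

    /* Called periodically during the bulk load: if SQLite holds
     * more than the soft limit, shrink the page cache and free
     * pages until usage has dropped far enough. */
    static void enforce_memory_limit(sqlite3 *db, sqlite3_int64 limit)
    {
        if (sqlite3_memory_used() <= limit) return;
        sqlite3_exec(db, "PRAGMA cache_size=2000;", 0, 0, 0);
        /* sqlite3_release_memory() returns the number of bytes
         * actually freed; stop when nothing more can be freed. */
        while (sqlite3_memory_used() > limit
               && sqlite3_release_memory(1024 * 1024) > 0)
            ;
    }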

>  It sounds like you've already found most of the other PRAGMAs that
>  are going to get you something.  You might be able to tune the size
>  of your INSERT transactions, but if you're around 1000 or so, going
>  higher isn't likely to buy you too much.

Currently I use just a single transaction for all inserts into a newly created 
database. This reduces the cache flushes to a single one at commit time, after 
all data has been inserted, when the cached memory is about to be freed anyway.
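
Schematically (insert_all_records() is a hypothetical stand-in for the actual 
insert loop):

    /* One transaction around the entire bulk load: dirty pages are
     * written out only once, at COMMIT time. */
    sqlite3_exec(db, "BEGIN;", 0, 0, 0);
    insert_all_records(db);   /* hypothetical: the 6 million inserts */
    sqlite3_exec(db, "COMMIT;", 0, 0, 0);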

As another optimization option, I am looking forward to the new journal pragma 
and will hopefully not need a journal file at all.
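
Assuming it arrives in the form of a journal_mode pragma, the plan would look 
like this (only safe here because a half-built database can simply be rebuilt 
if the load is interrupted):

    /* Skip the rollback journal entirely while bulk-loading a
     * fresh, disposable database file. */
    sqlite3_exec(db, "PRAGMA journal_mode=OFF;", 0, 0, 0);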

Thanks for the feedback and my apologies for the late response,

Ralf 
