Thanks everyone for your responses.
 
We modified the sqlite findReusableFd function call to return 0 immediately 
(i.e. no file descriptor to reuse) and it significantly improved the 
performance. The time to open 20K db's went down from 75 secs to 24 secs. The 
modification to findReusableFd works for us because we don't plan on opening 
multiple connections to the same database from one process.
 
This does suggest that we should re-evaluate our design and not open as many 
databases from a single process. The original motivation was to limit the size 
the database for performance reasons, which resulted in a large number of 
database files. I think we are going to try creating multiple tables per 
database and evaluate its performance.
 
Are there any general guidelines on a tested limit on the number of tables in a 
database or a general rule of thumb on the maximum recommended size for a 
database file?

From: Pavel Ivanov <[email protected]>
>To: General Discussion of SQLite Database <[email protected]>
>Sent: Friday, September 2, 2011 9:28 AM
>Subject: Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens 
>increase
>
>> Does anyone know why the performance degrades this way and what can be done
>> to resolve the problem?
>
>Do you have by any chance shared cache turned on? I believe when
>shared cache is used SQLite searches through all open databases to
>understand if the one you want to open is already opened and available
>for cache reuse.
>
>Also I believe there's some similar search happens through all open
>file descriptors in unix VFS. So that if several file descriptors to
>the same file are open there was no issues with dropping advisory
>locks on one descriptor while second is closing. And data below seem
>to prove that this is the problem for you.
>
>> I don't see the times you're seeing using this program...
>> time ./db 5000
>> 1.602s
>>
>> time ./db 10000
>> 5.357s
>>
>> time ./db 15000
>> 11.280s
>>
>> time ./db 20000
>> 19.570s
>>
>> time ./db 25000
>> 28.166s
>
>Michael,
>So in your test first 5000 are opened in 1.6 seconds and last 5000 are
>opened in 8.6 seconds. I'd say it's a noticeable increase that can't
>be attributed to OS.
>
>> Each sample counts as 0.01 seconds.
>>  %   cumulative   self              self     total
>>  time   seconds   seconds    calls  us/call  us/call  name
>>  52.06      0.76     0.76     5000   152.00   152.00  findInodeInfo
>>  43.84      1.40     0.64     5000   128.00   128.00  findReusableFd
>
>That's exactly what I talked about above. Roughly speaking
>findReusableFd searches through all open file descriptors to
>understand if the process already have some for the file you want to
>open. Or more precisely it looks for fd for the same file which is
>still open when corresponding sqlite3* connection is already closed
>(and yes if you open 5 connections to the same file and then close 4
>of them SQLite will still keep 5 file descriptors open to avoid
>problems with POSIX advisory locks).
>
>
>Pavel
>
>
>On Fri, Sep 2, 2011 at 10:08 AM, Black, Michael (IS)
><[email protected]> wrote:
>> I profiled the sqlite3 test program below and I get this for 25,000 files
>>
>>
>>
>> Flat profile:
>>
>> Each sample counts as 0.01 seconds.
>>  %   cumulative   self              self     total
>>  time   seconds   seconds    calls  ms/call  ms/call  name
>>  61.15     17.61    17.61    25000     0.70     0.70  findInodeInfo
>>  36.67     28.17    10.56    25000     0.42     0.42  findReusableFd
>>
>> For 5,000 files looks like this:
>>
>> Each sample counts as 0.01 seconds.
>>  %   cumulative   self              self     total
>>  time   seconds   seconds    calls  us/call  us/call  name
>>  52.06      0.76     0.76     5000   152.00   152.00  findInodeInfo
>>  43.84      1.40     0.64     5000   128.00   128.00  findReusableFd
>>
>>
>>
>> Michael D. Black
>>
>> Senior Scientist
>>
>> NG Information Systems
>>
>> Advanced Analytics Directorate
>>
>>
>>
>> ________________________________
>> From: [email protected] [[email protected]] on 
>> behalf of Black, Michael (IS) [[email protected]]
>> Sent: Friday, September 02, 2011 8:00 AM
>> To: General Discussion of SQLite Database
>> Subject: EXT :Re: [sqlite] sqlite3_open_v2 performance degrades as number of 
>> opens increase
>>
>> I assume you've overridden the system default for 1024 files in ulimit for # 
>> of open files?
>>
>> I don't see the times you're seeing using this program...though my databases 
>> are empty which probably makes a difference.
>> I do see the gradual increase in time...I think this is almost all due to 
>> the OS when you try and open thousands of files in one process.
>>
>> I ran this once to create 25,000 databases
>> ./db 25000
>> ls -l file* | wc -l
>> 25000
>> The tested opening them
>> time ./db 5000
>> 1.602s
>>
>> time ./db 10000
>> 5.357s
>>
>> time ./db 15000
>> 11.280s
>>
>> time ./db 20000
>> 19.570s
>>
>> time ./db 25000
>> 28.166s
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include "sqlite3.h"
>>
>> int main(int argc,char *argv[])
>> {
>>  int i,rc;
>>  sqlite3 *db;
>>  for(i=0;i<atoi(argv[1]);++i) {
>>    char name[4096];
>>    sprintf(name,"file%d",i);
>>    
>> rc=sqlite3_open_v2(name,&db,SQLITE_OPEN_READWRITE|SQLITE_OPEN_CREATE|SQLITE_OPEN_NOMUTEX,NULL);
>>    if (rc != SQLITE_OK) {
>>        printf("%s",sqlite3_errmsg(db));
>>        exit(1);
>>    }
>>    //sqlite3_close(db); // if you close the database it runs a LOT faster.  
>> So I don't think it's sqlite3 causing it.
>>  }
>>  return 0;
>> }
>>
>> If you run this you'll see similar behavior
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <unistd.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>>
>> int main(int argc,char *argv[])
>> {
>>  int i;
>>  int fd;
>>  for(i=0;i<atoi(argv[1]);++i) {
>>    char name[4096];
>>    sprintf(name,"%dfile",i);
>>    fd=open(name,O_RDWR|O_CREAT);
>>    if (fd==-1) perror("open error:");
>>    write(fd,name,strlen(name));
>>    //close(fd); // again it runs a lot faster if you close the file 
>> descriptor each time -- so it's the OS slowing you down.
>>    fsync(fd);
>>  }
>>  return 0;
>> }
>>
>>
>>
>>
>> Michael D. Black
>> Senior Scientist
>> NG Information Systems
>> Advanced Analytics Directorate
>>
>>
>>
>>
>> From: [email protected] [[email protected]] on 
>> behalf of Terry Cumaranatunge [[email protected]]
>> Sent: Friday, September 02, 2011 6:30 AM
>> To: [email protected]
>> Subject: EXT :[sqlite] sqlite3_open_v2 performance degrades as number of 
>> opens increase
>>
>>
>> Hello,
>>
>> We have an application that creates many small databases (over 100K) to be
>> able to control the size of the database and provide more deterministic
>> performance. At process startup, it opens many of these databases in a loop
>> to keep them open for faster transaction response times. The behavior we are
>> noticing is that the it takes progressively a longer time for each
>> sqlite3_open_v2 to complete as the number of databases kept
>> opened increases. These are some of the measurements:
>>
>> 5000 DBs = 3 secs
>> 10000 DBs = 11 secs
>> 15000 DBs = 35 secs
>> 20000 DBs = 75 secs
>>
>> Many processes can concurrently open 5000 db's at the same time and it takes
>> about the same time as a single process doing the work. So, it doesn't
>> appear to be related to the OS related issue with the number of opens.
>>
>> The open is done as:
>> sqlite3_open_v2(dbName, db, SQLITE_OPEN_READWRITE  | SQLITE_OPEN_NOMUTEX,
>> NULL)
>>
>> We are using RHEL 6 with SQLite 3.7.7.1. Each database was created with
>> journaling mode set to WAL.
>> Does anyone know why the performance degrades this way and what can be done
>> to resolve the problem?
>> _______________________________________________
>> sqlite-users mailing list
>> [email protected]
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> _______________________________________________
>> sqlite-users mailing list
>> [email protected]
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> _______________________________________________
>> sqlite-users mailing list
>> [email protected]
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>_______________________________________________
>sqlite-users mailing list
>[email protected]
>http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to