Interesting ... How does each thread know whether the file has been "previously 
processed" or not?

In other words, if you "get rid of" all the sqlite3 processing and replace it 
with a 5 ms sleep, does increasing the number of threads exhibit the same 
symptom?


That is if your thread code does this:

ThreadCode:
  while filename = getnextunprocessedfile()
     open the database filename
     do some stuff
     finalize all the statements
     close the database
     mark filename as being processed
  terminate cuz there is naught more to do

then replace it with this:

ThreadCode:
   while filename = getnextunprocessedfile()
     sleep 5 milliseconds
     mark filename as being processed
   terminate cuz there is naught more to do

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[email protected]] On Behalf Of Lee, Jason
>Sent: Monday, 22 April, 2019 15:26
>To: SQLite mailing list
>Subject: Re: [sqlite] Multiple Independent Database Instances
>
>I have a set of several million database files sitting on my
>filesystem. Each thread will open a previously unprocessed database
>file, do some queries, close the database, and move on to the next
>unprocessed database file.
>
>
>Jason Lee
>
>________________________________
>From: sqlite-users <[email protected]> on
>behalf of Keith Medcalf <[email protected]>
>Sent: Monday, April 22, 2019 3:13:57 PM
>To: SQLite mailing list
>Subject: Re: [sqlite] Multiple Independent Database Instances
>
>
>This is somewhat unclear.  You make two conflicting statements:
>
>"I have been testing with 16, 32, and 48 threads/databases at once
>..."
>and
>"time it takes for all of the threads to just open all (millions) of
>the databases"
>
>So, are you:
>(a) opening one independently and uniquely named database per thread
>as would be apparent from the first conflicting statement above; or,
>(b) opening the same "millions" of databases per thread as indicated
>by the second conflicting statement above
>
>?
>
>Per my testing the time taken to spin up a thread and open a database
>with a unique database name is constant and linear up to the thread
>limit of a process (about 800 threads).  This is even true if you
>execute SQL against the connection within that thread and also count
>that execution time in the "Time Taken".
>
>---
>The fact that there's a Highway to Hell but only a Stairway to Heaven
>says a lot about anticipated traffic volume.
>
>>-----Original Message-----
>>From: sqlite-users [mailto:sqlite-users-
>>[email protected]] On Behalf Of Lee, Jason
>>Sent: Monday, 22 April, 2019 14:08
>>To: SQLite mailing list
>>Subject: Re: [sqlite] Multiple Independent Database Instances
>>
>>Thanks for the quick responses!
>>
>>
>>I am on a machine with many many cores, 500GB RAM, and lots of NVMe
>>drives raided together, so the system should not be the issue. I
>have
>>been testing with 16, 32, and 48 threads/databases at once, and the
>>cumulative time it takes for all of the threads to just open all
>>(millions) of the databases goes from 1200 seconds to 2200 seconds
>to
>>3300 seconds.
>>
>>
>>As mentioned, this is likely to be something else, but I was hoping
>>that I was somehow using SQLite wrong.
>>
>>
>>Jason Lee
>>
>>________________________________
>>From: sqlite-users <[email protected]> on
>>behalf of Jens Alfke <[email protected]>
>>Sent: Monday, April 22, 2019 12:52:28 PM
>>To: SQLite mailing list
>>Subject: Re: [sqlite] Multiple Independent Database Instances
>>
>>
>>
>>> On Apr 22, 2019, at 11:39 AM, Lee, Jason <[email protected]>
>wrote:
>>>
>>> Hi. Are there any gotchas when opening multiple independent
>>databases from within one process using the C API?
>>
>>Do you mean different database files, or multiple connections to the
>>same file?
>>
>>> I am opening one database per thread in my code, and noticed that
>>sqlite3_open_v2 and sqlite3_close slow down as the number of threads
>>increase, indicating there might be some resource contention
>>somewhere, even though the databases should be independent of each
>>other.
>>
>>How many databases/threads? With huge numbers I’d expect slowdowns,
>>since you’ll be bottlenecking on I/O.
>>
>>But otherwise there shouldn’t be any gotchas. I would troubleshoot
>>this by profiling the running code to see where the time is being
>>spent. Even without knowledge of the SQLite internals, looking at
>the
>>call stacks of the hot-spots can help identify what the problem is.
>>
>>—Jens
>>_______________________________________________
>>sqlite-users mailing list
>>[email protected]
>>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>_______________________________________________
>>sqlite-users mailing list
>>[email protected]
>>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
>_______________________________________________
>sqlite-users mailing list
>[email protected]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>_______________________________________________
>sqlite-users mailing list
>[email protected]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to