Re: [sqlite] sqlite performance, locking & threadin

Ken Fri, 29 Dec 2006 09:18:28 -0800

 Emerson,
 
    You just need to lock the entire transaction using a mutex before hand. 
That means each thread will have its own sqlite cache.  Reread the sqlite 
locking and concurrency guide, you'll see that SQLITE is NOT a transaction 
based system such as Postgress/mysql/ oracle. It locks the entire database file 
at the point of writing, not at the first insert/update/delete. Its better to 
think of sqlite as an easy way to perform data access against a FLAT file. 
Rather than a transaction oriented system.
 
 Regarding my test case. You missed the point, there is a select statement in 
the middle that each thread executes. The nice thing is that each and every 
thread uses a shared cache and can execute the select statement concurrently 
without a mutex. The locking when using a shared cache works at the table level 
instead of file!
 
 Another thing you should look at: Create a master DB, then attach the 
individual databases to the master. That way you only need one Cache. How do 
you know you wouldn't benefit from merging your caches? Have you tried? Do you 
have test cases to prove that it doesn't help?
 
 You stated that you want to put your owner locking mutex wrapper around the 
database interactions so that there would be only 1 thread access the DB.. That 
means that you need to release the lock after each 
statement(insert/update/delete) and not perform a BEGIN transaction/commit... 
Which means very slow performance. And besides only one thread regardless could 
be accessing the DB. Whereas the test_server.c  architecture (you need to 
modify this for your own needs) will let you create multiple shared connections 
to the database. And each thread can perform selects concurrently. But all 
threads are serialized into the single thread for writing. This takes care of 
all locking issues and is technically the same as mutexing the writes across 
threads. And it addresses writer starvation, which you have not addressed with 
your simple mutex locking.
 
 It seems to me you might want write your own locking implementation or even 
abandoning sqlite if it doesn't fit your needs. 
 
 
Emerson Clarke <[EMAIL PROTECTED]> wrote: Ken,


Thanks i understand your example well.

What im looking for is the ability to have multiple threads perform a
write operation based on my mutex, not some internal sqlite write
mutex.  If i am managing the concurrency and performing correct
synchronisation, why can i not have multiple threads writing to the
database at once and achive a better overall performance.
    
Given that the indexing process i am referring too has several steps
which it must perform in successive order, and not all of them are
database writes, i am simply trying to use threads to emulate a
situation where there is only one thread doing all the database
writes, if that makes sense.

So in this case, what ever synchronisation sqlite is doing internally
is actually getting in the way of what i am trying to do.

There were no gains in performance in your test case becuase there was
no reason to have multiple threads.  The only thing each thread was
doing was writing to the database, so of course the only thing you
will introduce by using multiple threads is overhead.

Emerson

On 12/28/06, Ken  wrote:
> Emerson,
>
> Is the Database on the same disk as the rest of the file operations? If so is 
> it possible that you are I/O bound and causing seek issues due to i/o access 
> patterns?
>
> Take a look at the test_server.c code in the sqlite/src directory. I used 
> that as a basis to build a custom library that opens a single DB and then 
> allows multiple threads to access. The nice thing about this architecture is 
> that the threads will all get to write and no writer starvation. But all 
> write operations an single threaded.
>
>  The test code I ran creates any number of threads and performs the following 
> in each thread:
>
>  outer loop 1- 10
>      begin txn
>      loop  1 -1000
>           insert record (using modulo for data so data is unique amongst 
> threads)
>      end loop
>      commit
>
>      prepare statement
>      loop 1 - 1000
>          Select data (using modulo)
>      end loop
>       close statement
>
>       begin transaction
>       loop 1 - 1000
>           delete data, using same modulo
>       end loop
>  end main loop
>
>  timinng (seconds)          Thread count
>  1.665                                    1                   (transaction 
> size is 1000)
>  1.635                                    2                   (transcaction 
> size is 500)
>  3.094                                   4                    ( txn size is 
> 250 )
>  5.571                                   8                    (txn size is 
> 125 )
>  7.822                                16                    (txn size is 62.5)
>
> so as the number of threads increase the overall time it takes to 
> insert/select/delete a fixed set of data increases using this architecture. 
> This is because all threads are serialized upon inserts/deletes and are 
> contending on a single writer mutex. So in this particular case fewer threads 
> actually improves performance.
>
>  Hope this helps,
>  Ken
>
>
>
>
> Emerson Clarke  wrote: Roger,
>
> Thanks for the suggestions.  I think using a worker thread and a queue
> would be equivalent to just running a single thread since it
> effectively makes the database operations synchronous.  Although i can
> see what your driving at regarding the transactions every n records.
>
> The idea is that because i am accessing two databases, and doing
> several file system operations per document, there should be a large
> gain by using many threads.  There is no actual indexing process, the
> whole structure is the index, but if anything the database operations
> take the most time.  The filesystem operations have a very small
> amount of overhead.
>
> I have tried the page size pragma setting already, though i read that
> it is dependent on the cluster size of the particular filesystem that
> you are running on.
>
> Since i only have one connection to each database from each thread i
> dont think i would benefit from the caching.  Im not quite sure why
> you would ever have more than one connection to the database from a
> single thread ?  The api that i use more or less ensures that under
> most circumstances there is only one connection.
>
> Emerson
>
> -----------------------------------------------------------------------------
> To unsubscribe, send email to [EMAIL PROTECTED]
> -----------------------------------------------------------------------------
>
>
>
>

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Re: [sqlite] sqlite performance, locking & threadin

Reply via email to