If Emerson intuitively understood the essential architecture of the PC
he is using, he would not have difficulty with his concept of how to
use it. It is essentially a serial, multi-tasking device; parallelism
in the forms of threading and multiprocessing is a sophistication
added with a high overhead.
I recollect an insightful CS professor impressing on his students
the concept by explaining to them that the machines on their desks were
descended from a device invented to be a gas pump controller.
A machine designed from first principles to manage parallel processing
would be very different.
Michael Ruck wrote:
> Hi Emerson,
>
> I just hope you don't reinvent the wheel ;) I haven't yet had the need to
> index things the way you describe it. Maybe I should take that as one of my
> next pet projects to get a handle on this type of task.
>
> The problem as I see it is basically that, any way you design this: if the
> storage tasks take 90% of your indexing time, then any parallelization may
> be a waste of effort. Even if you use a synchronization object you're
> essentially serializing things in a (complicated) multithreaded way...
>
> As far as static initialization: that it occurs before main() and is out of
> your control was the point I was getting across. That's why I wrote that
> this type of initialization should be avoided, unless there's no better
> design for it.
>
> Michael
>
> -----Ursprüngliche Nachricht-----
> Von: Emerson Clarke [mailto:[EMAIL PROTECTED]
> Gesendet: Mittwoch, 3. Januar 2007 20:31
> An: sqlite-users@sqlite.org
> Betreff: Re: [sqlite] sqlite performance, locking & threading
>
> Michael,
>
> Thanks for the advice. During the indexing process I need to select and
> optionally insert records into a table, so I can't ignore the outcomes.
>
> Basically the indexing process does compression: for each document it
> inserts words into a table and looks up keys. Every word in the document
> gets swapped with a key, and new keys are inserted as needed.
>
> There are some problems with splitting the work up in a different way as you
> suggested. I would either end up with a lot of queues, or I would have to
> stagger the work so that the entire data set gets processed in stages, which
> doesn't scale very well and isn't particularly fault tolerant. When building
> an index, you want the structure to be built up progressively, so that you
> can pause the process and resume it later on whilst still having useful
> results.
>
> I would be worried that in a queued design, the overhead and bottlenecks
> caused by the buffering, message passing, and context switching would reduce
> the performance to that of a single thread.
> Especially since the database operations represent 90% of the work, all you
> would really be doing is attempting to serialise things in a multithreaded
> way.
>
> I'm sure, having worked on multithreaded systems, you appreciate that
> sometimes simple designs are better, and I think I have a pretty good handle
> on what it is that I'm trying to do.
>
> You never have control over static initialisation; it happens before main().
> If I was writing very specific code to suit just this situation then maybe,
> as you say, I wouldn't need to worry about it. But I'm also writing a
> database API, and that API is used for many different things. My
> considerations are not just for this one problem, but also for the best
> general way to code the API so that it is safe and efficient in all
> circumstances. So far the client/server design is the only way I can
> achieve true thread safety.
>
> If I could work out why sqlite3_step() causes problems across multiple
> threads, I could probably make things a little faster and I could do away
> with the need for a client/server design.
>
> Emerson
>
>
> On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote:
>
>>Emerson,
>>
>>Now I understand your current implementation. You seemingly only
>>partially split up the work in your code. I'd schedule the database
>>operation and not wait on the outcome, but start on the next task.
>>When the database finishes and has retrieved its result, schedule some
>>work package on a third thread, which only processes the results etc.
>>Split up the work into repetitive, non-blocking tasks. Use multiple
>>queues and dedicated threads for parts of the operation, or thread pools,
>>which process queues in parallel if possible.
>
>>From what I can tell you're already half way there.
>>
>>I still don't see your static initialization problem, but that's
>>another story. Actually I'd avoid using static initialization or
>>static (singleton) instances, unless the design really requires it.
>>Someone must control startup of the entire process; have that one
>>(probably main/WinMain) take care that the work queues are available.
>>Afterwards the order of thread starts doesn't matter... Actually it is
>>non-deterministic anyway (unless you serialize this yourself.)
>>
>>Michael
>>
>>-----Ursprüngliche Nachricht-----
>>Von: Emerson Clarke [mailto:[EMAIL PROTECTED]
>>Gesendet: Mittwoch, 3. Januar 2007 15:14
>>An: sqlite-users@sqlite.org
>>Betreff: Re: [sqlite] sqlite performance, locking & threading
>>
>>Michael,
>>
>>I'm not sure that atomic operations would be a suitable alternative.
>>The reason why I'm using events/conditions is that the client thread
>>blocks until the server thread has processed the query and returned
>>the result. If I did not need the result, then a simple queueing
>>system with atomic operations or critical sections would be fine, I guess.
>>
>>The client thread must always block or spin until the server thread
>>has completed the query. Critical sections can't be efficiently used
>>to notify other threads of a status change. I did try using critical
>>sections in this way, by spinning until the server thread takes a
>>lock, then blocking and eventually waiting for the server thread to
>>finish. But since there is no way to block the server thread when
>>there is no work to do, both the client and server threads must sleep,
>>which induces context switching anyway.
>
>>If you used atomic operations, how would you get the client thread to
>>block, and the server thread to block when it is not processing?
>>
>>Events/conditions seemed to be the best solution: the server thread
>>never runs when it doesn't need to, and always wakes up when there is
>>processing to be done.
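The blocking handoff described above can be sketched with a POSIX mutex and condition variable. This is an illustrative toy, not Emerson's code: the channel layout, the names, and the doubling that stands in for running a query are all assumptions.

```c
/* Sketch of a blocking client/server handoff: the server sleeps on a
   condition variable until a request arrives; the client blocks on the
   same mutex/condition pair until the server publishes the result. */
#include <pthread.h>

struct channel {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int has_request, has_result, done;
    int request, result;
};

static void *server_thread(void *arg)
{
    struct channel *ch = arg;
    pthread_mutex_lock(&ch->lock);
    for (;;) {
        /* Sleep until there is work to do: no polling, no spinning. */
        while (!ch->has_request && !ch->done)
            pthread_cond_wait(&ch->cond, &ch->lock);
        if (ch->done)
            break;
        ch->result = ch->request * 2;       /* stand-in for running the query */
        ch->has_request = 0;
        ch->has_result = 1;
        pthread_cond_broadcast(&ch->cond);  /* wake the waiting client */
    }
    pthread_mutex_unlock(&ch->lock);
    return 0;
}

/* Client side: submit a request and block until the server answers. */
int submit(struct channel *ch, int request)
{
    int result;
    pthread_mutex_lock(&ch->lock);
    ch->request = request;
    ch->has_request = 1;
    pthread_cond_broadcast(&ch->cond);
    while (!ch->has_result)
        pthread_cond_wait(&ch->cond, &ch->lock);
    result = ch->result;
    ch->has_result = 0;
    pthread_mutex_unlock(&ch->lock);
    return result;
}
```

The context switches Emerson measures are inherent here: every request costs at least one sleep/wake round trip on each side.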
>>
>>The static initialisation problem occurs because the server thread
>>must be running before anything which needs to use it. If you have a
>>static instance of a class which accesses a database, and it is
>>initialised before the static instance which controls the server thread,
>>you have a problem.
>
>>It can be overcome using the initialise-on-first-use idiom, as long as
>>you're careful to protect the initialisation with atomic operations, but
>>it's still a bit complicated.
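One thread-safe form of the initialise-on-first-use idiom uses pthread_once rather than hand-rolled atomics. The sketch below is illustrative (the names and the flag standing in for the server thread are assumptions):

```c
/* Initialise-on-first-use via pthread_once: the initialiser runs exactly
   once no matter how many threads race to the accessor, and the static
   initialisation-order problem disappears because nothing is started
   before first use. */
#include <pthread.h>

static pthread_once_t server_once = PTHREAD_ONCE_INIT;
static int server_started;          /* stands in for the server thread */
static int init_calls;              /* counts how often init really ran */

static void start_server(void)
{
    init_calls++;
    server_started = 1;             /* e.g. spawn the server thread here */
}

/* Every user calls this instead of touching a static instance directly. */
int server_ready(void)
{
    pthread_once(&server_once, start_server);
    return server_started;
}
```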
>>
>>Emerson
>>
>>
>>On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote:
>>
>>>Hi Emerson,
>>>
>>>Another remark: On Windows, using Event synchronization objects
>>>involves additional kernel context switches and thus slows you down
>>>more than necessary. I'd suggest using a queue which makes use of
>>>the InterlockedXXX operations (I've implemented a number of those,
>>>including priority-based ones, so this is possible without taking a
>>>single lock), or to use critical sections; those only take the
>>>kernel context switch if there really is lock contention. If you can
>>>reduce the kernel context switches, your performance will likely
>>>increase drastically.
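The lock-free structure Michael alludes to can be sketched with C11 atomics, the portable analogue of the Windows InterlockedCompareExchange family. For brevity this is a LIFO (Treiber stack) rather than a FIFO queue, and all names are illustrative. Note that `pop` as written is only safe while nodes are never freed by a concurrent pop (the ABA/reclamation problem, which real implementations must solve).

```c
/* A compare-and-swap based stack: push and pop retry their CAS until no
   other thread has changed `top` in between, so no lock is ever taken. */
#include <stdatomic.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

static _Atomic(struct node *) top;

void push(int value)
{
    struct node *n = malloc(sizeof *n);
    n->value = value;
    n->next = atomic_load(&top);
    /* On failure, n->next is reloaded with the current top; retry. */
    while (!atomic_compare_exchange_weak(&top, &n->next, n))
        ;
}

/* Returns 1 and stores the value in *out, or 0 if the stack is empty. */
int pop(int *out)
{
    struct node *n = atomic_load(&top);
    while (n && !atomic_compare_exchange_weak(&top, &n, n->next))
        ;
    if (!n)
        return 0;
    *out = n->value;
    free(n);
    return 1;
}
```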
>>
>>>I also don't see the static initialization problem: the queue has to
>>>be available before any thread is started. No thread has ownership
>>>of the queue, except maybe the main thread.
>>>
>>>Michael
>>>
>>>
>>>-----Ursprüngliche Nachricht-----
>>>Von: Emerson Clarke [mailto:[EMAIL PROTECTED]
>>>Gesendet: Mittwoch, 3. Januar 2007 00:57
>>>An: sqlite-users@sqlite.org
>>>Betreff: Re: [sqlite] sqlite performance, locking & threading
>>>
>>>Nico,
>>>
>>>I have implemented all three strategies (thread-specific connections,
>>>single connection with multiple threads, and single-thread server with
>>>multiple client threads).
>>>
>>>The problem with using thread-specific contexts is that you can't
>>>have a single global transaction which wraps all of those contexts.
>>>So you end up having to use fine-grained transactions, which
>>>decreases performance.
>>
>>>The single-connection, multiple-thread alternative apparently has
>>>problems with sqlite3_step() being active on more than one thread at
>>>the same moment, so it cannot easily be used in a safe way. But it is
>>>by far the fastest and simplest alternative.
>>>
>>>The single-thread server solution involves message passing between
>>>threads, and even when this is done optimally with condition
>>>variables (or events on Windows) and blocking, I've found that it
>>>results in a high number of context switches and decreased
>>>performance. It does however make a robust basis for a wrapper API,
>>>since it guarantees that things will always be synchronised.
>>
>>>But using this arrangement can also result in various static
>>>initialisation problems, since the single-thread server must always
>>>be up and running before anything which needs to use it.
>>>
>>>Emerson
>>>
>>>On 1/2/07, Nicolas Williams <[EMAIL PROTECTED]> wrote:
>>>
>>>>On Sat, Dec 30, 2006 at 03:34:01PM +0000, Emerson Clarke wrote:
>>>>
>>>>>Technically sqlite is not thread safe. [...]
>>>>
>>>>Solaris man pages describe APIs with requirements like SQLite's as
>>>>"MT-Safe with exceptions", and the exceptions are listed in the man page.
>>>>That's still MT-Safe, but the caller has to play by certain rules.
>>>>
>>>>Anyways, this is silly. SQLite API is MT-Safe with one exception
>>>>and that exception is rather ordinary, common to other APIs like
>>>>it that have a context object of some sort (e.g., the MIT krb5
>>>>API), and not really a burden to the caller. In exchange for this
>>>>exception you get an implementation of the API that is lighter
>>>>weight and easier to maintain than it would have been without that
>>>>exception; a good trade-off IMO.
>>>>
>>>>Coping with this exception is easy. For example, if you have a
>>>>server app with multiple worker threads, each of which needs a db
>>>>context, then you could use a thread-specific key to track a
>>>>per-thread db context; use pthread_key_create(3C) to create the key,
>>>>pthread_setspecific(3C) once per thread to associate a new db
>>>>context with the calling thread, and pthread_getspecific(3C) to
>>>>get the calling thread's db context when you need it. If you have
>>>>a protocol where you have to step a statement over multiple
>>>>message exchanges with a client, and you don't want to have
>>>>per-client threads, then get a db context per client/exchange and
>>>>store that and a mutex in an object that represents that
>>>>client/exchange. And so on.
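Nico's per-thread-context recipe can be sketched as follows. The context type and names are illustrative assumptions; in real code the stored value would be a per-thread sqlite3 * connection rather than a toy struct.

```c
/* A process-wide key created exactly once; each thread lazily attaches
   its own private context, which is freed automatically at thread exit. */
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t ctx_key;
static pthread_once_t ctx_once = PTHREAD_ONCE_INIT;

struct db_ctx { int id; };          /* stand-in for a real db handle */

static void make_key(void)
{
    /* free() is the destructor: it runs when each owning thread exits. */
    pthread_key_create(&ctx_key, free);
}

/* Returns the calling thread's private context, creating it on first use. */
struct db_ctx *get_ctx(void)
{
    static int next_id;             /* illustrative only; not itself thread-safe */
    struct db_ctx *ctx;

    pthread_once(&ctx_once, make_key);
    ctx = pthread_getspecific(ctx_key);
    if (!ctx) {
        ctx = malloc(sizeof *ctx);
        ctx->id = ++next_id;
        pthread_setspecific(ctx_key, ctx);
    }
    return ctx;
}
```

Each thread that calls get_ctx() sees its own stable context, which is exactly the "MT-Safe with exceptions" usage pattern: one connection per thread, never shared.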
>
>>>>Nico