If Emerson intuitively understood the essential architecture of the PC
he is using, he would not have difficulty with his concept of how to
use it. It is essentially a serial, multi-tasking device; parallelism
in the forms of threading and multiprocessing is a sophistication
added with a high overhead.
I recollect an insightful CS professor impressing on his students
the concept by explaining to them that the machines on their desks were
descended from a device invented to be a gas pump controller.
A machine designed from first principles to manage parallel processing
would be very different.
Michael Ruck wrote:
> Hi Emerson,
>
> I just hope you don't reinvent the wheel ;) I haven't yet had the need to
> index things the way you describe it. Maybe I should take that as one of my
> next pet projects to get a handle on this type of task.
>
> The problem as I see it is basically that, any way you design this: if the
> storage tasks take 90% of your indexing time, then any parallelization may
> be a waste of effort. Even if you use a synchronization object you're
> essentially serializing things in a (complicated) multithreaded way...
>
> As far as static initialization: that it occurs before main() and is out of
> your control was the point I was getting across. That's why I wrote that
> this type of initialization should be avoided, unless there's no better
> design for it.
>
> Michael
>
> -----Ursprüngliche Nachricht-----
> Von: Emerson Clarke [mailto:[EMAIL PROTECTED]
> Gesendet: Mittwoch, 3. Januar 2007 20:31
> An: sqlite-users@sqlite.org
> Betreff: Re: [sqlite] sqlite performance, locking & threading
>
> Michael,
>
> Thanks for the advice. During the indexing process I need to select and
> optionally insert records into a table, so I can't ignore the outcomes.
>
> Basically the indexing process does compression: for each document it
> inserts words into a table and looks up keys. Every word in the document
> gets swapped with a key, and new keys are inserted as needed.
>
> There are some problems with splitting the work up in a different way as you
> suggested. I would either end up with a lot of queues, or I would have to
> stagger the work so that the entire data set gets processed in stages, which
> doesn't scale very well and isn't particularly fault tolerant. When building
> an index, you want the structure to be built up progressively, so that you
> can pause the process and resume it later on whilst still having useful
> results.
>
> I would be worried that in a queued design, the overhead and bottlenecks
> caused by the buffering, message passing, and context switching would reduce
> the performance to that of a single thread.
> Especially since the database operations represent 90% of the work, all you
> would really be doing is attempting to serialise things in a multithreaded
> way.
>
> I'm sure, having worked on multithreaded systems, you appreciate that
> sometimes simple designs are better, and I think I have a pretty good handle
> on what it is that I'm trying to do.
>
> You never have control over static initialisation; it happens before main().
> If I was writing very specific code to suit just this situation then maybe,
> as you say, I wouldn't need to worry about it. But I'm also writing a
> database API, and that API is used for many different things. My
> considerations are not just for this one problem, but also for the best
> general way to code the API so that it is safe and efficient in all
> circumstances. So far the client/server design is the only way I can
> achieve true thread safety.
>
> If I could work out why sqlite3_step() causes problems across multiple
> threads, I could probably make things a little faster and I could do away
> with the need for a client/server design.
>
> Emerson
>
>
> On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote:
>
>>Emerson,
>>
>>Now I understand your current implementation. You seemingly only
>>partially split up the work in your code. I'd schedule the database
>>operation and not wait on the outcome, but start on the next task.
>>When the database finishes and has retrieved its result, schedule some
>>work package on a third thread, which only processes the results etc.
>>Split up the work into repetitive, non-blocking tasks. Use multiple
>>queues and dedicated threads for parts of the operation, or thread pools,
>>which process queues in parallel if possible.
>
>>From what I can tell you're already half way there.
>>
>>I still don't see your static initialization problem, but that's
>>another story. Actually I'd avoid using static initialization or
>>static (singleton) instances, unless the design really requires it.
>>Someone must control startup of the entire process; have that one
>>(probably main/WinMain) take care that the work queues are available.
>>Afterwards the order of thread starts doesn't matter... Actually it is
>>non-deterministic anyway (unless you serialize this yourself.)
>>
>>Michael
>>
>>-----Ursprüngliche Nachricht-----
>>Von: Emerson Clarke [mailto:[EMAIL PROTECTED]
>>Gesendet: Mittwoch, 3. Januar 2007 15:14
>>An: sqlite-users@sqlite.org
>>Betreff: Re: [sqlite] sqlite performance, locking & threading
>>
>>Michael,
>>
>>I'm not sure that atomic operations would be a suitable alternative.
>>The reason why I'm using events/conditions is that the client thread
>>blocks until the server thread has processed the query and returned
>>the result. If I did not need the result, then a simple queueing
>>system with atomic operations or critical sections would be fine, I guess.
>>
>>The client thread must always block or spin until the server thread
>>has completed the query. Critical sections can't be efficiently used
>>to notify other threads of a status change. I did try using critical
>>sections in this way, by spinning until the server thread takes a
>>lock, then blocking and eventually waiting for the server thread to
>>finish. But since there is no way to block the server thread when
>>there is no work to do, both the client and server threads must sleep,
>>which induces context switching anyway.
>
>>If you used atomic operations, how would you get the client thread to
>>block, and the server thread to block when it is not processing?
>>
>>Events/conditions seemed to be the best solution: the server thread
>>never runs when it doesn't need to, and always wakes up when there is
>>processing to be done.
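The blocking handoff described above can be sketched with a POSIX mutex and condition variable. This is an illustrative toy, not Emerson's code: the channel layout, the names, and the doubling that stands in for running a query are all assumptions.

```c
/* Sketch of a blocking client/server handoff: the server sleeps on a
   condition variable until a request arrives; the client blocks on the
   same mutex/condition pair until the server publishes the result. */
#include <pthread.h>

struct channel {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int has_request, has_result, done;
    int request, result;
};

static void *server_thread(void *arg)
{
    struct channel *ch = arg;
    pthread_mutex_lock(&ch->lock);
    for (;;) {
        /* Sleep until there is work to do: no polling, no spinning. */
        while (!ch->has_request && !ch->done)
            pthread_cond_wait(&ch->cond, &ch->lock);
        if (ch->done)
            break;
        ch->result = ch->request * 2;       /* stand-in for running the query */
        ch->has_request = 0;
        ch->has_result = 1;
        pthread_cond_broadcast(&ch->cond);  /* wake the waiting client */
    }
    pthread_mutex_unlock(&ch->lock);
    return 0;
}

/* Client side: submit a request and block until the server answers. */
int submit(struct channel *ch, int request)
{
    int result;
    pthread_mutex_lock(&ch->lock);
    ch->request = request;
    ch->has_request = 1;
    pthread_cond_broadcast(&ch->cond);
    while (!ch->has_result)
        pthread_cond_wait(&ch->cond, &ch->lock);
    result = ch->result;
    ch->has_result = 0;
    pthread_mutex_unlock(&ch->lock);
    return result;
}
```

The context switches Emerson measures are inherent here: every request costs at least one sleep/wake round trip on each side.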
>>
>>The static initialisation problem occurs because the server thread
>>must be running before anything which needs to use it. If you have a
>>static instance of a class which accesses a database, and it is
>>initialised before the static instance which controls the server thread,
>>you have a problem.
>
>>It can be overcome using the initialise-on-first-use idiom, as long as
>>you're careful to protect the initialisation with atomic operations, but
>>it's still a bit complicated.
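One thread-safe form of the initialise-on-first-use idiom uses pthread_once rather than hand-rolled atomics. The sketch below is illustrative (the names and the flag standing in for the server thread are assumptions):

```c
/* Initialise-on-first-use via pthread_once: the initialiser runs exactly
   once no matter how many threads race to the accessor, and the static
   initialisation-order problem disappears because nothing is started
   before first use. */
#include <pthread.h>

static pthread_once_t server_once = PTHREAD_ONCE_INIT;
static int server_started;          /* stands in for the server thread */
static int init_calls;              /* counts how often init really ran */

static void start_server(void)
{
    init_calls++;
    server_started = 1;             /* e.g. spawn the server thread here */
}

/* Every user calls this instead of touching a static instance directly. */
int server_ready(void)
{
    pthread_once(&server_once, start_server);
    return server_started;
}
```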
>>
>>Emerson
>>
>>
>>On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote:
>>
>>>Hi Emerson,
>>>
>>>Another remark: On Windows, using Event synchronization objects
>>>involves additional kernel context switches and thus slows you down
>>>more than necessary. I'd suggest using a queue which makes use of
>>>the InterlockedXXX operations (I've implemented a number of those,
>>>including priority-based ones, so this is possible without taking a
>>>single lock), or to use critical sections; those only take the
>>>kernel context switch if there really is lock contention. If you can
>>>reduce the kernel context switches, your performance will likely
>>>increase drastically.
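The lock-free structure Michael alludes to can be sketched with C11 atomics, the portable analogue of the Windows InterlockedCompareExchange family. For brevity this is a LIFO (Treiber stack) rather than a FIFO queue, and all names are illustrative. Note that `pop` as written is only safe while nodes are never freed by a concurrent pop (the ABA/reclamation problem, which real implementations must solve).

```c
/* A compare-and-swap based stack: push and pop retry their CAS until no
   other thread has changed `top` in between, so no lock is ever taken. */
#include <stdatomic.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

static _Atomic(struct node *) top;

void push(int value)
{
    struct node *n = malloc(sizeof *n);
    n->value = value;
    n->next = atomic_load(&top);
    /* On failure, n->next is reloaded with the current top; retry. */
    while (!atomic_compare_exchange_weak(&top, &n->next, n))
        ;
}

/* Returns 1 and stores the value in *out, or 0 if the stack is empty. */
int pop(int *out)
{
    struct node *n = atomic_load(&top);
    while (n && !atomic_compare_exchange_weak(&top, &n, n->next))
        ;
    if (!n)
        return 0;
    *out = n->value;
    free(n);
    return 1;
}
```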
>>
>>>I also don't see the static initialization problem: the queue has to
>>>be available before any thread is started. No thread has ownership
>>>of the queue, except maybe the main thread.
>>>
>>>Michael
>>>
>>>
>>>-----Ursprüngliche Nachricht-----
>>>Von: Emerson Clarke [mailto:[EMAIL PROTECTED]
>>>Gesendet: Mittwoch, 3. Januar 2007 00:57
>>>An: sqlite-users@sqlite.org
>>>Betreff: Re: [sqlite] sqlite performance, locking & threading
>>>
>>>Nico,
>>>
>>>I have implemented all three strategies (thread-specific connections,
>>>single connection with multiple threads, and single-thread server with
>>>multiple client threads).
>>>
>>>The problem with using thread-specific contexts is that you can't
>>>have a single global transaction which wraps all of those contexts.
>>>So you end up having to use fine-grained transactions, which
>>>decreases performance.
>>
>>>The single-connection, multiple-thread alternative apparently has
>>>problems with sqlite3_step() being active on more than one thread at
>>>the same moment, so it cannot easily be used in a safe way. But it is
>>>by far the fastest and simplest alternative.
>>>
>>>The single-thread server solution involves message passing between
>>>threads, and even when this is done optimally with condition
>>>variables (or events on Windows) and blocking, I've found that it
>>>results in a high number of context switches and decreased
>>>performance. It does however make a robust basis for a wrapper API,
>>>since it guarantees that things will always be synchronised.
>>
>>>But using this arrangement can also result in various static
>>>initialisation problems, since the single-thread server must always
>>>be up and running before anything which needs to use it.
>>>
>>>Emerson
>>>
>>>On 1/2/07, Nicolas Williams <[EMAIL PROTECTED]> wrote:
>>>
>>>>On Sat, Dec 30, 2006 at 03:34:01PM +0000, Emerson Clarke wrote:
>>>>
>>>>>Technically sqlite is not thread safe. [...]
>>>>
>>>>Solaris man pages describe APIs with requirements like SQLite's as
>>>>"MT-Safe with exceptions", and the exceptions are listed in the man page.
>>>>That's still MT-Safe, but the caller has to play by certain rules.
>>>>
>>>>Anyways, this is silly. SQLite API is MT-Safe with one exception
>>>>and that exception is rather ordinary, common to other APIs like
>>>>it that have a context object of some sort (e.g., the MIT krb5
>>>>API), and not really a burden to the caller. In exchange for this
>>>>exception you get an implementation of the API that is lighter
>>>>weight and easier to maintain than it would have been without that
>>>>exception; a good trade-off IMO.
>>>>
>>>>Coping with this exception is easy. For example, if you have a
>>>>server app with multiple worker threads, each of which needs a db
>>>>context, then you could use a thread-specific key to track a
>>>>per-thread db context; use pthread_key_create(3C) to create the key,
>>>>pthread_setspecific(3C) once per thread to associate a new db
>>>>context with the calling thread, and pthread_getspecific(3C) to
>>>>get the calling thread's db context when you need it. If you have
>>>>a protocol where you have to step a statement over multiple
>>>>message exchanges with a client, and you don't want to have
>>>>per-client threads, then get a db context per client/exchange and
>>>>store that and a mutex in an object that represents that
>>>>client/exchange. And so on.
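Nico's per-thread-context recipe can be sketched as follows. The context type and names are illustrative assumptions; in real code the stored value would be a per-thread sqlite3 * connection rather than a toy struct.

```c
/* A process-wide key created exactly once; each thread lazily attaches
   its own private context, which is freed automatically at thread exit. */
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t ctx_key;
static pthread_once_t ctx_once = PTHREAD_ONCE_INIT;

struct db_ctx { int id; };          /* stand-in for a real db handle */

static void make_key(void)
{
    /* free() is the destructor: it runs when each owning thread exits. */
    pthread_key_create(&ctx_key, free);
}

/* Returns the calling thread's private context, creating it on first use. */
struct db_ctx *get_ctx(void)
{
    static int next_id;             /* illustrative only; not itself thread-safe */
    struct db_ctx *ctx;

    pthread_once(&ctx_once, make_key);
    ctx = pthread_getspecific(ctx_key);
    if (!ctx) {
        ctx = malloc(sizeof *ctx);
        ctx->id = ++next_id;
        pthread_setspecific(ctx_key, ctx);
    }
    return ctx;
}
```

Each thread that calls get_ctx() sees its own stable context, which is exactly the "MT-Safe with exceptions" usage pattern: one connection per thread, never shared.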
>
>>>>Nico