Roger,

I have deliberately tried to avoid giving too much detail on the
architecture of the index since that was not the point and I didn't
want to end up debating it.

The design of the index is not the issue; suffice to say that I think
you are overcomplicating things. It is a deceptively simple problem,
and I really am not insisting on anything.  At the end of the day I
can just use another database, but the simplicity and low maintenance
of sqlite appeals to me.

I did make an attempt to explain that A and B could not be done at the
same time in a previous message, but perhaps it's been lost in the
conversation.  The process involves several stages, some of which are
database operations and some of which are file operations, and the
operations are not separable.  They must be done in sequential order.

At this stage nothing is premature optimisation, since I am merely
running performance tests and modelling the architecture. I can see
where the time is being spent using high resolution timers, so that is
my empirical evidence.

The database operations, though very small, still consume the most time
and are the most sensitive to how the synchronisation takes place and
where the transactions are placed.  I don't think custom functions are
appropriate for what I'm doing, and I'm not sure how virtual tables would
be either; I rather suspect that would be a very complicated approach.
The schema is extremely simple, and there is barely any logic to the
indexing process at all.

The biggest impact on the performance is the transactions, and since
it's hard to dispute that a single global transaction is the fastest
way to operate when you are using a single thread, it is easy to see
that being able to use one global transaction and multiple threads
within that transaction is the simplest and easiest way to gain
performance.  Unfortunately I cannot do this with sqlite at the
moment...

A few threads and two or three mutex lock/unlock statements is all it
takes with the design that I have now. The fact that each thread must
use its own connection is a hassle and does impact performance,
but not nearly as much as the inability to wrap the whole thing in a
single transaction.


On 12/30/06, Roger Binns <[EMAIL PROTECTED]> wrote:

You never answered the bit about whether indexing A can be done at the
same time as B.  (I get the feeling you have a certain design in mind
and insist that SQLite changes to meet that design, rather than change
your own design around the constraints of SQLite).

Emerson Clarke wrote:
| In general worker threads is not an efficient solution to the problem
| even if parallelism is available.  There is nothing to be gained by
| having one thread handing off A to another set of worker threads
| because you have to synchronise on whatever queue/list/pipe you use to
| transfer the information between threads anyway.

The synchronisation is only at the beginning and end of a "job".  The big
efficiency gain is that you can write synchronous code in the
threads/processes which is way easier to write and less likely to have
bugs.  It is correct that you can write the same thing using a single
thread and using asynchronous I/O but that is way harder.

| So you may as well
| just have multiple threads all performing their A,B, and C operations
| in parallel and they will naturally interleave their operations based
| on the synchronisation that you use.

You haven't said why that won't work in practice.

| threads with mutexes can't do in the foreground and only introduces
| overhead and delay in the queueing process and context switching.

Sounds like premature optimization to me.  Compared to any I/O you are
going to have to do (all of which involves interrupts, DMA transfers and
context switching), other context switching is going to be negligible.

| On the second point, I can't see any reason why the VDBE design would
| be a problem for what I want to do.

Did you run explain?  Quite simply SQLite is designed to keep the
database locked for the minimum amount of time, rather than having
multiple VDBEs having shared locks and doing lots of concurrent locking
for longer periods of time.  Fundamentally changing the design of SQLite
is not a quick thing.

| Taking out a table-wide lock is
| fine, so long as I can make changes to that table from multiple
| threads using the same connection and lock.

"Taking out a table lock is fine, as long as the table is not locked" :-)

If you have actual empirical evidence that SQLite is a bottleneck, then
there are two other approaches that may help:

- Use multiple databases (e.g. split on first letter of document name) and
use ATTACH to have them all available at the same time

- Use the new virtual tables feature and user defined functions.  You
can have the code consult other tables, store up information and do
almost any other workflow and locking scheme you want.  You can copy
batches of data from your virtual tables into the real ones, have
virtual functions that look in virtual tables for updates, then fall back
on original data or any other design that suits you.

Roger
