Roger,

My original question was in fact not a statement. I did not want SQLite to work differently; rather the opposite: SQLite already works differently from the way I, and probably a lot of other users, assume it would. All I wanted to know was why that is the case.
It seemed to me that making a library which only functions on a per-thread basis is something you would have to do deliberately and by design, but I am still vague as to the reasons behind that design, aside from some mention of file locks being based on threads rather than processes. I am left to assume that all the other locking mechanisms, such as IPC and files, have already been tried and found wanting. I also assume that priority has been given to making SQLite operate across network boundaries rather than process to process, where the locking requirements would be quite different.

But thank you for your many suggestions and explanations of the virtual table functionality. It seems quite powerful, and if I didn't want to keep this as database-agnostic as possible I would probably look into it. I especially liked the notion that you could keep the index as files but have it appear as a table, since that is in part what I am doing anyway, in a slightly more abstract way.

I am already using options like "synchronous=off". I think, as other users have suggested, the two best options for me are to write custom locking handlers for os.c (although I am still unsure whether this will fix the problem; essentially I want to turn file locking off completely, since the database will never be accessed by more than one process), or to simply implement an in-process server which handles all of the SQLite functionality, using message passing to a single thread so that there is only ever one lock and one cache.

The second option is probably the one I will try first, since it is more transparent and, for the cost of a single thread, lets me emulate exactly the behaviour that other databases have with regard to transactions and threads. Something along the lines of the sketch below is what I have in mind.
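A minimal sketch of that in-process server, assuming POSIX threads and the C API. Every name here (db_server, db_exec, Request, index.db) is invented for illustration, and a real version would also need to hand result rows back and use prepared statements rather than sqlite3_exec; the point is simply that only one thread ever touches the sqlite3 handle.

    /* build roughly: cc server.c -lsqlite3 -lpthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <sqlite3.h>

    typedef struct Request {
        const char *sql;           /* statement to run */
        int rc;                    /* result code, filled in by the server */
        int done;                  /* completion flag */
        pthread_mutex_t mtx;
        pthread_cond_t cv;
        struct Request *next;
    } Request;

    static Request *q_head, *q_tail;   /* FIFO of pending requests */
    static pthread_mutex_t q_mtx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t q_cv = PTHREAD_COND_INITIALIZER;

    /* The server thread: the only code that ever touches the sqlite3
    ** handle, so there is exactly one connection, one cache and at most
    ** one holder of the file lock. */
    static void *db_server(void *arg)
    {
        sqlite3 *db;
        if (sqlite3_open((const char *)arg, &db) != SQLITE_OK) return NULL;
        for (;;) {
            pthread_mutex_lock(&q_mtx);
            while (q_head == NULL) pthread_cond_wait(&q_cv, &q_mtx);
            Request *r = q_head;
            q_head = r->next;
            if (q_head == NULL) q_tail = NULL;
            pthread_mutex_unlock(&q_mtx);

            r->rc = sqlite3_exec(db, r->sql, 0, 0, 0);

            pthread_mutex_lock(&r->mtx);
            r->done = 1;                       /* wake the waiting caller */
            pthread_cond_signal(&r->cv);
            pthread_mutex_unlock(&r->mtx);
        }
    }

    /* Called from any thread: queue the statement, block until it ran. */
    static int db_exec(const char *sql)
    {
        Request r = {0};
        r.sql = sql;
        pthread_mutex_init(&r.mtx, NULL);
        pthread_cond_init(&r.cv, NULL);

        pthread_mutex_lock(&q_mtx);
        if (q_tail) q_tail->next = &r; else q_head = &r;
        q_tail = &r;
        pthread_cond_signal(&q_cv);
        pthread_mutex_unlock(&q_mtx);

        pthread_mutex_lock(&r.mtx);
        while (!r.done) pthread_cond_wait(&r.cv, &r.mtx);
        pthread_mutex_unlock(&r.mtx);
        return r.rc;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, db_server, (void *)"index.db");
        db_exec("CREATE TABLE IF NOT EXISTS docs"
                "(id INTEGER PRIMARY KEY, path TEXT)");
        printf("rc=%d\n", db_exec("INSERT INTO docs(path) VALUES('/tmp/x')"));
        return 0;   /* sketch: the server thread is simply abandoned here */
    }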
I'm not familiar with DB/dbm/gdbm; are any of those under a similar licence to SQLite, and/or as easy to use and as simple? :)

Emerson

On 12/30/06, Roger Binns <[EMAIL PROTECTED]> wrote:

Emerson Clarke wrote:
| I have deliberately tried to avoid giving too much detail on the
| architecture of the index, since that was not the point and I didn't
| want to end up debating it.

I don't want to debate your index architecture either :-). Quite simply, several times a month people post to this list wanting SQLite changed to match how they would ideally structure things. People on the list explore with the poster how the items of data are related, and can suggest an alternative way of doing things. Usually the poster finds that simpler than what they had first thought of and goes away happy.

Is this your question: "I want SQLite to work differently than it currently does, so that it matches how I want to do things"? If that is the case, then the answer is that you can go ahead and rewrite as much of SQLite as you want in order to do that. The code is public domain, so there are no legal or technical hindrances standing in your way. This thread may as well end at that. On the other hand, if you do want to work within the constraints of SQLite, then there are quite a few things that can be suggested, but that is only possible if more is known about the relationships in the data.

| I did make an attempt to explain that A and B could not be done at the
| same time in a previous message, but perhaps it has been lost in the
| conversation. The process involves several stages, some of which are
| database operations and some of which are file operations, and the
| operations are not separable. They must be done in sequential order.

I was trying to establish what has to be serialized. In particular, the question was whether A and B have any relationship to each other. If they do, then they have to be processed serially, and I don't see the relevance of threading etc. If they can be processed at the same time, then some sort of partitioning can happen. In theory, breaking the dataset into 10 partitions can give 10 times the performance, but in practice there will need to be some coordination in order to make it look like there is one database, not multiple pieces.

| The database operations, though very small, still consume the most time
| and are the most sensitive to how the synchronisation takes place and
| where the transactions are placed.

Have you considered just using plain DB/dbm/gdbm and then importing the data on demand into SQLite? Also, a lot of the synchronisation is because SQLite makes damn sure it doesn't lose your data. If your documents are permanent (ie you can access them again later if need be), then you can loosen the constraints on SQLite. For example, you could run with pragma synchronous=off and then do a checkpoint every 100,000 documents, where you close the database, copy it to a permanent file, sync, and start again. You could also use a ram disk and copy to permanent storage as your checkpoint.
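That checkpoint loop might look something like the sketch below. The interval constant, the file names and the copy_file helper are all invented for illustration; only the pragma plus the close/copy/sync/reopen cycle is the actual suggestion.

    #include <stdio.h>
    #include <unistd.h>
    #include <sqlite3.h>

    #define CHECKPOINT_EVERY 100000    /* documents between checkpoints */

    /* Copy the database file somewhere safe and force it to disk. */
    static int copy_file(const char *src, const char *dst)
    {
        char buf[65536];
        size_t n;
        FILE *in = fopen(src, "rb");
        FILE *out = fopen(dst, "wb");
        if (!in || !out) {
            if (in) fclose(in);
            if (out) fclose(out);
            return -1;
        }
        while ((n = fread(buf, 1, sizeof buf, in)) > 0) fwrite(buf, 1, n, out);
        fflush(out);
        fsync(fileno(out));             /* the sync step of the checkpoint */
        fclose(in);
        fclose(out);
        return 0;
    }

    /* Open with durability traded away for speed; the periodic copies
    ** are what restore safety. */
    static sqlite3 *open_fast(const char *path)
    {
        sqlite3 *db;
        sqlite3_open(path, &db);
        sqlite3_exec(db, "PRAGMA synchronous=off", 0, 0, 0);
        return db;
    }

    void index_documents(long ndocs)
    {
        sqlite3 *db = open_fast("work.db");
        for (long i = 0; i < ndocs; i++) {
            /* ... insert document i here ... */
            if ((i + 1) % CHECKPOINT_EVERY == 0) {
                sqlite3_close(db);                /* flush everything out */
                copy_file("work.db", "checkpoint.db");
                db = open_fast("work.db");        /* and carry on */
            }
        }
        sqlite3_close(db);
    }

A crash then costs at most one interval of work, which can be redone from the permanent documents.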
| I don't think custom functions are
| appropriate for what I'm doing, and I'm not sure how virtual tables would
| be either; I rather suspect that would be a very complicated approach.

You can (ab)use custom functions and virtual tables to help behind the scenes. For example, they can be used to make data sets that are partitioned appear to be a single whole. Another example is if you have your database in two pieces, one read-only with "old" data and a new one with updates; that can again appear to the rest of the code as one database. Finally, you can also make the functions and virtual tables have side effects, even on what appear to be read-only queries (there is a small sketch of this at the end of this message).

| The schema is extremely simple, and there is barely any logic to the
| indexing process at all.

Maybe not indexing the documents at all would work? If you used a virtual table, you could make it grovel through the documents on demand. You can even build indices (in the SQL sense) which are in your own format, with your own performance characteristics, and use those for the virtual table.

| Unfortunately I cannot do this with SQLite at the moment...

Correct. SQLite errs on the side of being a library with no controller, working with multiple processes, and having only the lowest common denominator of operating system locking functionality available. There are techniques that can be used to improve concurrency. DRH has a policy of only using those that are at least 17 years old, otherwise there are likely to be patent implications. See this page for example:

  http://www.sqlite.org/cvstrac/wiki?p=BlueSky

In summary, you can do one or more of the following:

- Use some other database
- Rewrite SQLite bits yourself
- Use some sort of partitioning mechanism
- ... which can be hidden using custom functions and virtual tables
- Use a different storage mechanism (eg db/gdbm) with SQLite giving you a front end (virtual tables)
- Relax synchronisation and use a checkpointing mechanism

Roger
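As a concrete illustration of the side-effect idea mentioned above: a scalar function registered with sqlite3_create_function can read a document off disk while a query runs, so a plain SELECT pulls in data that was never stored in the database. The doc_body name, the docs table and the buffer size below are invented for this sketch, not taken from the thread.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sqlite3.h>

    /* doc_body(path): returns the contents of the file at 'path'.
    ** The file read is the side effect of the "read only" query. */
    static void doc_body(sqlite3_context *ctx, int argc, sqlite3_value **argv)
    {
        (void)argc;                     /* always 1, as registered below */
        const char *path = (const char *)sqlite3_value_text(argv[0]);
        FILE *f = path ? fopen(path, "rb") : NULL;
        if (!f) { sqlite3_result_null(ctx); return; }
        char *buf = malloc(65536);
        if (!buf) { fclose(f); sqlite3_result_error_nomem(ctx); return; }
        size_t n = fread(buf, 1, 65536, f);
        fclose(f);
        /* SQLITE_TRANSIENT makes SQLite take its own copy of the bytes */
        sqlite3_result_text(ctx, buf, (int)n, SQLITE_TRANSIENT);
        free(buf);
    }

    int main(void)
    {
        sqlite3 *db;
        sqlite3_open("index.db", &db);
        sqlite3_create_function(db, "doc_body", 1, SQLITE_UTF8, NULL,
                                doc_body, NULL, NULL);
        /* now, for example:  SELECT id, doc_body(path) FROM docs;  */
        sqlite3_close(db);
        return 0;
    }

The same trick is how the partitioned or two-piece layouts can be papered over: the function (or a virtual table) decides which underlying store actually answers.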