Roger,

My original question was in fact not a statement. I did not want SQLite to work differently; rather the opposite: SQLite already works differently from the way I, and probably a lot of other users, assume it would. All I wanted to know was why that is the case.
It seemed to me that making a library which only functions on a per-thread basis is something you would have to do deliberately and by design, but I am still vague as to the reasons behind that design, aside from some mention of file locks being based on threads rather than processes. I am left to assume that all the other locking mechanisms, such as IPC and files, have already been tried and found wanting. I also assume that priority has been given to making SQLite operate across network boundaries rather than process to process, where the locking requirements would be quite different.

But thank you for your many suggestions and explanations of the virtual table functionality. It seems quite powerful, and if I didn't want to keep this as database-agnostic as possible I would probably look into it. I especially liked the notion that you could keep the index as files but have it appear as a table, since that is in part what I am doing anyway, in a slightly more abstract way.

I am already using options like "synchronous=off". I think, as other users have suggested, the two best options for me are to write custom locking handlers for os.c (although I am still unsure whether this will fix the problem; essentially I want to turn file locking off completely, since the database will never be accessed by more than one process), or to simply implement an in-process server which handles all of the SQLite functionality, using message passing to a single thread so that there is only ever one lock and one cache.

The second option is probably the one I will try first, since it is more transparent and, for the cost of a single thread, lets me emulate exactly the behaviour that other databases have with regard to transactions and threads. Something along the lines of the sketch below is what I have in mind.
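A minimal sketch of that in-process server, assuming POSIX threads and the C API. Every name here (db_server, db_exec, Request, index.db) is invented for illustration, and a real version would also need to hand result rows back and use prepared statements rather than sqlite3_exec; the point is simply that only one thread ever touches the sqlite3 handle.

    /* build roughly: cc server.c -lsqlite3 -lpthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <sqlite3.h>

    typedef struct Request {
        const char *sql;           /* statement to run */
        int rc;                    /* result code, filled in by the server */
        int done;                  /* completion flag */
        pthread_mutex_t mtx;
        pthread_cond_t cv;
        struct Request *next;
    } Request;

    static Request *q_head, *q_tail;   /* FIFO of pending requests */
    static pthread_mutex_t q_mtx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t q_cv = PTHREAD_COND_INITIALIZER;

    /* The server thread: the only code that ever touches the sqlite3
    ** handle, so there is exactly one connection, one cache and at most
    ** one holder of the file lock. */
    static void *db_server(void *arg)
    {
        sqlite3 *db;
        if (sqlite3_open((const char *)arg, &db) != SQLITE_OK) return NULL;
        for (;;) {
            pthread_mutex_lock(&q_mtx);
            while (q_head == NULL) pthread_cond_wait(&q_cv, &q_mtx);
            Request *r = q_head;
            q_head = r->next;
            if (q_head == NULL) q_tail = NULL;
            pthread_mutex_unlock(&q_mtx);

            r->rc = sqlite3_exec(db, r->sql, 0, 0, 0);

            pthread_mutex_lock(&r->mtx);
            r->done = 1;                       /* wake the waiting caller */
            pthread_cond_signal(&r->cv);
            pthread_mutex_unlock(&r->mtx);
        }
    }

    /* Called from any thread: queue the statement, block until it ran. */
    static int db_exec(const char *sql)
    {
        Request r = {0};
        r.sql = sql;
        pthread_mutex_init(&r.mtx, NULL);
        pthread_cond_init(&r.cv, NULL);

        pthread_mutex_lock(&q_mtx);
        if (q_tail) q_tail->next = &r; else q_head = &r;
        q_tail = &r;
        pthread_cond_signal(&q_cv);
        pthread_mutex_unlock(&q_mtx);

        pthread_mutex_lock(&r.mtx);
        while (!r.done) pthread_cond_wait(&r.cv, &r.mtx);
        pthread_mutex_unlock(&r.mtx);
        return r.rc;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, db_server, (void *)"index.db");
        db_exec("CREATE TABLE IF NOT EXISTS docs"
                "(id INTEGER PRIMARY KEY, path TEXT)");
        printf("rc=%d\n", db_exec("INSERT INTO docs(path) VALUES('/tmp/x')"));
        return 0;   /* sketch: the server thread is simply abandoned here */
    }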
I'm not familiar with DB/dbm/gdbm; are any of those under a similar licence to SQLite, and/or as easy to use and as simple? :)

Emerson

On 12/30/06, Roger Binns <[EMAIL PROTECTED]> wrote:

Emerson Clarke wrote:
| I have deliberately tried to avoid giving too much detail on the
| architecture of the index, since that was not the point and I didn't
| want to end up debating it.

I don't want to debate your index architecture either :-). Quite simply, several times a month people post to this list wanting SQLite changed to match how they would ideally structure things. People on the list explore with the poster how the items of data are related, and can suggest an alternative way of doing things. Usually the poster finds that simpler than what they had first thought of and goes away happy.

Is this your question: "I want SQLite to work differently than it currently does, so that it matches how I want to do things"? If that is the case, then the answer is that you can go ahead and rewrite as much of SQLite as you want in order to do that. The code is public domain, so there are no legal or technical hindrances standing in your way. This thread may as well end at that. On the other hand, if you do want to work within the constraints of SQLite, then there are quite a few things that can be suggested, but that is only possible if more is known about the relationships in the data.

| I did make an attempt to explain that A and B could not be done at the
| same time in a previous message, but perhaps it has been lost in the
| conversation. The process involves several stages, some of which are
| database operations and some of which are file operations, and the
| operations are not separable. They must be done in sequential order.

I was trying to establish what has to be serialized. In particular, the question was whether A and B have any relationship to each other. If they do, then they have to be processed serially, and I don't see the relevance of threading etc. If they can be processed at the same time, then some sort of partitioning can happen. In theory, breaking the dataset into 10 partitions can give 10 times the performance, but in practice there will need to be some coordination in order to make it look like there is one database, not multiple pieces.

| The database operations, though very small, still consume the most time
| and are the most sensitive to how the synchronisation takes place and
| where the transactions are placed.

Have you considered just using plain DB/dbm/gdbm and then importing the data on demand into SQLite? Also, a lot of the synchronisation is because SQLite makes damn sure it doesn't lose your data. If your documents are permanent (ie you can access them again later if need be), then you can loosen the constraints on SQLite. For example, you could run with pragma synchronous=off and then do a checkpoint every 100,000 documents, where you close the database, copy it to a permanent file, sync, and start again. You could also use a ram disk and copy to permanent storage as your checkpoint.
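That checkpoint loop might look something like the sketch below. The interval constant, the file names and the copy_file helper are all invented for illustration; only the pragma plus the close/copy/sync/reopen cycle is the actual suggestion.

    #include <stdio.h>
    #include <unistd.h>
    #include <sqlite3.h>

    #define CHECKPOINT_EVERY 100000    /* documents between checkpoints */

    /* Copy the database file somewhere safe and force it to disk. */
    static int copy_file(const char *src, const char *dst)
    {
        char buf[65536];
        size_t n;
        FILE *in = fopen(src, "rb");
        FILE *out = fopen(dst, "wb");
        if (!in || !out) {
            if (in) fclose(in);
            if (out) fclose(out);
            return -1;
        }
        while ((n = fread(buf, 1, sizeof buf, in)) > 0) fwrite(buf, 1, n, out);
        fflush(out);
        fsync(fileno(out));             /* the sync step of the checkpoint */
        fclose(in);
        fclose(out);
        return 0;
    }

    /* Open with durability traded away for speed; the periodic copies
    ** are what restore safety. */
    static sqlite3 *open_fast(const char *path)
    {
        sqlite3 *db;
        sqlite3_open(path, &db);
        sqlite3_exec(db, "PRAGMA synchronous=off", 0, 0, 0);
        return db;
    }

    void index_documents(long ndocs)
    {
        sqlite3 *db = open_fast("work.db");
        for (long i = 0; i < ndocs; i++) {
            /* ... insert document i here ... */
            if ((i + 1) % CHECKPOINT_EVERY == 0) {
                sqlite3_close(db);                /* flush everything out */
                copy_file("work.db", "checkpoint.db");
                db = open_fast("work.db");        /* and carry on */
            }
        }
        sqlite3_close(db);
    }

A crash then costs at most one interval of work, which can be redone from the permanent documents.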
| I don't think custom functions are
| appropriate for what I'm doing, and I'm not sure how virtual tables would
| be either; I rather suspect that would be a very complicated approach.

You can (ab)use custom functions and virtual tables to help behind the scenes. For example, they can be used to make data sets that are partitioned appear to be a single whole. Another example is if you have your database in two pieces, one read-only with "old" data and a new one with updates; that can again appear to the rest of the code as one database. Finally, you can also make the functions and virtual tables have side effects, even on what appear to be read-only queries (there is a small sketch of this at the end of this message).

| The schema is extremely simple, and there is barely any logic to the
| indexing process at all.

Maybe not indexing the documents at all would work? If you used a virtual table, you could make it grovel through the documents on demand. You can even build indices (in the SQL sense) which are in your own format, with your own performance characteristics, and use those for the virtual table.

| Unfortunately I cannot do this with SQLite at the moment...

Correct. SQLite errs on the side of being a library with no controller, working with multiple processes, and having only the lowest common denominator of operating system locking functionality available. There are techniques that can be used to improve concurrency. DRH has a policy of only using those that are at least 17 years old, otherwise there are likely to be patent implications. See this page for example:

  http://www.sqlite.org/cvstrac/wiki?p=BlueSky

In summary, you can do one or more of the following:

- Use some other database
- Rewrite SQLite bits yourself
- Use some sort of partitioning mechanism
- ... which can be hidden using custom functions and virtual tables
- Use a different storage mechanism (eg db/gdbm) with SQLite giving you a front end (virtual tables)
- Relax synchronisation and use a checkpointing mechanism

Roger
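As a concrete illustration of the side-effect idea mentioned above: a scalar function registered with sqlite3_create_function can read a document off disk while a query runs, so a plain SELECT pulls in data that was never stored in the database. The doc_body name, the docs table and the buffer size below are invented for this sketch, not taken from the thread.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sqlite3.h>

    /* doc_body(path): returns the contents of the file at 'path'.
    ** The file read is the side effect of the "read only" query. */
    static void doc_body(sqlite3_context *ctx, int argc, sqlite3_value **argv)
    {
        (void)argc;                     /* always 1, as registered below */
        const char *path = (const char *)sqlite3_value_text(argv[0]);
        FILE *f = path ? fopen(path, "rb") : NULL;
        if (!f) { sqlite3_result_null(ctx); return; }
        char *buf = malloc(65536);
        if (!buf) { fclose(f); sqlite3_result_error_nomem(ctx); return; }
        size_t n = fread(buf, 1, 65536, f);
        fclose(f);
        /* SQLITE_TRANSIENT makes SQLite take its own copy of the bytes */
        sqlite3_result_text(ctx, buf, (int)n, SQLITE_TRANSIENT);
        free(buf);
    }

    int main(void)
    {
        sqlite3 *db;
        sqlite3_open("index.db", &db);
        sqlite3_create_function(db, "doc_body", 1, SQLITE_UTF8, NULL,
                                doc_body, NULL, NULL);
        /* now, for example:  SELECT id, doc_body(path) FROM docs;  */
        sqlite3_close(db);
        return 0;
    }

The same trick is how the partitioned or two-piece layouts can be papered over: the function (or a virtual table) decides which underlying store actually answers.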