Emerson Clarke wrote:

| I have deliberately tried to avoid giving too much detail on the
| architecture of the index since that was not the point and I didn't
| want to end up debating it.
I don't want to debate your index architecture either :-). Quite simply, several times a month people post to this list wanting SQLite changed to match how they want to structure things. People on the list explore with the poster how the items of data are related, and can suggest an alternative way of doing things. Usually the poster finds that simpler than what they had first thought of and goes away happy.

Is this your question: "I want SQLite to work differently than it currently does so that it matches how I want to do things"? If that is the case, then the answer is that you can go ahead and rewrite as much of SQLite as you want in order to do that. The code is public domain so there are no legal or technical hindrances standing in your way. This thread may as well end at that.

On the other hand, if you do want to work within the constraints of SQLite then there are quite a few things that can be suggested. But that is only possible if more is known about the relationships of the data.

| I did make an attempt to explain that A and B could not be done at the
| same time in a previous message, but perhaps it's been lost in the
| conversation. The process involves several stages, some of which are
| database operations and some of which are file operations, and the
| operations are not separable. They must be done in sequential order.

I was trying to establish what has to be serialized. In particular, the question was whether A and B have any relationship to each other. If they do, then they have to be processed serially and I don't see the relevance of threading etc. If they can be processed at the same time, then some sort of partitioning can happen. In theory breaking the data sets into 10 partitions can give 10 times the performance, but in practice there will need to be some coordination in order to make it look like there is one database, not multiple pieces.

| The database operations, though very small, still consume the most time
| and are the most sensitive to how the synchronisation takes place and
| where the transactions are placed.

Have you considered just using plain DB/dbm/gdbm and then importing the data on demand into SQLite?

Also, a lot of the synchronisation is because SQLite makes damn sure it doesn't lose your data. If your documents are permanent (i.e. you can access them later if need be), then you can loosen the constraints on SQLite. For example you could run with pragma synchronous=off and then do a checkpoint every 100,000 documents where you close the database, copy it to a permanent file, sync, and start again. You could also use a ram disk and copy to permanent storage as your checkpoint. (There is a rough sketch of this in the P.S. below.)

| I don't think custom functions are appropriate for what I'm doing and
| I'm not sure how virtual tables would be either; I rather suspect that
| would be a very complicated approach.

You can (ab)use custom functions and virtual tables to help behind the scenes. For example, they can be used to make data sets that are partitioned appear to be a single whole. Another example is if you have your database in two pieces - one that is read only with "old" data and a new one with updates. That can again appear to the rest of the code as one database. Finally, you can also make the functions and virtual tables have side effects even on what appear to be read-only queries.
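To make the custom function idea concrete, here is an untested sketch. The function name doc_length, the file index.db and the documents table in the comment are made up for illustration; only the sqlite3_* calls are real API:

    #include <stdio.h>
    #include <sqlite3.h>

    /* doc_length(path) - size in bytes of the document at 'path', read
       on demand from the filesystem rather than from the database. */
    static void doc_length(sqlite3_context *ctx, int argc,
                           sqlite3_value **argv)
    {
        const char *path = (const char *)sqlite3_value_text(argv[0]);
        FILE *f;
        long size;

        (void)argc;                 /* always 1 - see registration below */
        if (path == 0 || (f = fopen(path, "rb")) == 0) {
            sqlite3_result_null(ctx);   /* unreadable document -> NULL */
            return;
        }
        fseek(f, 0, SEEK_END);
        size = ftell(f);
        fclose(f);
        sqlite3_result_int64(ctx, size);
    }

    int main(void)
    {
        sqlite3 *db;

        sqlite3_open("index.db", &db);
        sqlite3_create_function(db, "doc_length", 1, SQLITE_UTF8, 0,
                                doc_length, 0, 0);
        /* Plain SQL can now reach outside the database, e.g.
           SELECT path FROM documents WHERE doc_length(path) > 100000; */
        sqlite3_close(db);
        return 0;
    }

The point is that once registered, the function can do arbitrary work behind a query - read files, consult another store, even update a cache as a side effect - while the rest of your code just sees SQL.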
| The schema is extremely simple, and there is barely any logic to the
| indexing process at all.

Maybe not even indexing the documents at all would work? If you use a virtual table, you can make it grovel through the documents on demand. You can even build indices (in the SQL sense) in your own format, with your own performance characteristics, and use those for the virtual table.

| Unfortunately I cannot do this with sqlite at the moment...

Correct. SQLite errs on the side of being a library with no controller, working with multiple processes, and only having the lowest common denominator of operating system locking functionality available. There are techniques that can be used to improve concurrency. DRH has a policy of only using those that are at least 17 years old, otherwise there are likely to be patent implications. See this page for example:

  http://www.sqlite.org/cvstrac/wiki?p=BlueSky

In summary, you can do one or more of the following:

- Use some other database
- Rewrite SQLite bits yourself
- Use some sort of partitioning mechanism
- ... which can be hidden using custom functions and virtual tables
- Use a different storage mechanism (e.g. db/gdbm) with SQLite giving
  you a front end (virtual tables)
- Relax synchronisation and use a checkpointing mechanism (sketched in
  the P.S. below)

Roger
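P.S. Here is an untested sketch of the synchronous=off checkpointing approach. The file names work.db and checkpoint.db and the batch size are made up; real code should check every return value and fsync the copy so the checkpoint really is on permanent storage:

    #include <stdio.h>
    #include <sqlite3.h>

    #define CHECKPOINT_EVERY 100000      /* illustrative batch size */

    /* Naive whole-file copy - no error handling, no fsync. */
    static void copy_file(const char *src, const char *dst)
    {
        FILE *in = fopen(src, "rb");
        FILE *out = fopen(dst, "wb");
        char buf[8192];
        size_t n;

        while ((n = fread(buf, 1, sizeof buf, in)) > 0)
            fwrite(buf, 1, n, out);
        fclose(in);
        fclose(out);
    }

    void index_documents(int ndocs)
    {
        sqlite3 *db;
        int i;

        sqlite3_open("work.db", &db);
        sqlite3_exec(db, "PRAGMA synchronous=OFF", 0, 0, 0);

        for (i = 1; i <= ndocs; i++) {
            /* ... parse document i and INSERT its index entries ... */

            if (i % CHECKPOINT_EVERY == 0) {
                sqlite3_close(db);   /* flushes everything to work.db */
                copy_file("work.db", "checkpoint.db");
                sqlite3_open("work.db", &db);
                sqlite3_exec(db, "PRAGMA synchronous=OFF", 0, 0, 0);
            }
        }
        sqlite3_close(db);
    }

If the machine dies you lose at most the documents indexed since the last checkpoint, and since the source documents are permanent you can simply re-run that batch.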