We almost exclusively use virtual tables in our application, and this includes virtual table code to access Faircom CTree files and in-memory data dictionaries. The structure (fields, indexes) of these tables is fixed (and identical for corresponding CTree and DD tables), with sharding achieved by "cloning" the common structure into separate tables, with the "clone parameters" that describe the value(s) of certain key field(s) being present in the name. E.g. the table named customer_PA would contain only customers from Pennsylvania. Creating a simple view (select * from customer_PA union all customer_NY ...) has the drawback of acessing all member tables, even if the constraints would require searching only one table. It also requires that all tables be contained in the same database.
Our solution is a "partition" provider that knows about "member tables" and "clone parameters" and can handle "partition constraints" as well as ordered (merge) and unorded (sequential) retrieval. The name of the "partition" table does not include any "clone parameters" (e.g customer). So "SELECT * FROM customer;" will internally do "SELECT * FROM customer_NY;" followed by "SELECT * FROM customer_PA;" because the member table has 2 entries ('customer','customer_NY'), ('customer','customer_PA'). But "SELECT * FROM customer WHERE ... state = 'NY';" would determine that the "clone parameter" state only matches table customer_NY and therefore only query that table. And "SELECT * FROM customer ... ORDER BY name;" would prepare identical statements against both tables, fetch a record from each and return the "smaller" one (because the virtual table supports indexing by name, the xBestIndex method can tell SQLite that it can handle this kind of query and sets the "orderByConsumed" flag; if the ODER BY expression cannot be handled via an index, it goes back to sequential execution and lets SQLite do the sorting). An n-way merge is implemented as a binary tree to minimize comparisons. A smilar approach may be possible with native tables that reside in different native database files (limited by the maximum number of concurrently attached databases). -----Ursprüngliche Nachricht----- Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Gerlando Falauto Gesendet: Sonntag, 29. Juli 2018 10:34 An: sqlite-users@mailinglists.sqlite.org Betreff: [EXTERNAL] [sqlite] Sqlite Sharding HOWTO Hi, I'm totally new to sqlite and I'd like to use it for some logging application on an embedded linux-based device. Data comes from multiple (~10), similar sources at a steady rate. The rolling data set would be in the size of 20 GB. Such an amount of storage would suffice to retain data from the previous week or so. Reading the documentation https://www.sqlite.org/whentouse.html somehow suggests the usage of sharding: >Concurrency is also improved by "database sharding": using separate database files for > different subdomains. For example, the server might have a separate SQLite database for each > user, so that the server can handle hundreds or thousands of > simultaneous connections, but > each SQLite database is only used by one connection. In my case I would be doing sharding on the data source and/or the day of the timestamp, so to have individual files in the size of a few hundreds MB. This way, deleting the oldest data would be as easy as deleting the corresponding file. However, I did not find any reference whatsoever on sharding being available _within_ sqlite. Ideally, I would like to have a way of "seeing" the whole dataset with a single query spanning all available databases. Would that be at all feasible? I saw the "attach database" statement which seems closely related but whose use-case I honestly don't get. If not, is there any state-of-the-art adapter layer that would be performing (and hide) the underlying sharding? I don't really care about query performance (e.g. if such a global query spanning 20 different databases is indeed performed serially, thereby take 20 times longer), I just need a way of hiding this detail. I saw some reference to SPHiveDB https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg43575.html but the project looks stale (9 years since the last commit). I also looked into AtomDB but it looks overly complicated for my use-case (single, embedded server), plus it somehow requires the underlying sharding to be totally exposed. Any ideas? Gerlando _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___________________________________________ Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0 May be privileged. May be confidential. Please delete if not the addressee. _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users