We almost exclusively use virtual tables in our application, and this includes
virtual table code to access Faircom CTree files and in-memory data
dictionaries. The structure (fields, indexes) of these tables is fixed (and
identical for corresponding CTree and DD tables), with sharding achieved by
"cloning" the common structure into separate tables, with the "clone
parameters" that describe the value(s) of certain key field(s) being present in
the name. E.g. the table named customer_PA would contain only customers from
Pennsylvania. Creating a simple view (select * from customer_PA union all
customer_NY ...) has the drawback of acessing all member tables, even if the
constraints would require searching only one table. It also requires that all
tables be contained in the same database.
Our solution is a "partition" provider that knows about "member tables" and
"clone parameters" and can handle "partition constraints" as well as ordered
(merge) and unorded (sequential) retrieval. The name of the "partition" table
does not include any "clone parameters" (e.g customer).
So "SELECT * FROM customer;" will internally do "SELECT * FROM customer_NY;"
followed by "SELECT * FROM customer_PA;" because the member table has 2 entries
('customer','customer_NY'), ('customer','customer_PA').
But "SELECT * FROM customer WHERE ... state = 'NY';" would determine that the
"clone parameter" state only matches table customer_NY and therefore only query
that table.
And "SELECT * FROM customer ... ORDER BY name;" would prepare identical
statements against both tables, fetch a record from each and return the
"smaller" one (because the virtual table supports indexing by name, the
xBestIndex method can tell SQLite that it can handle this kind of query and
sets the "orderByConsumed" flag; if the ODER BY expression cannot be handled
via an index, it goes back to sequential execution and lets SQLite do the
sorting). An n-way merge is implemented as a binary tree to minimize
comparisons.
A smilar approach may be possible with native tables that reside in different
native database files (limited by the maximum number of concurrently attached
databases).
-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[email protected]] Im
Auftrag von Gerlando Falauto
Gesendet: Sonntag, 29. Juli 2018 10:34
An: [email protected]
Betreff: [EXTERNAL] [sqlite] Sqlite Sharding HOWTO
Hi,
I'm totally new to sqlite and I'd like to use it for some logging application
on an embedded linux-based device. Data comes from multiple (~10), similar
sources at a steady rate.
The rolling data set would be in the size of 20 GB. Such an amount of storage
would suffice to retain data from the previous week or so.
Reading the documentation https://www.sqlite.org/whentouse.html somehow
suggests the usage of sharding:
>Concurrency is also improved by "database sharding": using separate
database files for
> different subdomains. For example, the server might have a separate
SQLite database for each
> user, so that the server can handle hundreds or thousands of
> simultaneous
connections, but
> each SQLite database is only used by one connection.
In my case I would be doing sharding on the data source and/or the day of the
timestamp, so to have individual files in the size of a few hundreds MB.
This way, deleting the oldest data would be as easy as deleting the
corresponding file.
However, I did not find any reference whatsoever on sharding being available
_within_ sqlite.
Ideally, I would like to have a way of "seeing" the whole dataset with a single
query spanning all available databases.
Would that be at all feasible? I saw the "attach database" statement which
seems closely related but whose use-case I honestly don't get.
If not, is there any state-of-the-art adapter layer that would be performing
(and hide) the underlying sharding? I don't really care about query performance
(e.g. if such a global query spanning 20 different databases is indeed
performed serially, thereby take 20 times longer), I just need a way of hiding
this detail.
I saw some reference to SPHiveDB
https://www.mail-archive.com/[email protected]/msg43575.html
but the project looks stale (9 years since the last commit).
I also looked into AtomDB but it looks overly complicated for my use-case
(single, embedded server), plus it somehow requires the underlying sharding to
be totally exposed.
Any ideas?
Gerlando
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___________________________________________
Gunter Hick | Software Engineer | Scientific Games International GmbH |
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43
1 80100 - 0
May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users