On Mon, Oct 29, 2018 at 6:38 PM Keith Medcalf <kmedc...@dessus.com> wrote:

>
> See the ext/misc/unionvtab.c extension for "reading" a bunch of databases
> as if they were a single database.
>
> https://www.sqlite.org/src/artifact/0b3173f69b8899da


Cool, indeed.
I also had a look at the CSV file extension:
https://www.sqlite.org/src/artifact?udc=1&ln=on&name=65297bcce8d5acd5
Someone has actually come up with an extension to read Parquet files:
https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html

(...taking a deep breath...) Alright, I'm just gonna say it.
How {hard, stupid, useful} do you guys think it would be to write an SQLite
extension to add directory-based partitioning on top of the CSV extension
and let the OS and filesystem take care of the rest?
The layout would be something like "day=2018-10-29\source=source_a\bucket1.csv".
I've heard people call it "hive-style" partitioning.
As long as the size of each individual file remains reasonable, I might as
well be happy with CSV files and just read the whole file sequentially.
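
A minimal sketch of the idea in plain Python rather than as an SQLite extension, assuming every directory level is named "key=value" and every leaf file is a CSV with a header row. The names parse_partition_path and scan are hypothetical and are not part of the CSV extension. Directories whose key contradicts the filter are skipped, so the filesystem does the pruning and each surviving file is read sequentially:

```python
import csv
import os
import re


def parse_partition_path(path):
    """Extract the key=value directory components of a hive-style path."""
    parts = re.split(r"[\\/]", path)  # accept both \ and / as separators
    keys = {}
    for part in parts[:-1]:  # the last component is the file name
        if "=" in part:
            key, value = part.split("=", 1)
            keys[key] = value
    return keys


def _matches(dirname, wanted):
    """True unless dirname is a key=value pair that contradicts the filter."""
    if "=" not in dirname:
        return True
    key, value = dirname.split("=", 1)
    return wanted.get(key, value) == value


def scan(root, **wanted):
    """Yield rows (as dicts) from the CSV files under root whose partition
    keys match `wanted`, with the partition keys added as extra columns."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending
        # into partitions that cannot match.
        dirnames[:] = sorted(d for d in dirnames if _matches(d, wanted))
        rel = os.path.relpath(dirpath, root)
        keys = parse_partition_path(os.path.join(rel, "_"))
        if any(k not in keys for k in wanted):
            continue  # not deep enough yet to know all requested keys
        for name in sorted(filenames):
            if not name.endswith(".csv"):
                continue
            with open(os.path.join(dirpath, name), newline="") as f:
                for row in csv.DictReader(f):
                    yield {**keys, **row}
```

With this, scan(root, day="2018-10-29") would only open files below the "day=2018-10-29" directory. A real virtual table would have to derive the same filter from the query's WHERE constraints.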

Alternatively, the same partitioning approach might be added to unionvtab,
whichever turns out to be simpler.
I have no idea how the partitioning semantics could be specified though.
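
One possible way to express such semantics, sketched with plain ATTACH and a UNION ALL view instead of unionvtab itself: each shard database is attached under its own schema name, and its partition keys are exposed as literal columns. The function name attach_partitions, the table name "readings" and the view name are assumptions made for illustration, and every shard is assumed to carry at least one partition key. Note that SQLite limits the number of attached databases (10 by default), so this only works for a handful of shards:

```python
import sqlite3


def _quote(value):
    """Render a Python string as an SQL string literal."""
    return "'" + value.replace("'", "''") + "'"


def attach_partitions(conn, shards, table="readings"):
    """Attach each (path, keys) shard and create a TEMP view `all_<table>`
    that unions them, exposing the partition keys as leading columns."""
    selects = []
    for i, (path, keys) in enumerate(shards):
        schema = f"p{i}"
        conn.execute(f"ATTACH DATABASE ? AS {schema}", (path,))
        # Partition keys become constant columns for this shard's rows.
        literals = ", ".join(f"{_quote(v)} AS {k}" for k, v in keys.items())
        selects.append(f"SELECT {literals}, * FROM {schema}.{table}")
    conn.execute(
        f"CREATE TEMP VIEW all_{table} AS " + " UNION ALL ".join(selects)
    )
```

A query such as SELECT * FROM all_readings WHERE day = '2018-10-29' then returns only rows from the matching shard, although this sketch by itself gives no guarantee about how much work is skipped for the other shards.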

I know, that kinda brings us back to my original question at the end of
July about database sharding:
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg111250.html
Perhaps I'm just too biased towards this approach.

Thank you so much for your patience, guys!
Gerlando
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
