Re: [sqlite] SELECT the "middle" of a large ordered table

Dominique Devienne Sat, 28 May 2016 01:53:15 -0700

On Fri, May 27, 2016 at 10:02 PM, Clay Gerrard <clay.gerr...@gmail.com>
wrote:


> [...] These container databases are normally small (<1-4M rows) - and you
> can

have many of them (>10M) - but depending on the usage pattern - they can
> also grow big (100+ GB) when there are many many object rows stored in a
> *single* container. [...]
> We're investigating strategies to improve the way this container
> information is stored at scale, one approach we're researching is to
> "split" the database such that objects before and after "pivot points" are
> stored in different underlying databases.
>

Basically you want partitioning or sharding, but do it yourself.
This recent thread might interest you:
http://osdir.com/ml/general/2016-05/msg34126.html

This brings us to the interesting SQL - what's the best way to ask a table
> indexed on name, what is the name (roughly) in the middle.
>

As Simon already warned you on, anything based on count() is unlikely to
scale.
As Dr Hipp said a few times, a decision on the file format a long time ago
prevent
SQLite from answering these count() questions as fast as other DBs, since
it must
actually scan the table rows (or index rows if less IO).

Also be careful with OFFSET. Must still scan all those previous rows.


> In addition to the name (and some other
> *non*-indexed metadata associated with the object) we also store a
> "deleted" flag on each row - so the index is actually "(deleted, name)".


The order of your indexed columns matters.
See https://www.sqlite.org/optoverview.html
and specifically https://www.sqlite.org/optoverview.html#skipscan

So you if you plan on doing name-based queries, having (name, deleted)
might be advantageous. And using a recent SQLite version also matters,
but also note that those may be more sensitive to ANALYZE than before,
see https://www.sqlite.org/queryplanner-ng.html for details.

I think taking a step back, and sharing more info (actual queries, schemas,
indexes used, etc...)
when SQLite doesn't scale as much as you'd like in it's "normal" 1 DB and
w/o "manual" sharding,
you may find the experts on this list to be able to squeeze enough perf
enhancement to not have
to resort to such manual sharding, by tweaking queries, indexes, schemas,
special pragmas, running ANALYZE, etc...

My $0.02. --DD
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] SELECT the "middle" of a large ordered table

Reply via email to