Re: [sqlite] Building multiple indexes in one scan over a table

Nico Williams Tue, 10 Jun 2014 14:40:01 -0700

On Tue, Jun 10, 2014 at 4:25 PM, Simon Slavin <slav...@bigfraud.org> wrote:
> On 10 Jun 2014, at 7:22pm, Fabian Giesen <fabi...@radgametools.com> wrote:
> Not possible in current SQLite.  No standard syntax for doing it in SQL.  A 
> quick google suggests that no other popular DBMS implements it.


If the CREATE INDEX statements are in a transaction then the RDBMS
could defer index creation until they are needed or a COMMIT is done,
then it could coalesce their execution plans to do a single scan.
Otherwise it can't be done at all indeed (since if not in a
transaction it's as if each CREATE INDEX were in its own transaction,
and that has to be durable (synchronous) when it completes.

> And ... I'm not sure it would be that much faster.  The majority of the work 
> it's doing is writing.  Reading the existing table data is fast and easy, but 
> writing new data and figuring out the pointers takes the majority of the 
> time.  Because of this I don't think it likely that this feature will be 
> added to standard SQLite.  Of course, if you really care about this you could 
> add it yourself.

How much faster it is depends on whether the table to be scanned fits
in memory, how much memory pressure there is, and so on.  So, yes, it
might improve performance, but probably not that much because yeah,
the limiting factor will be write bandwidth, and that will surely be
less than read bandwidth, so at most we're talking about an
improvement of 33%.  If you were creating 10 indexes then it'd be at
most 9/20s improvement.  Anything that approaches a 50% improvement is
not to be sneezed at, but given what a special case this is, I'd not
blame anyone for focusing on something better.

> To optimize this, when you create the table put the columns you're going to 
> index near the beginning of the rows.  SQLite stops reading the data of a 
> table row once it has all the columns it needs.  I don't know whether this 
> will make a lot of difference but it won't do any harm.

Well, no, SQLite reads page-by-page.  It may stop decoding data for
columns past the last one it needs, but that's different.

Also, let's look at the OP's indexes:

  CREATE INDEX samples_by_time ON samples(time);
  CREATE INDEX samples_by_name ON samples(name_id);
  CREATE INDEX samples_by_type ON samples(type_id);

I'm betting that type_id is a low-cardinality column.  It might be
best to try the skip-scan index optimization in recent releases and go
for just one or two indexes.

Assuming these are something like a time series, and assuming
skip-scan indexing, this might work:

CREATE INDEX samples_idx ON samples (time, name_id, type_id);

or maybe two:

CREATE INDEX samples_idx ON samples (time);
CREATE INDEX samples_idx ON samples (name_id, type_id);

> Also, store your time as a numeric value (Unix epoch or Julian day) rather 
> than a string.  Numbers are faster to handle than strings.  INTEGERs are 
> faster to handle than REALs.

I'm not so sure.  Number parsing involves multiplications.  This sort
of thing needs to be benchmarked in each use case.  Storing dates in
ISO 8601 format has its benefits anyways.

Nico
--
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Building multiple indexes in one scan over a table

Reply via email to