Hi there, I noticed some - at least for me - unexpected behavior of the
REINDEX command:

I have a DB with 7 tables, 11 indexes and a size of about 140MiB. The DB
is generated by a python script parsing an XML file. Entries are added
to the tables in arbitrary order.

After the tables have been created and filled, some UPDATE queries are
executed. One of those queries took ~36m. I tried to reduce the runtime
by using the REINDEX command, but with no success. However, if I delete
and recreate one of the indexes, using the DROP INDEX and CREATE INDEX
commands, the UPDATE query takes only ~2m.

Unfortunately I can not share the DB with you, due to legal reasons and
was also not able to create a test script to trigger this behavior.

The two tables, affected by the UPDATE query:
> CREATE TABLE `tbl1` (
>     `id` INTEGER PRIMARY KEY AUTOINCREMENT,
>     `str` TEXT NOT NULL UNIQUE COLLATE NOCASE,
>     `int` INTEGER NOT NULL,
>     `tbl2_id` INTEGER,
>     `other_str` TEXT DEFAULT NULL,
>     `other_int1` INTEGER DEFAULT NULL,
>     `other_int2` INTEGER NOT NULL DEFAULT 0,
>     FOREIGN KEY(`tbl2_id`) REFERENCES `tbl2`(`id`)
> );
> 
> CREATE TABLE `tbl2` (
>     `id` INTEGER PRIMARY KEY AUTOINCREMENT,
>     `str` TEXT NOT NULL UNIQUE COLLATE NOCASE,
>     `int` INTEGER NOT NULL
> );

Their corresponding indexes:
> CREATE INDEX `idx1` ON `tbl1` (
>     `int`
> );
> CREATE INDEX `idx2` ON `tbl1` (
>     `tbl2_id`
> );
> CREATE INDEX `idx3` ON `tbl2` (
>     `int`
> );
... filling the tables ...

Recreating index idx1 (To archive the speedup, recreation of the other
indexes is not necessary):
> DROP INDEX `idx1`;
> CREATE INDEX `idx1` ON `tbl1` (
>     `int`
> );
And the update query:
> UPDATE `tbl1`
> SET `tbl2_id` = (
>     SELECT `tbl2`.`id`
>     FROM `tbl2`
>     WHERE `tbl1`.`int` >= `tbl2`.`int`
>         AND `tbl1`.`str` LIKE '%' || `tbl2`.`str`
>     ORDER BY `tbl2`.`int` DESC
>     LIMIT 1
> );

And it's corresponding query plan:
> id    parent  notused detail> 3       0       0       SCAN TABLE tbl1
> 23    0       0       CORRELATED SCALAR SUBQUERY 1
> 29    23      0       SEARCH TABLE tbl2 USING INDEX idx3 (int<?)

tbl1 and tbl2 contain about 600k and 20k entries, respectively.

Does one have an explanation for this behavior?
Is it expected that the REINDEX command produces other results than
"manually" recreation of an index?
If yes, under which circumstances does this happen?
And is there some way to measure the "quality" of an index / if it has
been scattered?

PS: I also tried the VACUUM (combined w/ and w/o REINDEX) command to no
success.

-- 
Best regards

dirdi
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to