Hi,
We encountered an issue in our system where we had two nearly identical
subqueries in a statement, which differed only in one table (both of which had
the same definition). One table had 4 rows, the other table had zero rows.
The subquery with the 4-row table ran about 10,000 times faster than the
subquery with the zero-row table. ANALYZE had been run on this schema, and the
problem existed on sqlite version 3.7.12 up through 3.8.5 (running on Mac OS
10.8), and exists whether compiling with SQLITE_ENABLE_STAT3 or not.
After much analysis, it turns out the problem was "fixed" by inserting one row
into the empty table, running ANALYZE, then removing the row. Without this
step, there was no row in sqlite_stat1, and the optimizer used assumed 100,000
rows for the empty table.
My naïve understanding of the sqlite engine would make me think that the
problem lies in not storing a stats row for zero-sized tables, resulting in the
optimizer making the unnecessary assumption.
Note that several conditions must exist to produce this perfect storm:
1. The subquery needs to contain several tables besides the empty one
2. The stats computed for the schema are as described above
3. Covering indexes exist for the correlation keys.
Due to size limits of these messages, I won't include query plans, but you can
run the reproduction steps below to get them.
==========================
=== Steps to Reproduce ===
==========================
Run the attached bash script and redirect to sqlite, like so:
./testdata.sh | sqlite3 test.db
The script generates a db with a million-row table; 150000 rows (by choosing
100/50/30 values for the i/j/k loops) is enough to illustrate the problem as
well, but the results are not as dramatic.
I realize the query in question isn't particularly well-formed; this is a
result of the way I'm generating the test data; in our data the relationships
are slightly different (such that subquery actually returns one row, not many
as in this example) - you'll just have to trust me on this, unless you want me
to submit a 56 MB file with confidential customer information to this mailing
list :). The end result is the same. The poor performance isn't as dramatic
as our production case, likely because the example tables are not as wide as
ours, so the cost of visiting rows is reduced, so I added some nonsensical
string operations to the select clause to burn cpu a bit to make the timer
results illustrative. In our code, the timing for the optimal query was "CPU
Time: user 0.000142 sys 0.000072" and the suboptimal one was "CPU Time: user
1.021453 sys 0.036893". As this subquery can be executed for each row in a
result set (which can easily have 100 rows), one second per row is quite bad.
=================
== testdata.sh ==
=================
#!/bin/bash
echo 'create table grandparent (gp_id integer not null primary key, gp_name
text not null);'
echo 'create table parent (p_id integer not null primary key, p_name text not
null, gp_id integernot null);'
echo 'create table child (c_id integer not null primary key, c_name text not
null, p_id integernot null);'
echo 'create temp table if not exists variables (name text primary key, value
text);'
for i in `seq 1 100`; do
echo insert into grandparent values \(null, \"gp $i\"\)\;
echo 'insert or replace into variables values ("gp", last_insert_rowid());'
for j in `seq 1 100`; do
echo insert into parent select null, \"gp-\" \|\| value \|\| \" p-$j\",
value from variables where name = \'gp\'\;
echo 'insert or replace into variables values ("p",
last_insert_rowid());'
for k in `seq 1 100`; do
echo insert into child select null, \"p-\" \|\| value \|\| \"
c-$k\", value from variables where name = \'p\'\;
done
done
done
echo 'create table somerows (name text not null, c_id integer not null);'
echo 'create table norows (name text not null, c_id integer not null);'
echo 'delete from somerows;'
echo 'insert into somerows values ("apple", 15001);'
echo 'insert into somerows values ("peach", 15003);'
echo 'insert into somerows values ("pear", 15005);'
echo 'insert into somerows values ("plum", 15007);'
echo 'create unique index pk_gp on grandparent (gp_id);'
echo 'create unique index pk_p on parent (p_id);'
echo 'create unique index pk_c on child (c_id);'
echo 'create unique index pk_somerows on somerows (name);'
echo 'create unique index pk_norows on norows (name);'
echo 'create unique index ak_p on parent (gp_id, p_id);'
echo 'create unique index ak_c on child (p_id, c_name);'
echo 'create unique index ak_norows on norows (c_id);'
echo 'create unique index ak_somerows on somerows (c_id);'
echo 'analyze;'
echo '.width 0 0 0 200'
echo 'explain query plan ' \
'select' \
'(select substr(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa",
"a"), ' \
' length(replace(c_name || " aa aa aa aa aa aa " || p.p_id,
"aa", "a")) - 8, ' \
' 4) || c_name expensive_col' \
'from somerows s' \
' join child c' \
' on s.c_id = c.c_id' \
' join parent p' \
' on c.p_id = p.p_id' \
'where p.gp_id = mp.gp_id and expensive_col != "nothing")' \
'from child mc, parent mp' \
'where mc.p_id = mp.p_id and c_id = 15010;'
echo '.timer on'
echo \
'select' \
'(select substr(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa",
"a"), ' \
' length(replace(c_name || " aa aa aa aa aa aa " || p.p_id,
"aa", "a")) - 8, ' \
' 4) || c_name expensive_col' \
'from somerows s' \
' join child c' \
' on s.c_id = c.c_id' \
' join parent p' \
' on c.p_id = p.p_id' \
'where p.gp_id = mp.gp_id and expensive_col != "nothing")' \
'from child mc, parent mp' \
'where mc.p_id = mp.p_id and c_id = 15010;'
echo '.timer off'
echo 'explain query plan ' \
'select' \
'(select substr(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa",
"a"), ' \
' length(replace(c_name || " aa aa aa aa aa aa " || p.p_id,
"aa", "a")) - 8, ' \
' 4) || c_name expensive_col' \
'from norows n' \
' join child c' \
' on n.c_id = c.c_id' \
' join parent p' \
' on c.p_id = p.p_id' \
'where p.gp_id = mp.gp_id and expensive_col != "nothing")' \
'from child mc, parent mp' \
'where mc.p_id = mp.p_id and c_id = 15010;'
echo '.timer on'
echo \
'select' \
'(select substr(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa",
"a"), ' \
' length(replace(c_name || " aa aa aa aa aa aa " || p.p_id,
"aa", "a")) - 8, ' \
' 4) || c_name expensive_col' \
'from norows n' \
' join child c' \
' on n.c_id = c.c_id' \
' join parent p' \
' on c.p_id = p.p_id' \
'where p.gp_id = mp.gp_id and expensive_col != "nothing")' \
'from child mc, parent mp' \
'where mc.p_id = mp.p_id and c_id = 15010;'
echo '.timer off'
echo 'insert into norows values ("dummy", 1);'
echo 'analyze;'
echo 'delete from norows;'
echo 'explain query plan ' \
'select' \
'(select substr(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa",
"a"), ' \
' length(replace(c_name || " aa aa aa aa aa aa " || p.p_id,
"aa", "a")) - 8, ' \
' 4) || c_name expensive_col' \
'from norows n' \
' join child c' \
' on n.c_id = c.c_id' \
' join parent p' \
' on c.p_id = p.p_id' \
'where p.gp_id = mp.gp_id and expensive_col != "nothing")' \
'from child mc, parent mp' \
'where mc.p_id = mp.p_id and c_id = 15010;'
echo '.timer on'
echo \
'select' \
'(select substr(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa",
"a"), ' \
' length(replace(c_name || " aa aa aa aa aa aa " || p.p_id,
"aa", "a")) - 8, ' \
' 4) || c_name expensive_col' \
'from norows n' \
' join child c' \
' on n.c_id = c.c_id' \
' join parent p' \
' on c.p_id = p.p_id' \
'where p.gp_id = mp.gp_id and expensive_col != "nothing")' \
'from child mc, parent mp' \
'where mc.p_id = mp.p_id and c_id = 15010;'
echo '.timer off'
echo 'analyze;'
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users