[sqlite] Weird (slow) TEMP B-TREE ORDER BY

Dinu Marina Tue, 10 Mar 2015 20:40:20 +0200

Hi Donald,
Thanks for the answer, I'll try to address these issues:

1,4)
I was lazy and got the DDL from the GUI without looking at it. Here is 
the proper form from the master table (it shouldnt affect replicability 
though)


CREATE TABLE [materializedInstances] ([provider] TEXT, [instanceId] 
TEXT, [packageId] INTEGER, [date] TEXT, [duration] INTEGER, 
[transportType] INTEGER, PRIMARY KEY ([provider], [instanceId]))
CREATE INDEX "1ab7a7b8a24dce53a34a650587c5bfdabcc785a2" ON 
[materializedInstances] ( "packageId" , "provider" , "instanceId" , 
"date" , "duration" , "transportType")

CREATE TABLE "materializedPrices" ("id" INTEGER ,"price" FLOAT 
,"perPerson" float ,"currency" TEXT ,"promotionType" INTEGER 
,"promotionName" TEXT ,"provider" text ,"instanceId" TEXT 
,"accomodationHash" TEXT ,"accomodationName" TEXT ,"start" DATE ,"end" 
DATE ,PRIMARY KEY ("id"))
CREATE INDEX "ffd0d22435eccb1020821f3ec70d038dfc7e5d91" ON 
"materializedPrices"("provider","instanceId","start","end","accomodationHash","price")

Seems my GUI does some postprocessing on the DDL to constraint + index 
(not sure how wrong that is).

2) As this is basically a hierarchical query it would be best suited for 
CTEs, but I wouldn't go there as I'm sure optimizations aren't in place 
for these in sqlite. The problems with joins are:
         1) An optimization problem regarding WHERE and GROUP BY that 
seem to not be able to both use the same index in sqlite (I'll get back 
with that later)
         2) Since this aggregates over chunks of the same table, a JOIN 
is always of C1*C2logN complexity while iterating with nested selects 
does (C1+C2)logN

Example:

             SELECT
                 MIN(a.d)+MIN(b.d)
             FROM
                 t a JOIN t b ON a.y=b.y
             WHERE
                 a.cond=1 AND
                 b.cond=2
             GROUP BY
                 a.y

Is basically squared time (looks like cubic but I think that's the 
planner bug I was talking about, need to reproduce a minimal test case) 
while

             SELECT
                 (SELECT
                     MIN(d)
                 FROM
                     t
                 WHERE
                     t.y=j.y AND
                     t.cond=1
                 )+
                 (SELECT
                     MIN(d)
                 FROM
                     t
                 WHERE
                     t.y=j.y AND
                     t.cond=2
                 )
             FROM
                 (SELECT DISTINCT t.y) j

Although much uglier, gets there in index search time.

3) Yeap. And as I said, the predicted result sizes by ANALYZE are pretty 
accurate, the data is pretty linearly distributed. The unsorted plan is 
perfect, so a wrong prediction is not at fault, but the sorted plan is 
completely different.

Thanks,
Dinu

On 10.03.2015 17:17, Donald Griggs wrote:
> Hi, Dinu,
>
> I'll have to leave it to those more knowledgeable to respond to other
> aspects of your question, but I was struck by the following.
>
> 1) Regarding columns such as "provider TEXT(2000000000)"
>
> Do you really mean that values in such columns might need to contain up to
> two giga-characters (2 * 10**9)?   Sqlite will simply interpret this as
> "TEXT" plain and simple -- but if your actual data contains
> monstrously-long values, AND these are part of primary keys, then I can
> imagine this would disastrously affect performance.
>
> 2) Complexity.
>
> I understand you said could not achieve good performance with
> straightforward joins.   I wondered if it might be helpful to post your
> cleanest, simplest JOIN-based SELECT (and indexes and query plan) that you
> thought should have good performance, yet does not.
>
>
> 3) Analyze.   You've likely already run the ANALYZE command, right?  (A
> one-time run of VACUUM wouldn't hurt either, I suppose.)
>
> 4) Regarding: "CREATE INDEX sqlite_autoindex_materializedInstances_1 ON
> materializedInstances (provider,instanceId);
>
> Am I right that since (provider,instanceId) is already PRIMARY KEY, then
> this index is redundant? (Even so, I would only expect it to take up disk
> space and slow down inserts/deletes, but not slow down SELECTs.)
>
>
> Donald
> _______________________________________________
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

[sqlite] Weird (slow) TEMP B-TREE ORDER BY

Reply via email to