Hi Donald, Thanks for the answer, I'll try to address these issues: 1,4) I was lazy and got the DDL from the GUI without looking at it. Here is the proper form from the master table (it shouldnt affect replicability though)
CREATE TABLE [materializedInstances] ([provider] TEXT, [instanceId] TEXT, [packageId] INTEGER, [date] TEXT, [duration] INTEGER, [transportType] INTEGER, PRIMARY KEY ([provider], [instanceId])) CREATE INDEX "1ab7a7b8a24dce53a34a650587c5bfdabcc785a2" ON [materializedInstances] ( "packageId" , "provider" , "instanceId" , "date" , "duration" , "transportType") CREATE TABLE "materializedPrices" ("id" INTEGER ,"price" FLOAT ,"perPerson" float ,"currency" TEXT ,"promotionType" INTEGER ,"promotionName" TEXT ,"provider" text ,"instanceId" TEXT ,"accomodationHash" TEXT ,"accomodationName" TEXT ,"start" DATE ,"end" DATE ,PRIMARY KEY ("id")) CREATE INDEX "ffd0d22435eccb1020821f3ec70d038dfc7e5d91" ON "materializedPrices"("provider","instanceId","start","end","accomodationHash","price") Seems my GUI does some postprocessing on the DDL to constraint + index (not sure how wrong that is). 2) As this is basically a hierarchical query it would be best suited for CTEs, but I wouldn't go there as I'm sure optimizations aren't in place for these in sqlite. The problems with joins are: 1) An optimization problem regarding WHERE and GROUP BY that seem to not be able to both use the same index in sqlite (I'll get back with that later) 2) Since this aggregates over chunks of the same table, a JOIN is always of C1*C2logN complexity while iterating with nested selects does (C1+C2)logN Example: SELECT MIN(a.d)+MIN(b.d) FROM t a JOIN t b ON a.y=b.y WHERE a.cond=1 AND b.cond=2 GROUP BY a.y Is basically squared time (looks like cubic but I think that's the planner bug I was talking about, need to reproduce a minimal test case) while SELECT (SELECT MIN(d) FROM t WHERE t.y=j.y AND t.cond=1 )+ (SELECT MIN(d) FROM t WHERE t.y=j.y AND t.cond=2 ) FROM (SELECT DISTINCT t.y) j Although much uglier, gets there in index search time. 3) Yeap. And as I said, the predicted result sizes by ANALYZE are pretty accurate, the data is pretty linearly distributed. The unsorted plan is perfect, so a wrong prediction is not at fault, but the sorted plan is completely different. Thanks, Dinu On 10.03.2015 17:17, Donald Griggs wrote: > Hi, Dinu, > > I'll have to leave it to those more knowledgeable to respond to other > aspects of your question, but I was struck by the following. > > 1) Regarding columns such as "provider TEXT(2000000000)" > > Do you really mean that values in such columns might need to contain up to > two giga-characters (2 * 10**9)? Sqlite will simply interpret this as > "TEXT" plain and simple -- but if your actual data contains > monstrously-long values, AND these are part of primary keys, then I can > imagine this would disastrously affect performance. > > 2) Complexity. > > I understand you said could not achieve good performance with > straightforward joins. I wondered if it might be helpful to post your > cleanest, simplest JOIN-based SELECT (and indexes and query plan) that you > thought should have good performance, yet does not. > > > 3) Analyze. You've likely already run the ANALYZE command, right? (A > one-time run of VACUUM wouldn't hurt either, I suppose.) > > 4) Regarding: "CREATE INDEX sqlite_autoindex_materializedInstances_1 ON > materializedInstances (provider,instanceId); > > Am I right that since (provider,instanceId) is already PRIMARY KEY, then > this index is redundant? (Even so, I would only expect it to take up disk > space and slow down inserts/deletes, but not slow down SELECTs.) > > > Donald > _______________________________________________ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users