I can provide some info coming from our experience with SQLite 3.7.14: Since most SQl processing is IO bound, the "estimated cost" should be the number of disk IO operations required to retrieve the rows. The later addition of "estimated rows" reflects to the fact that some virtual table implementations might use non-disk storage (e.g. process or shared memory), where the number of IO operations is determined by the resident set and the cost of paging/swapping.
Lets say you have 10000 records of 200 bytes with 50 bytes of key overhead stored in some kind of ISAM file, and a page size of 4k. Performing a full table scan will take an estimated 10000 * 200 / 4096 ~= 489 disk accesses, whereas looking up a single record will take about 3 (50 bytes per key in a 4096 byte page gives an estimated fan out of over 100, resulting in 2 pages to read from the index and 1 for the record itself). Performing a partial index scan that returns 100 records will take 2 acesses to locate the first record, 1 more if a second index page is required and anywhere between 5 (if the records are contiguous) and 100 (if each is from a separate page) accesses to retrieve the records themselves. Regarding the UNIQUE flag, this is quite different from the number of estimated rows, which may be 0 or 1 due to rounding errors on a non-unique index (e.g. the initials of a set of 100 people has a cardinality of 26*26=676, giving an average number of 0,1479 records per index entry, but there may still be duplicates). -----Ursprüngliche Nachricht----- Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von dave Gesendet: Donnerstag, 19. Oktober 2017 18:43 An: 'SQLite mailing list' <sqlite-users@mailinglists.sqlite.org> Betreff: [EXTERNAL] [sqlite] virtual tables, and theTrue meaning of pIdxInfo->estimatedCost, pIdxInfo->estimatedRows, and pIdxInfo->idxFlags... Hi folks, I am trying to fully understand the impact and correct use of a few subtle features related to virtual tables' the xBestIndex mechanism, and their correct use. Here are my current beliefs: * pIdxInfo->estimatedCost obviously the cost of the proposed plan; a metric of the 'viscosity' of the table when traversing through xNext relative to other tables and especially to filesystem access * pIdxInfo->estimatedRows obviously the approximate number of rows that a proposed plan will return. But less obvious to me is how this materially affects the query plan, especially relative to pIdxInfo->estimatedCost and a little bit with respect to: * pIdxInfo->idxFlags when the SQLITE_INDEX_SCAN_UNIQUE is set. Isn't setting pIdxInfo->estimatedRows to 0 or 1 enough to communicate this same information? Anyway, I am dutifully setting both estimatedRows and idxFlags in cases where I have a 0-or-1-result table (I have several of these), and I am also estimatedRows to LLONG_MAX along with estimatedCost to DBL_MAX in cases where a plan can never be executed (btw I would respectfully suggest perhaps using a bit in idxFlags to communicate 'never use this plan, it will never work'). I haven't had any ill effects doing the above, but wonder if that is 'correct'. Also, it would be interesting just to know what the material effect of estimatedRows and idxFlags is, so that I can maybe use them more effectively. Any thoughts or corrections to my thinking? Thanks in advance; cheers! -dave _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___________________________________________ Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0 May be privileged. May be confidential. Please delete if not the addressee. _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users