On Mon, Sep 22, 2008 at 10:07:54AM -0400, D. Richard Hipp scratched on the wall: > I am reluctant to add to SQLite the ability to explicitly specify the > index for a query. I agree with Alex Scotti that the whole idea seems > very un-RDBMS like.
Well it is outside of the Relational Model, that's for sure. Then again, the whole concept of indexes are outside of the Relational Model. > There seems to be no standard SQL way of providing hints to the query > optimizer for which index to use. Every SQL database engine does it > differently. The MySQL approach is the simplest by far. Since the FROM clause is the only area of a SELECT statement where you're always dealing with fully-realized tables (and those are the only objects that can be indexed) it does seem like the most direct and cleanest approach. > SELECT * FROM tablex INDEX BY indexy WHERE ... ORDER BY ...; > SELECT * FROM tablex INDEX BY ROWID WHERE ... ORDER BY ...; > SELECT * FROM tablex NOT INDEXED WHERE ... ORDER BY ...; How are the last two different? It seems that NOT INDEXED would use a table scan, which is essentially walking the ROWID b-tree, while "INDEX BY ROWID" is going to use the "index" on ROWID, which isn't really an index, but the table's root b-tree-- e.g. the ROWID b-tree. Unless the "BY ROWID" is just some kind of alias for "the primary b-tree" and can be used regardless of if there is a INTEGER PRIMARY KEY or not? But wouldn't that still be the same as NOT INDEXED? Or would NOT INDEXED force a full table-scan no matter what, even if the query had something like "WHERE ROWID = 43" ? Also, reading the MySQL docs, it sounds like their syntax is only applied to JOIN operations (i.e. none of the examples above). I'm not sure that implies that it forces JOINs to be first or not (given an index or NOT given an index). Since different clauses in a SELECT may need to access data from the tables in different ways, it seems that telling the system to use or not use an index is only half the issue. You also need to be able to tell the query optimizer what you want the index used for (since, in general, the index can only be used for the first operation). For example, the system might pull out a long series of rows "in order" via an index and then join them to a much smaller data set, or it might join two large tables with a small result set via in index and then sort them. Even if you tell the optimizer to use (or not use) an index, that's only half the story if you can't tell it what order you want the operations performed in. Or are SELECT operations more or less set in a fixed order for SQLite? > I further propose that if the specified index cannot be used, the > sqlite3_prepare_v2() call will fail with an error. In other words, > the new syntax is a requirement, not a hint. Given that the SQLite query optimizer doesn't have access to a large amount of (expensive to maintain) statistics, there are always going to be limitation in what it can do. This seems like a good balance between simple system design and real-world tuning needs. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Our opponent is an alien starship packed with atomic bombs. We have a protractor." "I'll go home and see if I can scrounge up a ruler and a piece of string." --from Anathem by Neal Stephenson _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users