Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

Jay A. Kreibich Mon, 22 Sep 2008 08:18:23 -0700

On Mon, Sep 22, 2008 at 10:07:54AM -0400, D. Richard Hipp scratched on the wall:
> I am reluctant to add to SQLite the ability to explicitly specify the  
> index for a query.  I agree with Alex Scotti that the whole idea seems  
> very un-RDBMS like.


  Well it is outside of the Relational Model, that's for sure.
  
  Then again, the whole concept of indexes are outside of the
  Relational Model.

> There seems to be no standard SQL way of providing hints to the query  
> optimizer for which index to use.   Every SQL database engine does it  
> differently.  The MySQL approach is the simplest by far. 

  Since the FROM clause is the only area of a SELECT statement where
  you're always dealing with fully-realized tables (and those are the
  only objects that can be indexed) it does seem like the most direct
  and cleanest approach.

>       SELECT * FROM tablex INDEX BY indexy WHERE ... ORDER BY ...;
>       SELECT * FROM tablex INDEX BY ROWID WHERE ... ORDER BY ...;
>       SELECT * FROM tablex NOT INDEXED WHERE ... ORDER BY ...;

  How are the last two different?
  
  It seems that NOT INDEXED would use a table scan, which is essentially
  walking the ROWID b-tree, while "INDEX BY ROWID" is going to use the
  "index" on ROWID, which isn't really an index, but the table's root
  b-tree-- e.g. the ROWID b-tree.

  Unless the "BY ROWID" is just some kind of alias for "the primary b-tree"
  and can be used regardless of if there is a INTEGER PRIMARY KEY or not?
  But wouldn't that still be the same as NOT INDEXED?

  Or would NOT INDEXED force a full table-scan no matter what, even if
  the query had something like "WHERE ROWID = 43" ?



  Also, reading the MySQL docs, it sounds like their syntax is only
  applied to JOIN operations (i.e. none of the examples above).  I'm
  not sure that implies that it forces JOINs to be first or not (given
  an index or NOT given an index).  Since different clauses in a SELECT
  may need to access data from the tables in different ways, it seems
  that telling the system to use or not use an index is only half the
  issue.  You also need to be able to tell the query optimizer what
  you want the index used for (since, in general, the index can only
  be used for the first operation).

  For example, the system might pull out a long series of rows "in
  order" via an index and then join them to a much smaller data set, or
  it might join two large tables with a small result set via in index
  and then sort them.  Even if you tell the optimizer to use (or not
  use) an index, that's only half the story if you can't tell it what
  order you want the operations performed in.

  Or are SELECT operations more or less set in a fixed order for SQLite?

> I further propose that if the specified index cannot be used, the  
> sqlite3_prepare_v2() call will fail with an error.  In other words,  
> the new syntax is a requirement, not a hint.

  Given that the SQLite query optimizer doesn't have access to a large
  amount of (expensive to maintain) statistics, there are always going
  to be limitation in what it can do.  This seems like a good balance
  between simple system design and real-world tuning needs.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

Reply via email to