On Sun, Nov 25, 2012 at 12:41:21PM -0700, Keith Medcalf scratched on the wall:
> > On Sunday, 25 November, 2012, 11:58, Jay A. Kreibich wrote:
> 
> >   each column is usually undesirable.  A given SELECT can usually only
> >   use one index per query (or sub-query), so it rarely makes sense to
> >   stack up the indexes... adding unused indexes only slows down
> >   insert/update/deletes, as well as makes the file size much larger.
> 
> Generally speaking indexes should be treated as a performance trade-off. 

  Ideally they're a performance gain.  That's kind of the point.  If
  they're not an overall gain, you likely shouldn't be using them.

> When you create an index (other than a UNIQUE index used to enforce a
> constraint, or an index on a parent or child key in a foreign-key
> relationship, where such an index may greatly increase INSERT or UPDATE
> performance) you are "moving" execution time from the retrieval
> processing to the maintenance processing of your application.

  Yes and no.  If you view an index as an optimization, then the idea 
  is usually to increase overall performance, so that there is net win.
  It is true that an index will (hopefully) increase the performance
  of many queries at the possible cost of additional maintenance processing,
  but the net change depends on the application's read/write ratios,
  and the maintenance patterns.

  In that sense, I'd say the indexes do not move execution *time*, so
  much as move and adjust execution *cost*.  If the application's needs
  and access patterns are such that the refactored set of costs is lower,
  the indexes are usually seen as a correct and positive optimization.
  If the new set of costs results in lower overall performance, the
  index is seen as a negative thing.

  There is also nothing special about an automatic UNIQUE index.
  The costs are the same.  The only difference is the motivation for
  creating it, accepting the write performance cost as the price of
  enforcing the constraint.  Adding a UNIQUE constraint (and the index
  that goes with it) is not an optimization, but an application requirement.
  
> When you add an index, you are (usually) optimizing retrieval and
> query operations.  The execution time saved during such query
> operations does not disappear (it is not recovered).  These processor
> cycles and I/O operations are "removed" from retrieval operations
> and "spent" when you perform updates to the database to maintain
> the indexes.

  I disagree with this idea, as it implies there is a 1:1 exchange in
  read performance gains and write performance loss.  That is very
  rarely the case.  Even the relative costs of a single read vs a
  single write are not linear or particularly related.  All of these
  things are highly dependent on the table size, the different indexes,
  and the operation being performed.
  
  It isn't about moving costs from category A to category B, it is
  about finding efficient work flows that work faster and better.  We
  already know there are some situations when an index will provide
  significant performance benefits, and other cases when the same index
  may slow things down.  A big part of what the query optimizer must do
  is identify these situations and pick the best use of the available
  resources-- in many situations that may be to ignore the index.
  That's not about shifting execution time, it is about getting rid of
  it, and picking the fastest path for the situation at hand.  The
  balance is that an index opens up more options to the query
  optimizer, but there is also an associated cost-- and that cost
  should only be paid if the index is used from time to time.

  This balance is true of all operations, not just SELECTs.  For
  example, a good index can make UPDATE and DELETE operations faster,
  just as it may make SELECTs faster.  That kind of blows the theory of
  moving execution time around, since there can be advantages in both
  types of operations.
  
  Index optimization is in no way a zero-sum game.  The bag of water is
  *not* fixed, and it is just as possible to remove water as it is to
  add it.

> >   Additionally, indexing any column (even one used heavily in a query
> >   filter) is not useful if the index doesn't reduce the working data
> >   set to ~10% of the rows.  In other words, having an index on a column
> >   that has all one value (be it NULL or anything else) accomplishes
> >   nothing but significantly slowing down queries that use the index.
> >   Same is true, for example, of a True/False column with an even
> >   distribution.  Generally, unless the index can be used to eliminate
> >   the vast majority of rows, it will slow down, rather than speed up, a
> >   query.
> 
> The exception to this is, of course, where the index created is a
> covering index because using a covering index, while it may not
> necessarily reduce the number of rows significantly, eliminates the
> accesses to the base table to retrieve data that might be being used
> in further operations.

  Hence the "generally."  Covering indexes are a somewhat unique case,
  and not every database can use them.  Normally they still don't result
  in much savings unless a noticeable percentage of rows are eliminated
  (or very expensive columns are eliminated, such as large BLOBs), but
  the break-even point is definitely lower than a normal index, with the
  worse case being no worse than a normal table scan.  All of this is
  true for read operations only, of course.  Covering indexes offer no
  advantage over normal indexes in the case of UPDATE and DELETE
  operations (and no disadvantages either; they can still be used to
  speed up these operations, just no more than a "normal" index).  And,
  of course, covering indexes often have a higher maintenance cost than
  normal indexes, since they typically contain more columns.

> Care needs to be taken to not prematurely add indexes that will add 
> maintenance cost but not significantly improve query performance (ie,
> be careful not to just add water into the bag -- the objective is to
> poke it around, not just add more water).

  If it is possible add water, it is also possible to take it away.
  As I said, this is not a zero-sum game.  The bag of water is not fixed.

  I agree that premature optimization is a bad idea-- not just with
  indexes, but with just about any aspect of software development.
  Unless you know the patterns and needs of your application, you're
  optimizing guesses, which is always a bad thing.



  In the end, a big part of the reason index optimization is so
  difficult is because it makes the cost structure much more complex.
  Adding an index is not so simple that queries are reduced by 50% and 
  insert/update/delete operations increase by 50%.  If it was that easy,
  the query optimizer would be much better at suggesting indexes (and/or
  automatically creating them, as it does create temporary indexes
  for some queries).

  Rather, the cost/benefit requirements of an index are much more
  complex, and often have to do with the number of rows being operated
  on, as well as the total number of rows in the table.  This is why
  it is important to understand how indexes work, and how they provide
  (or fail to provide) advantages to the application.



  "Using SQLite" (http://shop.oreilly.com/product/9780596521196.do)
  has a very lengthy discussion of indexes and how they work,
  specifically because it is difficult to generalize the use of
  indexes.  One must really look at each application and each database
  (and the data in that database!) with a solid knowledge of what an
  index can-- or cannot-- provide in order to find places where an
  index will make a positive difference.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to