From my experience the need for an index on a database table is more
a function of the database configuration and the current size of a
table. Many times the need has cropped up in my own webapp
development. One of the reasons why we rolled our own set of classes
is that it is easy to get to and manipulate the sql. Tuning is a fine
art, sometimes an index is indicated and other times a rewrite of the
statement with a hint so that it is joined in another manner. A third
option is found with Oracle - analyze the tables - this allows the
Database to decide how best to organize the table's index and that
can improve plans. If MySQL offers a similar feature perhaps this
will avoid the index.
Let's look at the way the SQL is constructed:
FROM roller_comment t0 INNER JOIN weblogentry t1 ON t0.entryid =
t1.id LEFT OUTER JOIN weblogcategory t2 ON t1.categoryid = t2.id LEFT
OUTER JOIN
rolleruser t8 ON t1.userid = t8.id LEFT OUTER JOIN website t9 ON
t1.websiteid =
t9.id LEFT OUTER JOIN weblogcategory t3 ON t2.parentid = t3.id LEFT
OUTER JOIN
website t4 ON t2.websiteid = t4.id LEFT OUTER JOIN weblogcategory t5 ON
t4.bloggercatid = t5.id LEFT OUTER JOIN rolleruser t6 ON t4.userid =
t6.id LEFT
OUTER JOIN weblogcategory t7 ON t4.defaultcatid = t7.id WHERE
(t1.websiteid =
'8a926693072c38bf010741c83fcb36d6' AND t0.status = 'APPROVED') ORDER BY
t0.posttime DESC LIMIT 0, 10
This rewrites as:
FROM roller_comment t0, weblogentry t1, weblogcategory t2, rolleruser
t8, website t9, weblogcategory t3, website t4, weblogcategory t5,
rolleruser t6, weblogcategory t7
WHERE t1.categoryid = t2.id AND t1.userid = t8.id AND t1.websiteid =
t9.id AND t2.parentid = t3.id AND t2.websiteid = t4.id AND
t4.bloggercatid = t5.id AND t4.userid = t6.id AND t4.defaultcatid =
t7.id AND
t1.websiteid = '8a926693072c38bf010741c83fcb36d6' AND t0.status =
'APPROVED' ORDER BY
t0.posttime DESC LIMIT 0, 10
These indexes are provided:
website.userid
weblogentry.categoryid
weblogentry.userid
weblogentry.websiteid
weblogcategory.id (primary key)
weblogcategory.parentid
weblogcategory.websiteid
rolleruser.id (primary key)
roller_comment.status
create index ws_bloggercatid_idx on website(bloggercatid);
is for "t4.bloggercatid = t5.id"
create index ws_defaultcatid_idx on website(defaultcatid );
is for "t4.defaultcatid = t7.id"
In other words every other possible index that would help make this
horrible join function except for the two missed:
My company has a blog service with 30k blogs and 70k unique
browsers per week.
This clearly indicates that without the two indexes the 30K blogs
impacts every JOIN significantly! The factor is roughly by 30,000 x
30,000 or 900,000,000. Maybe not that bad, but you can see the
EXPLOSION!
Are you convinced?
Regards,
Dave
On Mar 12, 2008, at 11:36 AM, Allen Gilliland wrote:
Yes, I understand why we use indexes, what I meant was that when
you are suggesting that we need a new index that you provide some
evidence about why that particular index is necessary.
Your example below is completely valid, but the ur_userid_idx index
has always been there, so that's not a new index.
For example, you said we need to add these 2 indexes ...
create index ws_bloggercatid_idx on website(bloggercatid);
create index ws_defaultcatid_idx on website(defaultcatid );
why? off the top of my head I can't think of a reason those
indexes would be of benefit because I don't believe we run any
standard queries which do a lookup or join on those columns. I may
be wrong, but that's why I would like some evidence before we would
just add the indexes for no reason.
-- Allen
Boris Milikič wrote:
1) I've got "Impossible WHERE noticed after reading const tables"
message, when I ran explain on sql from prevoious e-mail:
SELECT t0.id, t0.content, t0.contenttype, t0.email, t0.name,
t0.notify, t0.plugins, t0.posttime, ....
2) So I run explain on this simple query:
explain select a.rolename ,b.username from rolleruser b, userrole
a where b.id=a.userid;
3) Explanations of why indexes are necessary
Column "type" in following table is the join type. Type ref
means, that all rows with matching index values are read from this
table for each combination of rows from the previous tables,
example a).
For a tables that are completely read in sequence from the hard
drive EXPLAIN lists "ALL" in the "type" column. To the second
table in the join plan for a two table query, EXPLAIN lists type:
ALL, as for table without index, example b). In example b) when
EXPLAIN lists type: ALL for each table in a join "this output
indicates that MySQL is generating a Cartesian product of all the
tables; that is, every combination of rows" (MySQL manual). In
simpler terms: Two tables of 10 rows each joined together does not
result in 20 rows, it results in 100 rows (10 multiplied by 10).
a) with index
select_type table type possible_keys key key_len ref rows extra
1 SIMPLE b ALL PRIMARY <NULL> <NULL> <NULL> 3
1 SIMPLE a ref ur_userid_idx ur_userid_idx 144
roller.b.id 1
b) without index (drop index ur_userid_idx on userrole( userid );)
1 SIMPLE a ALL <NULL> <NULL> <NULL> <NULL> 4
1 SIMPLE b ALL PRIMARY <NULL> <NULL> <NULL> 3
Using where
Using too many indexes on tables can make things worse, as you
said. In many cases, MySQL can calculate the best possible query
plan. In very large database partitioning help.
-- Boris
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, March 07, 2008 6:04 PM
To: [email protected]
Subject: Re: Problem with database load
Can you also provide explanations of why you think those indexes
are necessary? Having too many indexes on tables can actually
make things worse, so you don't want to just flood the db with
tons of indexes.
-- Allen
Boris Milikič wrote:
I just found one more missing:
create index ws_posttime_idx on roller_comment (posttime);
Next weekend I will walk through sql log and dbcreate.sql script
and record in JIRA if I find something.
Boris
-----Original Message-----
From: Dave [mailto:[EMAIL PROTECTED]
Sent: Friday, March 07, 2008 4:31 PM
To: [email protected]
Subject: Re: Problem with database load
On Fri, Mar 7, 2008 at 10:15 AM, Boris Milikič <[EMAIL PROTECTED]
rs.si> wrote:
Did you created missing indexes in rollerdb (4.0):
create index ws_bloggercatid_idx on website(bloggercatid);
create index ws_defaultcatid_idx on website(defaultcatid );
Thanks Boris,
I just opened an issue for this:
https://issues.apache.org/roller/browse/ROL-1687
Are there any other 4.0 database issues that you know of that are
not on record in JIRA?
- Dave