Massive tables - 2 Queries better than join?

Rob Emerick Sat, 11 May 2002 16:40:17 -0700

Hello,

I am trying to understand a fundamental concept here. Lets say we have two
tables:


CREATE TABLE codewords (id INT UNSIGNED not null AUTO_INCREMENT, word
VARCHAR (60) not null , fkey_humanword INT UNSIGNED not null , PRIMARY KEY
(id), INDEX (id), UNIQUE (id))

CREATE TABLE humanword (id INT UNSIGNED not null AUTO_INCREMENT, word
VARCHAR (60) not null , PRIMARY KEY (id), INDEX (id), UNIQUE (id))

A 'codeword' is put into a script, which gets the corresponding 'humanword'
and echoes it to the screen. So lets say the codeword 'bose' corresponds to
the humanword 'best speakers'. Now, I know what I am about to describe is
poor RDBMS design but it helps me understand a concept: Let's say that in
the table 'codewords' there can be multiple instances of the same codeword.
Perhaps the codeword 'bose' appears 45,000 times. However, all 45,000
instances of the codeword 'bose' have the same fkey_humanword (they
correspond to the same humanword.) Why would there be 45,000 duplicate
entries? Please just accept that for some reason all the duplicates need to
be there.

Now, lets say I wanted to display every possible humanword. I could just do
"SELECT word FROM humanword". However, lets say that I need to make the
query select from the table 'codewords'. So I would do something like

SELECT h.word FROM codewords AS c LEFT JOIN humanword AS h ON
h.id=c.fkey_humanword GROUP BY h.id
So this way it will join the two tables but only display distinct values.
Now this is joining two massive tables though, say 1 million rows each, so
would it be faster to execute two queries such as:

SELECT c.fkey_humanword FROM codewords AS c GROUP BY fkey_humanword
Then loop through each resulting fkey_humanword, store it in a variable
called '$varname', and have the second query be
SELECT w.word FROM humanword WHERE id=$varname

Everything tells me the JOIN is the best way to go. If the tables are
properly indexed it should scream. However, I am wrestling with
understanding the fundamentals of how the join with a GROUP BY clause works.
Does MySQL select a row from the first table than scan the entire second
table for the JOIN, then go on to second row of the first table and perform
the join for that row, and on and on? Or does MySQL first perform the GROUP
BY clause on the first table, take the resulting rows and than perform the
JOIN on the small amount of grouped rows left?






---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Massive tables - 2 Queries better than join?

Reply via email to