SELECT DISTINCT Optimizations

Robert DiFalco Mon, 06 Mar 2006 16:39:53 -0800

SELECT DISTINCT can be kind of slow if there are many result values,
specifically if those result values include large VARCHARs. Furthermore,
some database engines cannot support a SELECT DISTINCT if any LOBs are
included in the result values.


I'm trying to find a general way to optimize SELECT DISTINCT since for
more purposes I rarely need to decide DISTINCT on any more than a single
column.

For example, consider I have a tree structure where leafs or groups can
have more than a single parent group (i.e. they are links).

   (A)
   SELECT DISTINCT G.oid, G.name, G.description, G.type [, ...]
   FROM Groupable G
      JOIN Link ON G.oid = Link.childId
      JOIN Path ON Link.parentId = Path.descendantId
   WHERE (G.type = LEAF AND Path.ancestorId = 12345)

Something like this can be slow on a large result set with many large
character columns and it does not even express clearly what I mean,
because what I really mean to say is:

   (B)
   SELECT ( FOR DISTINCT ON G.oid ), G.name, G.description, G.type [,
...]
   FROM Groupable G
      JOIN Link ON G.oid = Link.childId
      JOIN Path ON Link.parentId = Path.descendantId
   WHERE (G.type = LEAF AND Path.ancestorId = 12345)

Of course there is no valid SQL like this. So what I end up doing is the
following:

   (C)
   SELECT G.oid, G.name, G.description, G.type [, ...]
   FROM Groupable G
   WHERE G.oid IN (
      SELECT G.oid
      FROM Groupable G
         JOIN Link ON G.oid = Link.childId
         JOIN Path ON Link.parentId = Path.descendantId
      WHERE (G.type = LEAF AND Path.ancestorId = 12345))

For the majority of cases this will perform better than (A). However,
sometimes the expense of the subquery will out weight the sort used for
the distinct and it will not be faster.

Is there a standard SQL way to avoid sorting on every field for a
distinct on a unique key without having to perform a subquery? Maybe
something with a group by?

TIA for any ideas or thoughts...

R.


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

SELECT DISTINCT Optimizations

Reply via email to