SELECT DISTINCT can be kind of slow if there are many result values, specifically if those result values include large VARCHARs. Furthermore, some database engines cannot support a SELECT DISTINCT if any LOBs are included in the result values.
I'm trying to find a general way to optimize SELECT DISTINCT since for more purposes I rarely need to decide DISTINCT on any more than a single column. For example, consider I have a tree structure where leafs or groups can have more than a single parent group (i.e. they are links). (A) SELECT DISTINCT G.oid, G.name, G.description, G.type [, ...] FROM Groupable G JOIN Link ON G.oid = Link.childId JOIN Path ON Link.parentId = Path.descendantId WHERE (G.type = LEAF AND Path.ancestorId = 12345) Something like this can be slow on a large result set with many large character columns and it does not even express clearly what I mean, because what I really mean to say is: (B) SELECT ( FOR DISTINCT ON G.oid ), G.name, G.description, G.type [, ...] FROM Groupable G JOIN Link ON G.oid = Link.childId JOIN Path ON Link.parentId = Path.descendantId WHERE (G.type = LEAF AND Path.ancestorId = 12345) Of course there is no valid SQL like this. So what I end up doing is the following: (C) SELECT G.oid, G.name, G.description, G.type [, ...] FROM Groupable G WHERE G.oid IN ( SELECT G.oid FROM Groupable G JOIN Link ON G.oid = Link.childId JOIN Path ON Link.parentId = Path.descendantId WHERE (G.type = LEAF AND Path.ancestorId = 12345)) For the majority of cases this will perform better than (A). However, sometimes the expense of the subquery will out weight the sort used for the distinct and it will not be faster. Is there a standard SQL way to avoid sorting on every field for a distinct on a unique key without having to perform a subquery? Maybe something with a group by? TIA for any ideas or thoughts... R. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]