I hit a situation just now where a select using the 'regexp' operator
would have been a tidy solution if mysql used indexes.  I wasn't sure if
this optimisation had been implemented, so I investigated and found that
indexes are not used.  The case where this is possible is relatively
simple to identify and presumably to implement, and should not impose much
CPU load , so I thought I'd suggest it.

The following queries should give identical results:

(1)     SELECT * FROM mytable WHERE myfield regexp '^foo(bar|baz)bing.*'

(2)     SELECT * FROM mytable WHERE myfield like 'foobarbing%'
                                 OR myfield like 'foobazbing%'


If an index exists for myfield, then (2) will use it, but (1) will not.

Depending how ambitious one wanted to be, one could implement query (1)
in a manner analagous to one of the following in terms of how mysql would
conduct its search for matches.

(3)     SELECT * FROM mytable WHERE myfield like 'foo%'
                                AND myfield regexp '^foo(bar|baz)bing.*'

(4)     SELECT * FROM mytable WHERE myfield like 'foo%'
                                AND SUBSTRING(myfield,3) regexp '^(bar|baz)bing.*'

(2) as above, where appropriate.

It is only possible to make use of an index where something is known about
the first few characters of the values that will match.  This means that
optimization should only be considered where the regexp is anchored at the
beginning of the string.  ie the pattern either begins with '^', or could
easily be re-written so that it does (probably best to leave that to the
writer of the query).

Optimizing to the extent implied by sample query (3) or (4) seems like
relatively low hanging fruit, and would make a good initial point to aim
at initially.  This would get most of the potential gain without using too
much CPU or programmer time.

Dropping regexp out of the picture completely is only possible in some
cases, depending on what regex features are used later in the pattern.
Handling query form (2) implies handling (5) as well.

Recognising and handling bracketted alternatives as in (2) is plausible
given access to the fully parsed regex structure.  (3) or (4) are possible
using only a shallow parsing of the first few characters of the pattern.
This would be of substantial benefit in the case of a query like

(5)     SELECT * * FROM mytable WHERE myfield regexp '^(foo|bar)baz'


Andrew McNaughton
[EMAIL PROTECTED]
[EMAIL PROTECTED]






---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to