I hit a situation just now where a select using the 'regexp' operator would have been a tidy solution if mysql used indexes. I wasn't sure if this optimisation had been implemented, so I investigated and found that indexes are not used. The case where this is possible is relatively simple to identify and presumably to implement, and should not impose much CPU load , so I thought I'd suggest it.
The following queries should give identical results: (1) SELECT * FROM mytable WHERE myfield regexp '^foo(bar|baz)bing.*' (2) SELECT * FROM mytable WHERE myfield like 'foobarbing%' OR myfield like 'foobazbing%' If an index exists for myfield, then (2) will use it, but (1) will not. Depending how ambitious one wanted to be, one could implement query (1) in a manner analagous to one of the following in terms of how mysql would conduct its search for matches. (3) SELECT * FROM mytable WHERE myfield like 'foo%' AND myfield regexp '^foo(bar|baz)bing.*' (4) SELECT * FROM mytable WHERE myfield like 'foo%' AND SUBSTRING(myfield,3) regexp '^(bar|baz)bing.*' (2) as above, where appropriate. It is only possible to make use of an index where something is known about the first few characters of the values that will match. This means that optimization should only be considered where the regexp is anchored at the beginning of the string. ie the pattern either begins with '^', or could easily be re-written so that it does (probably best to leave that to the writer of the query). Optimizing to the extent implied by sample query (3) or (4) seems like relatively low hanging fruit, and would make a good initial point to aim at initially. This would get most of the potential gain without using too much CPU or programmer time. Dropping regexp out of the picture completely is only possible in some cases, depending on what regex features are used later in the pattern. Handling query form (2) implies handling (5) as well. Recognising and handling bracketted alternatives as in (2) is plausible given access to the fully parsed regex structure. (3) or (4) are possible using only a shallow parsing of the first few characters of the pattern. This would be of substantial benefit in the case of a query like (5) SELECT * * FROM mytable WHERE myfield regexp '^(foo|bar)baz' Andrew McNaughton [EMAIL PROTECTED] [EMAIL PROTECTED] --------------------------------------------------------------------- Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail <[EMAIL PROTECTED]> To unsubscribe, e-mail <[EMAIL PROTECTED]> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php