Joe McDonnell created IMPALA-12374:
--------------------------------------

             Summary: Explore optimizing re2 usage for leading / trailing ".*"
                 Key: IMPALA-12374
                 URL: https://issues.apache.org/jira/browse/IMPALA-12374
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 4.3.0
            Reporter: Joe McDonnell

Abseil has some recommendations about using re2 efficiently here: [https://abseil.io/fast/21]

One recommendation is to avoid leading / trailing .* with FullMatch():
{noformat}
Using RE2::FullMatch() with leading or trailing .* is an antipattern. Instead, change it to RE2::PartialMatch() and remove the .*. RE2::PartialMatch() performs an unanchored search, so it is also necessary to anchor the regular expression (i.e. with ^ or $) to indicate that it must match at the start or end of the string.
{noformat}
For our slow path LIKE evaluation, we convert the LIKE pattern to a regular expression and use FullMatch(). Our code to generate the regular expression emits leading/trailing .* for patterns like '%a%b%'. We could try detecting these cases and switching to PartialMatch(), with a ^ or $ anchor wherever the pattern has no leading or trailing wildcard. See the link for more details about how this works.
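
A rough sketch of the proposed detection, to make the idea concrete. This is not Impala's actual LIKE-to-regex code: the names LikeRegex, TranslateLikeBody, FullMatchForm and PartialMatchForm are hypothetical, LIKE escape characters are ignored, and std::regex stands in for re2 so the example is self-contained (std::regex_match plays the role of RE2::FullMatch(), std::regex_search the role of RE2::PartialMatch()).

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type: a regex plus the match mode to use with it.
struct LikeRegex {
  std::string pattern;
  bool partial;  // true: unanchored search, false: whole-string match
};

// Translates a LIKE pattern fragment: '%' -> ".*", '_' -> ".", and any other
// character becomes a literal with regex metacharacters escaped.
// Assumption: LIKE escape characters are not handled in this sketch.
std::string TranslateLikeBody(const std::string& like) {
  static const std::string kMeta = R"(\.^$|()[]{}*+?)";
  std::string out;
  for (char c : like) {
    if (c == '%') {
      out += ".*";
    } else if (c == '_') {
      out += '.';
    } else {
      if (kMeta.find(c) != std::string::npos) out += '\\';
      out += c;
    }
  }
  return out;
}

// Current approach as described in this issue: keep the leading/trailing ".*"
// and match the whole string.
LikeRegex FullMatchForm(const std::string& like) {
  return {TranslateLikeBody(like), false};
}

// Proposed approach: strip leading/trailing '%', search unanchored, and add
// '^' or '$' on any side that did not have a wildcard.
LikeRegex PartialMatchForm(const std::string& like) {
  std::size_t begin = 0;
  std::size_t end = like.size();
  while (begin < end && like[begin] == '%') ++begin;
  while (end > begin && like[end - 1] == '%') --end;
  const bool had_leading = begin > 0;
  const bool had_trailing = end < like.size();

  std::string pattern;
  if (!had_leading) pattern += '^';
  pattern += TranslateLikeBody(like.substr(begin, end - begin));
  if (!had_trailing) pattern += '$';
  return {pattern, true};
}

// Evaluates a value against either form using the matching std::regex call.
bool Matches(const LikeRegex& r, const std::string& value) {
  const std::regex re(r.pattern);
  return r.partial ? std::regex_search(value, re)
                   : std::regex_match(value, re);
}
```

With this sketch, '%a%b%' becomes the unanchored search "a.*b" instead of the full match ".*a.*b.*", while 'a%b' keeps both anchors as "^a.*b$". Whether the rewritten form is actually faster with re2 for Impala's workloads would still need to be measured.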