[ https://issues.apache.org/jira/browse/IMPALA-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879048#comment-17879048 ]
ASF subversion and git services commented on IMPALA-12363:
----------------------------------------------------------

Commit 48ee4276be1eb278fb628a4813728134a4910b1f in impala's branch refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=48ee4276b ]

IMPALA-12363: Upgrade RE2 to 2023-03-01

This bumps the version of re2 to 2023-03-01, which is the last
release that doesn't have an Abseil dependency. The toolchain
already contains a build of re2 2023-03-01, so there is no need
to bump the toolchain version.

This has a performance benefit for TPC-H's Q13, which uses this
predicate:
"o_comment not like '%special%requests%'"
This like predicate is complicated enough that it doesn't fit
the heavily optimized paths that exist for simpler likes. Instead,
this gets converted to an RE2 regex. The newer RE2 significantly
improves performance of that predicate, and TPC-H Q13 gets ~9%
faster.

Testing:
 - Ran a core job
 - Ran a perf-AB-test

Change-Id: Ic7f131102bd7590d222f22dcc412d9fd2286f006
Reviewed-on: http://gerrit.cloudera.org:8080/21712
Reviewed-by: Michael Smith <michael.sm...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Yida Wu <wydbaggio...@gmail.com>


> Upgrade re2 to version 2023-03-01 or higher
> -------------------------------------------
>
>                 Key: IMPALA-12363
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12363
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>   Affects Versions: Impala 4.3.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>
> There has been a lot of development on Google's re2 since the version that we
> currently use (20190301).
> In a prototype using version 2023-03-01, it seems
> to help TPC-H Q13, which has a "o_comment not like '%special%requests%'"
> predicate:
> {noformat}
> (I) Improvement: TPCH(42) TPCH-Q13 [parquet / none / none] (5.26s -> 4.77s [-9.43%])
> +---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+--------+-----------+
> | Operator            | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Inst | #Rows  | Est #Rows |
> +---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+--------+-----------+
> | 03:AGGREGATE        | 8.84%      | 478.98ms | 503.19ms | -4.81%     | 1.74%     | 642.76ms | 695.25ms | -7.55%     | 3      | 15    | 6.30M  | 6.22M     |
> | 02:HASH JOIN        | 9.35%      | 506.60ms | 532.76ms | -4.91%     | 1.49%     | 664.59ms | 738.50ms | -10.01%    | 3      | 15    | 64.42M | 6.38M     |
> | F00:EXCHANGE SENDER | 38.39%     | 2.08s    | 1.99s    | +4.49%     | 0.87%     | 2.39s    | 2.28s    | +4.77%     | 3      | 15    | -1     | -1        |
> | 01:SCAN HDFS        | 38.93%     | 2.11s    | 2.64s    | -20.17%    | 0.88%     | 2.37s    | 2.99s    | -20.87%    | 3      | 15    | 62.32M | 6.30M     |
> +---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+--------+-----------+
> {noformat}
> This is with
> mt_dop=5,runtime_filter_min_size=8192,runtime_filter_max_size=2097152,max_num_runtime_filters=50,runtime_filter_wait_time_ms=10000.
> Beyond 2023-03-01, re2 takes an Abseil dependency. It may have further
> improvements (they replace some std::unordered_map structures with Abseil's
> hash table). We can look into those versions, but it is a little bit more
> work compared to 2023-03-01.
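
For illustration only: the commit message says the Q13 predicate is too
complex for the optimized LIKE paths and is converted to a regex instead.
A minimal sketch of that kind of conversion follows. It uses Python's
standard re module rather than RE2, it is not Impala's actual C++ code,
the helper names like_to_regex and not_like are hypothetical, and LIKE
escape characters are not handled.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into a regex string (hypothetical helper).

    '%' matches any run of characters, '_' matches exactly one character,
    and every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Escape regex metacharacters so they match literally.
            parts.append(re.escape(ch))
    return "".join(parts)


def not_like(value: str, pattern: str) -> bool:
    """Evaluate `value NOT LIKE pattern` via the translated regex."""
    # fullmatch anchors both ends; DOTALL lets '%' and '_' span newlines.
    regex = like_to_regex(pattern)
    return re.fullmatch(regex, value, flags=re.DOTALL) is None


if __name__ == "__main__":
    print(like_to_regex("%special%requests%"))
    print(not_like("handle special packaging requests", "%special%requests%"))
```

With two interior wildcards the pattern cannot be reduced to a single
prefix, suffix, or substring check, which is presumably why it falls
through to the general regex path.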