[ https://issues.apache.org/jira/browse/IMPALA-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281370#comment-17281370 ]
Grant Henke edited comment on IMPALA-10481 at 2/8/21, 9:26 PM:
---------------------------------------------------------------

I filed IMPALA-10197 a bit back, which describes exposing the replica selection (LEADER_ONLY or CLOSEST). I think that would be a super easy implementation that would at least let users pick as needed for certain workloads.

{quote}We should consider setting the LEADER_ONLY option by default for remote Kudu reads. The only concern would be that this might result in worse load balancing and hotspots, in which case Kudu might need to implement some additional connection option that provides a better mix of affinity and load balancing.
{quote}

I think choosing LEADER_ONLY by default wouldn't be great in environments where Kudu and Impala are co-located and we would expect local reads. Perhaps there could be an option such as LOCAL_OR_LEADER. That would select a local replica if one exists, and otherwise fall back to the leader.

Another replica selection which might be useful for testing is a FURTHEST option. This could force remote reads even in environments which have local replicas, in order to performance test that code path.

If LEADER_ONLY does become the default, I think KUDU-3061 would become a lot more relevant/necessary on the Kudu side to avoid hot spots and balance the read workload.

Another Kudu-side change which may be relevant to improving Kudu's caching behavior (depending on the query type) is KUDU-613 (Scan-resistant cache replacement algorithm for the block cache).

was (Author: granthenke):
I filed IMPALA-10197 a bit back which describes exposing the replica selection (LEADER_ONLY or CLOSEST). I think that would be a super easy implementation that would at least let users pick as needed for certain workloads.

{quote}We should consider setting the LEADER_ONLY option by default for remote Kudu reads.
The only concern would be that this might result in worse load balancing and hotspots, in which case Kudu might need to implement some additional connection option that provides a better mix of affinity and load balancing.
{quote}

I think choosing LEADER_ONLY by default wouldn't be great in environments where Kudu and Impala are co-located and we would expect local reads. Perhaps there is some option which would be LOCAL_OR_LEADER. That would select a local replica if one exists, otherwise fall back to the leader.

Another replica selection which might be useful for testing is a FURTHEST option. This could force remote reads even on environments which have local options in order to performance test that code path.

Another Kudu side change which may be relevant to improve Kudu's caching behavior (depending on the query type) is KUDU-613 (Scan-resistant cache replacement algorithm for the block cache).

> Lack of TServer affinity in remote Kudu scans results in bad OS buffer cache
> behavior on tablet servers
> -------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10481
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10481
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0
>            Reporter: David Rorke
>            Priority: Major
>              Labels: kudu, performance
>
> Remote Kudu scans can take many iterations against the same scan range before
> achieving good performance if the OS buffer cache is initially cold on the
> tablet servers. The slow warmup of the buffer cache is exacerbated by the
> fact that remote scans in the default Impala config choose a tablet server at
> random from the replica candidates.
> The Kudu client supports a LEADER_ONLY
> option that provides hard affinity to the leader replica, and Impala allows
> this to be configured using the --pick_only_leaders_for_tests option, but
> this is currently considered a testing only option and by default Impala will
> connect to a random replica.
>
> The following is a series of iterations of TPC-DS query 33 (times in
> seconds), against a freshly started Kudu cluster, in 3 configurations (1)
> local reads, with Impala running on Kudu cluster, (2) remote reads from
> separate Impala cluster with default config, (3) remote reads with
> pick_only_leaders_for_tests=true (LEADER_ONLY affinity)
>
> ||Config||Iteration 1||Iter 2||Iter 3||Iter 4||Iter 5||Iter 6||Iter 7||Iter 8||Iter 9||
> |Local|111.4|14.6| | | | | | | |
> |Remote (default config)|110.8|56.9|49.9|43.3|37.3|44.0|20.0|28.9|14.9|
> |Remote (LEADER_ONLY)|120.1|16.2|15.7|14.2| | | | | |
>
> With pick_only_leaders_for_tests, the remote performance improves quickly,
> approaching local performance on the second iteration and warming up fully by
> iteration 4. In the default config it takes 9 iterations of the query
> before we see the same performance.
>
> Running similar experiments after explicitly dropping the buffer cache on the
> tablet servers confirmed that this slow warmup is caused by poor buffer cache
> hit rates until the cache is fully warm.
>
> I suspect that slow warmup isn't the only consequence of this. Caching a
> given tablet in the buffer cache on multiple tablet servers increases the
> overall buffer cache footprint and will increase tserver memory pressure
> under load.
>
> We should consider setting the LEADER_ONLY option by default for remote Kudu
> reads. The only concern would be that this might result in worse load
> balancing and hotspots, in which case Kudu might need to implement some
> additional connection option that provides a better mix of affinity and load
> balancing.
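
To make the options discussed in this thread concrete, here is a minimal Python sketch of how the four selection policies might differ. It is an illustration only, not Kudu or Impala code: the `Replica` type, its `distance` field, and `pick_replica` are hypothetical names, and LOCAL_OR_LEADER and FURTHEST are only proposals from the comment above. CLOSEST is simplified to "smallest distance, ties broken by list order", whereas the issue describes the current default as picking a remote replica at random.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Sequence


class ReplicaSelection(Enum):
    LEADER_ONLY = auto()
    CLOSEST = auto()
    LOCAL_OR_LEADER = auto()  # proposed in this thread; hypothetical
    FURTHEST = auto()         # proposed in this thread; hypothetical


@dataclass(frozen=True)
class Replica:
    host: str
    is_leader: bool
    distance: int  # hypothetical metric: 0 = same host as the client


def _leader(replicas: Sequence[Replica]) -> Replica:
    """Return the leader replica, or fail if none is known."""
    leader: Optional[Replica] = next((r for r in replicas if r.is_leader), None)
    if leader is None:
        raise LookupError("no leader replica known")
    return leader


def pick_replica(replicas: Sequence[Replica], policy: ReplicaSelection) -> Replica:
    """Pick one replica to scan according to the given policy."""
    if not replicas:
        raise ValueError("no replicas to choose from")
    if policy is ReplicaSelection.LEADER_ONLY:
        # Hard affinity: every scan of this tablet goes to the same server.
        return _leader(replicas)
    if policy is ReplicaSelection.CLOSEST:
        return min(replicas, key=lambda r: r.distance)
    if policy is ReplicaSelection.LOCAL_OR_LEADER:
        # Prefer a co-located replica; otherwise fall back to the leader.
        local = next((r for r in replicas if r.distance == 0), None)
        return local if local is not None else _leader(replicas)
    if policy is ReplicaSelection.FURTHEST:
        # Forces the remote-read code path, e.g. for performance testing.
        return max(replicas, key=lambda r: r.distance)
    raise ValueError(f"unknown policy: {policy}")


remote_only = [
    Replica("ts1", is_leader=False, distance=1),
    Replica("ts2", is_leader=True, distance=2),
    Replica("ts3", is_leader=False, distance=3),
]
print(pick_replica(remote_only, ReplicaSelection.LOCAL_OR_LEADER).host)
```

With no local replica in the list, LOCAL_OR_LEADER behaves like LEADER_ONLY and keeps the affinity that warmed the buffer cache quickly in the measurements above; with a co-located replica present it would read locally instead.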