[ https://issues.apache.org/jira/browse/IMPALA-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687018#comment-16687018 ]
Michael Ho commented on IMPALA-4069:
------------------------------------

Now that IMPALA-7213 and IMPALA-4063 are fixed, the number of connections should scale linearly with the number of hosts instead of (# of hosts x # of query fragments per host). It may still be worth studying whether this remains an issue in a Kerberos-enabled cluster, or whether we should eagerly do a staggered warm-up in the Impala cluster to pre-create the connections.

> Introduce startup option to create and cache backend connections on startup
> ---------------------------------------------------------------------------
>
>                 Key: IMPALA-4069
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4069
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.5.0
>            Reporter: Mostafa Mokhtar
>            Priority: Major
>              Labels: scalability
>
> Add an impalad startup flag specifying the number of connections per backend to
> create and cache.
> After startup, impala-server.backends.client-cache.total-clients should
> reflect number of backends x cached connections per backend.
> [~j...@cloudera.com] description of the problem
> {code}
> Internal Impala network connections between nodes for query execution are not
> multiplexed. This means that as the number of queries increases, the number of
> network connections between Impala executors increases. With a higher number of
> nodes, the combination of query bursts and number of executors can lead to many
> new connection attempts. For example, a query with 10+ joins on a 100-node
> cluster could require 1000+ connections simultaneously on the coordinator. When
> the spike is too high, or if there is not sufficient CPU available to handle
> the bursts, this causes connection failures.
> The total number of connections does not seem to be the issue, but there is
> currently a practical limit on the number of new TCP connection requests that
> can arrive simultaneously.
> Impala caches backend connections and reuses them later. With the cache, a
> simultaneous spike of new connection requests covers only those above the
> previously established maximum.
> {code}
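
To make the arithmetic in the comment and the quoted description concrete, here is a rough sketch. It is not Impala code: the function names are hypothetical, and the figure of 10 fragments per host is an assumption chosen to match the "10+ joins on a 100-node cluster" example. It compares the old (# of hosts x # of query fragments per host) scaling with the linear scaling expected after IMPALA-7213 and IMPALA-4063, and models a cache that retains every connection it has ever opened, so a burst only opens connections beyond the previous maximum.

```python
from typing import List


def connections_per_fragment_model(num_hosts: int, fragments_per_host: int) -> int:
    """Old behaviour as described: one connection per (host, fragment) pair."""
    return num_hosts * fragments_per_host


def connections_per_host_model(num_hosts: int) -> int:
    """Expected behaviour after the fixes: connections scale with hosts only."""
    return num_hosts


def new_connections_per_burst(bursts: List[int]) -> List[int]:
    """New connections opened by each burst, assuming the cache keeps every
    connection ever created (a simplification; no eviction is modelled)."""
    cached = 0
    opened = []
    for demand in bursts:
        fresh = max(0, demand - cached)  # only demand above the cached maximum
        opened.append(fresh)
        cached += fresh
    return opened


if __name__ == "__main__":
    # Assumed figures: 100 hosts, 10 fragments per host.
    print(connections_per_fragment_model(100, 10))
    print(connections_per_host_model(100))
    # Hypothetical sequence of simultaneous connection demands.
    print(new_connections_per_burst([200, 500, 300, 800]))
```

Under this model, only the first burst and bursts that exceed every earlier one create new connections, which is why a staggered warm-up that pre-creates connections would flatten the initial spike.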