[ 
https://issues.apache.org/jira/browse/IMPALA-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687018#comment-16687018
 ] 

Michael Ho commented on IMPALA-4069:
------------------------------------

Now that IMPALA-7213 and IMPALA-4063 are fixed, the number of connections 
should scale linearly with # of hosts instead of (# of hosts x # of query 
fragments per host). May still be good to study whether this is still an issue 
in a Kerberos enabled cluster or should we eagerly do a staggered warm up in 
the Impala cluster to pre-create the connections ?

> Introduce startup option to create and cache backend connections on startup
> ---------------------------------------------------------------------------
>
>                 Key: IMPALA-4069
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4069
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.5.0
>            Reporter: Mostafa Mokhtar
>            Priority: Major
>              Labels: scalability
>
> Add impalad startup flag specifying the number of connections per backend to 
> create and cache. 
> After startup impala-server.backends.client-cache.total-clients should 
> reflect number of backends x cached connections per backend. 
> [~j...@cloudera.com] description of the problem
> {code}
> Internal Impala network connections between nodes for query execution are not 
> multiplexed. This means as the number of queries increase the number of 
> network connections increases between Impala executors. With higher #nodes, 
> the combination of query bursts and number of executors can lead to lots of 
> new connections attempts. For example, a query with 10+joins on a 100-node 
> cluster could require 1000+ connections simultaneously on coordinator.  When 
> the spike is too high or if there is not sufficient CPU available to handle 
> the bursts, this causes connection failures. 
> The total number of connections does not seem to be the issue, but there is 
> currently a practical limit on the number of simultaneous new concurrent 
> connection TCP request spikes at once. 
> Impala caches backend connections and reuse them later. With cache, the 
> simultaneous spikes of new connection request is only those above previous 
> established maximum.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to