[jira] [Assigned] (IMPALA-8339) Coordinator should be more resilient to fragment instances startup failure

Michael Ho (JIRA) Fri, 03 May 2019 11:02:01 -0700


     [ 
https://issues.apache.org/jira/browse/IMPALA-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Michael Ho reassigned IMPALA-8339:
----------------------------------

    Assignee: Thomas Tauber-Marshall

> Coordinator should be more resilient to fragment instances startup failure
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-8339
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8339
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Distributed Exec
>            Reporter: Michael Ho
>            Assignee: Thomas Tauber-Marshall
>            Priority: Major
>              Labels: Availability, resilience
>
> Impala currently relies on the statestore for cluster membership. When an 
> Impala executor goes offline, it may take a while for the statestore to 
> declare that node unavailable and for that information to propagate to all 
> coordinator nodes. Within this window, some coordinators may still attempt 
> to issue RPCs to the faulty node, and the resulting RPC failures cause query 
> failures. In other words, many queries may fail to start within this window 
> until all coordinators have the latest cluster membership information.
> Going forward, the coordinator may need to fall back to backup executors for 
> each fragment in case some of the executors are unavailable. Moreover, the 
> *coordinator should treat the cluster membership information from the 
> statestore (or any external source of truth, e.g. etcd) as hints instead of 
> ground truth* and adjust the scheduling of fragment instances based on the 
> availability of the executors from the coordinator's perspective.
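
A minimal sketch of the two ideas in the description, assuming a coordinator-side scheduler that receives the statestore's membership list and an RPC callable for starting a fragment instance. `HintBasedScheduler`, `start_instance`, `exec_rpc` and `StartupError` are hypothetical names for illustration only, not Impala APIs. The sketch tries the preferred executor first, falls back to other members as backups when the startup RPC fails, and records each failure locally so that this coordinator's own observations override the (possibly stale) statestore view.

```python
"""Hypothetical sketch: fragment instance scheduling that treats membership as hints."""
from dataclasses import dataclass, field
from typing import Callable, List, Set


class StartupError(Exception):
    """Raised when no candidate executor accepted a fragment instance."""


@dataclass
class HintBasedScheduler:
    # Membership as last reported by the statestore: a hint, not ground truth.
    statestore_members: List[str]
    # Executors this coordinator has itself seen fail a startup RPC.
    locally_failed: Set[str] = field(default_factory=set)

    def candidates(self, preferred: str) -> List[str]:
        """Preferred executor first, then backups, skipping locally failed nodes."""
        live = [h for h in self.statestore_members if h not in self.locally_failed]
        ordered = [preferred] if preferred in live else []
        ordered += [h for h in live if h != preferred]
        return ordered

    def start_instance(self, fragment: str, preferred: str,
                       exec_rpc: Callable[[str, str], bool]) -> str:
        """Start `fragment`, falling back to backup executors on RPC failure."""
        for host in self.candidates(preferred):
            if exec_rpc(host, fragment):
                return host
            # Remember the failure so later scheduling avoids this node even
            # before the statestore declares it unavailable.
            self.locally_failed.add(host)
        raise StartupError(f"no executor available for {fragment}")


if __name__ == "__main__":
    down = {"exec-2"}  # simulated offline executor the statestore still lists
    sched = HintBasedScheduler(["exec-1", "exec-2", "exec-3"])
    # exec-2 fails the startup RPC, so the instance lands on the first backup.
    print(sched.start_instance("F01", "exec-2", lambda h, f: h not in down))  # exec-1
```

In this sketch a locally failed executor is never retried; a fuller design would presumably need some expiry or probing so that a recovered node can be scheduled again.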



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

