Abhishek Rawat created IMPALA-12382:
---------------------------------------

             Summary: Coordinator could schedule fragments on gracefully 
shutdown executors
                 Key: IMPALA-12382
                 URL: https://issues.apache.org/jira/browse/IMPALA-12382
             Project: IMPALA
          Issue Type: Improvement
            Reporter: Abhishek Rawat


Statestore does failure detection based on consecutive heartbeat failures. This 
is by default configured to be 10 (statestore_max_missed_heartbeats) at 1 
second intervals (statestore_heartbeat_frequency_ms). This could however take 
much longer than 10 seconds overall, especially if statestore is busy and due 
to rpc timeout duration.

In the following example it took 50 seconds for failure detection:
{code:java}
I0817 12:32:06.824721    86 statestore.cc:1157] Unable to send heartbeat 
message to subscriber 
impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010,
 received error: RPC Error: Client for 10.80.199.159:23000 hit an unexpected 
exception: No more data to read., type: 
N6apache6thrift9transport19TTransportExceptionE, rpc: 
N6impala18THeartbeatResponseE, send: done
I0817 12:32:06.824741    86 failure-detector.cc:91] 1 consecutive heartbeats 
failed for 
'impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010'.
 State is OK
.....
.....
.....
I0817 12:32:56.800251    83 statestore.cc:1157] Unable to send heartbeat 
message to subscriber 
impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010,
 received error: RPC Error: Client for 10.80.199.159:23000 hit an unexpected 
exception: No more data to read., type: 
N6apache6thrift9transport19TTransportExceptionE, rpc: 
N6impala18THeartbeatResponseE, send: done 
I0817 12:32:56.800267    83 failure-detector.cc:91] 10 consecutive heartbeats 
failed for 
'impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010'.
 State is FAILED
I0817 12:32:56.800276    83 statestore.cc:1168] Subscriber 
'impa...@impala-executor-001-5.impala-executor.impala-1692115218-htqx.svc.cluster.local:27010'
 has failed, disconnected or re-registered (last known registration ID: 
c84bf70f03acda2b:b34a812c5e96e687){code}
As a result there is a window when statestore is determining node failure and 
coordinator might schedule fragments on that particular executor(s). The exec 
RPC will fail and if transparent query retries is enabled, coordinator will 
immediately retry the query and it will fail again.

Ideally in such situations coordinator should be notified sooner about a failed 
executor. Statestore could send priority topic update to coordinator when it 
enters failure detection logic. This should reduce the chances of coordinator 
scheduling query fragment on a failed executor.

The other argument could be to tune the heartbeat frequency and interval 
parameters. But, it's hard to find configuration which works for all cases. 
And, so while the default values are reasonable, under certain conditions they 
could be unreasonable as seen in the above example.

It might make sense to especially handle the case where executors are shutdown 
gracefully and in such case statestore shouldn't do failure detection and 
instead fail these executor immediately.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to