Zhanghao Chen created FLINK-30101:
-------------------------------------

             Summary: YARN client should 
                 Key: FLINK-30101
                 URL: https://issues.apache.org/jira/browse/FLINK-30101
             Project: Flink
          Issue Type: Improvement
          Components: Client / Job Submission
    Affects Versions: 1.16.0
            Reporter: Zhanghao Chen
             Fix For: 1.17.0


*Problem*

Currently, a Flink on YARN cluster client is retrieved as follows (in the 
YarnClusterDescriptor#retrieve method):
 # Get the application report from YARN
 # Set rest.address & rest.port using the info from the application report
 # Create a new RestClusterClient using the updated configuration; if HA is 
enabled, it uses the client HA service to fetch rest.address & rest.port

Here, we can see that the use of client HA in step 3 is redundant, since we 
have already obtained rest.address & rest.port from the YARN application 
report. When ZK HA is enabled, initializing the client HA services and 
fetching the REST IP & port takes ~1.5 s.

1.5 s can mean a lot for latency-sensitive client operations. In my company, 
we use the Flink client to submit short-running session jobs, and e2e latency 
is critical. The job submission time is around 10 s on average, so saving 
1.5 s would cut it by 15%.

*Proposal*

When retrieving a Flink on YARN cluster client, use StandaloneClientHAServices 
to create the RestClusterClient instead, as we have already fetched 
rest.address & rest.port from the YARN application report. This is also what 
is done in KubernetesClusterDescriptor.
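
The idea can be illustrated with a small plain-Java model. This is only a 
sketch: RetrieveSketch, ClientHaServices, LeaderLookupHaServices, 
FixedAddressHaServices and the "jm-host" address are hypothetical stand-ins 
and not Flink classes, and the configuration is reduced to a plain map. It 
shows that once rest.address & rest.port have been copied from the 
application report, a fixed-address HA service can build the REST endpoint 
without the leader lookup ever being initialized.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy model of the retrieve flow; none of these types are Flink classes. */
public class RetrieveSketch {

    /** Hypothetical stand-in for client HA services: resolves the REST endpoint. */
    interface ClientHaServices {
        String restEndpointAddress();
    }

    /** Stand-in for a ZooKeeper-backed lookup; counts how often it is created. */
    static final class LeaderLookupHaServices implements ClientHaServices {
        static int initCount = 0;
        private final String leaderAddress;

        LeaderLookupHaServices(String leaderAddress) {
            initCount++; // the real initialization is where the ~1.5 s would go
            this.leaderAddress = leaderAddress;
        }

        @Override
        public String restEndpointAddress() {
            return leaderAddress;
        }
    }

    /** Stand-in for a standalone HA service: returns an already-known address. */
    static final class FixedAddressHaServices implements ClientHaServices {
        private final String address;

        FixedAddressHaServices(String address) {
            this.address = address;
        }

        @Override
        public String restEndpointAddress() {
            return address;
        }
    }

    /** Steps 1-2: copy host and port from a (simulated) application report. */
    static Map<String, String> applyReport(
            Map<String, String> config, String host, int port) {
        Map<String, String> updated = new HashMap<>(config);
        updated.put("rest.address", host);
        updated.put("rest.port", Integer.toString(port));
        return updated;
    }

    /** Step 3 as proposed: build the endpoint from the config, no HA lookup. */
    static ClientHaServices standaloneFactory(Map<String, String> config) {
        return new FixedAddressHaServices(
                "http://" + config.get("rest.address") + ":" + config.get("rest.port"));
    }

    /** Runs steps 1-3 with a pluggable HA factory and returns the endpoint. */
    static String retrieve(
            Map<String, String> config,
            String host,
            int port,
            Function<Map<String, String>, ClientHaServices> haFactory) {
        Map<String, String> updated = applyReport(config, host, port);
        return haFactory.apply(updated).restEndpointAddress();
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("high-availability", "zookeeper"); // HA stays enabled
        String endpoint =
                retrieve(config, "jm-host", 8081, RetrieveSketch::standaloneFactory);
        System.out.println(endpoint); // prints http://jm-host:8081
    }
}
```

In the actual change, the factory passed to RestClusterClient would play the 
role of standaloneFactory here; the exact constructor and factory signatures 
should be checked against the Flink code base.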



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
