[ https://issues.apache.org/jira/browse/FLINK-30101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xintong Song reopened FLINK-30101:
----------------------------------

> Always use StandaloneClientHAServices to create RestClusterClient when retrieving a Flink on YARN cluster client
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-30101
>                 URL: https://issues.apache.org/jira/browse/FLINK-30101
>             Project: Flink
>          Issue Type: Improvement
>          Components: Client / Job Submission
>    Affects Versions: 1.16.0
>            Reporter: Zhanghao Chen
>            Priority: Major
>             Fix For: 1.17.0
>
>
> *Problem*
> Currently, the procedure for retrieving a Flink on YARN cluster client is as
> follows (in the YarnClusterDescriptor#retrieve method):
> # Get the application report from YARN
> # Set rest.address & rest.port using the info from the application report
> # Create a new RestClusterClient using the updated configuration; if HA is
> enabled, the client HA service is used to fetch rest.address & rest.port
> Here, we can see that the use of client HA in step 3 is redundant, as we've
> already got rest.address & rest.port from the YARN application report. When
> ZK HA is enabled, it takes ~1.5 s to initialize the client HA services and
> fetch the REST IP & port.
> 1.5 s can mean a lot for latency-sensitive client operations. In my company,
> we use the Flink client to submit short-running session jobs, and e2e latency is
> critical. The job submission time is around 10 s on average, so 1.5 s would
> mean a 15% time saving.
> *Proposal*
> When retrieving a Flink on YARN cluster client, use
> StandaloneClientHAServices to create the RestClusterClient instead, as we have
> already fetched rest.address & rest.port from the YARN application report.
> This is also what we did in KubernetesClusterDescriptor.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
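
For readers following the thread, the difference between the current flow and the proposal can be sketched roughly as below. This is a simplified model in plain Python, not the Flink code: `AppReport`, `StandaloneHaServices`, `LookupHaServices`, `retrieve`, and the `always_standalone` flag are hypothetical names made up for illustration, and the `high-availability` key is only assumed here as the switch for HA. The `lookup` callable stands in for the ZooKeeper-backed fetch that the issue reports as costing ~1.5 s.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class AppReport:
    """Hypothetical stand-in for the YARN application report (step 1)."""
    host: str
    port: int


class StandaloneHaServices:
    """Hypothetical model: returns a fixed REST URL, no external lookup."""

    def __init__(self, rest_url: str) -> None:
        self._rest_url = rest_url

    def get_rest_url(self) -> str:
        return self._rest_url


class LookupHaServices:
    """Hypothetical model of an HA-backed lookup (e.g. via ZooKeeper)."""

    def __init__(self, lookup: Callable[[], str]) -> None:
        self._lookup = lookup

    def get_rest_url(self) -> str:
        # In the scenario from the issue, this is the slow part.
        return self._lookup()


def retrieve(
    config: Dict[str, str],
    report: AppReport,
    lookup: Callable[[], str],
    always_standalone: bool,
):
    # Step 2: copy the config and set rest.address / rest.port from the report.
    conf = dict(config)
    conf["rest.address"] = report.host
    conf["rest.port"] = str(report.port)

    # Step 3: choose the client HA services used to build the client.
    # Assumption: "high-availability" set to anything but NONE means HA is on.
    ha_enabled = conf.get("high-availability", "NONE").upper() != "NONE"
    if ha_enabled and not always_standalone:
        # Current behaviour: resolve the address again through HA.
        return LookupHaServices(lookup)
    # Proposed behaviour: reuse the address already taken from the report.
    return StandaloneHaServices(f"http://{conf['rest.address']}:{conf['rest.port']}")
```

With `always_standalone=True` the `lookup` callable is never invoked, which is the saving the proposal is after. With `always_standalone=False` and HA enabled, the same URL is obtained only after the extra lookup.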