[ https://issues.apache.org/jira/browse/FLINK-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kostas Kloudas closed FLINK-13938.
----------------------------------
    Fix Version/s: 1.11.0
       Resolution: Implemented

Merged on master with c59e8945f1493bbd1b00383ae9ba8eee47535ee4

> Use pre-uploaded libs to accelerate flink submission
> ----------------------------------------------------
>
>                 Key: FLINK-13938
>                 URL: https://issues.apache.org/jira/browse/FLINK-13938
>             Project: Flink
>          Issue Type: New Feature
>          Components: Client / Job Submission, Deployment / YARN
>            Reporter: Yang Wang
>            Assignee: Yang Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.11.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, every time we start a Flink cluster, the Flink lib jars need to be
> uploaded to HDFS and then registered as YARN local resources so that they can
> be downloaded to the JobManager and all TaskManager containers. I think we
> could have two optimizations.
> # Use a pre-uploaded Flink binary to avoid uploading the Flink system jars.
> # By default, the LocalResourceVisibility is APPLICATION, so the jars are
> downloaded only once and shared by all TaskManager containers of the same
> application on the same node. However, different applications still have to
> download all jars every time, including the flink-dist.jar. We could use the
> YARN public cache to eliminate the unnecessary jar downloads and make
> container launch faster.
>
> How does the feature work?
> * Add {{yarn.provided.lib.dirs}} to configure pre-uploaded libs, which
> contain files that are useful for all users of the platform (i.e.
> different applications).
> * When the Flink client wants to ship a local file, it checks the
> provided libs first. If the provided libs contain a file with the same name,
> the local ship file is automatically excluded from uploading.
> * These provided libs need to be publicly readable and will be registered
> with {{PUBLIC}} visibility for local resources, so they will be cached on the
> nodes and shared by different applications.
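
The name-based exclusion described in the second bullet above can be sketched as follows. This is only an illustration under stated assumptions, not the code merged for this issue: the class `ProvidedLibFilter` and the method `filesToUpload` are hypothetical names, and the set of provided file names is assumed to have been collected beforehand by listing the directories configured in {{yarn.provided.lib.dirs}}.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Flink: illustrates skipping local ship
// files that already exist (by file name) among the pre-uploaded libs.
public class ProvidedLibFilter {

    /**
     * Returns the local ship files that still need uploading, i.e. those
     * whose file name does not appear in the provided libs.
     */
    public static List<Path> filesToUpload(List<Path> localShipFiles,
                                           Set<String> providedFileNames) {
        List<Path> toUpload = new ArrayList<>();
        for (Path file : localShipFiles) {
            // Compare by file name only, as described in the issue.
            String name = file.getFileName().toString();
            if (!providedFileNames.contains(name)) {
                toUpload.add(file);
            }
        }
        return toUpload;
    }

    public static void main(String[] args) {
        // Assumed to come from listing the yarn.provided.lib.dirs directories.
        Set<String> provided = new HashSet<>(Arrays.asList("flink-dist.jar"));

        List<Path> localShipFiles = Arrays.asList(
                Paths.get("lib/flink-dist.jar"),   // already provided, skipped
                Paths.get("lib/my-udf.jar"));      // not provided, uploaded

        System.out.println(filesToUpload(localShipFiles, provided));
    }
}
```

With these example inputs, only `my-udf.jar` remains in the upload list. Matching on the file name alone keeps the check cheap, but it also means a provided file shadows any local file with the same name regardless of its content.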
>
> How to use the pre-upload feature?
> 1. First, upload the Flink binary to the HDFS directories.
> 2. Use {{yarn.provided.lib.dirs}} to specify the pre-uploaded libs.
>
> A final submission command could be issued as follows.
> {code:java}
> ./bin/flink run -m yarn-cluster -d \
> -yD yarn.provided.lib.dirs=hdfs://myhdfs/flink/lib,hdfs://myhdfs/flink/plugins \
> examples/streaming/WindowJoin.jar
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)