Hi Yang,

Name filtering & schema special handling make sense to me. We can enrich
them later if a requirement arises, without breaking the interface.

For #1, from my perspective your first proposal is

  having an option that specifies a remote flink/lib; we then turn off
auto-uploading of the local flink/lib and register that path as local resources

It seems we are adding yet another piece of special logic for one kind of
thing here... What I propose is that we separate these two steps explicitly:

1. an option that turns off auto-uploading of the local flink/lib
2. a general option that registers remote files as local resources
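As a rough illustration of how these two independent options could compose (the option keys and the resolver below are hypothetical, not real Flink configuration):

```python
# Sketch: resolve the effective ship list from two independent, hypothetical
# options instead of one composite "remote flink/lib" switch.
#   "yarn.ship-local-lib"  - bool: auto-upload the local flink/lib (default True)
#   "yarn.provided-paths"  - list: remote paths to register as local resources
def resolve_ship_files(config, local_lib_files):
    """Return (local_files_to_upload, remote_paths_to_register)."""
    upload = list(local_lib_files) if config.get("yarn.ship-local-lib", True) else []
    remote = list(config.get("yarn.provided-paths", []))
    return upload, remote

# Disabling auto-upload and pointing at a pre-uploaded lib directory:
cfg = {"yarn.ship-local-lib": False,
       "yarn.provided-paths": ["hdfs://hdpdev/flink/release/flink-1.x"]}
upload, remote = resolve_ship_files(cfg, ["flink-dist.jar", "log4j.jar"])
```

Keeping the two options orthogonal means either can be used alone: disable local shipping without any remote path, or register remote paths on top of the normal local upload.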

The remaining point is that you propose handling flink/lib with PUBLIC
visibility while other files get APPLICATION visibility; either a composite
configuration or name filtering to special-case the libs makes sense for
that, though.

YarnClusterDescriptor already has a lot of special-handling logic that
introduces a number of config options and keys; these should instead have
been expressed as a few common options and validated at runtime.

Best,
tison.


Yang Wang <danrtsey...@gmail.com> wrote on Fri, Apr 17, 2020 at 11:42 PM:

> Hi tison,
>
> For #3, if you mean registering remote HDFS files as local resources, we
> should make "-yt/--yarnship" support remote directories. I think that is
> the right direction.
>
> For #1, if users could ship remote directories, then they could also
> specify them like this:
> "-yt hdfs://hdpdev/flink/release/flink-1.x,
> hdfs://hdpdev/user/someone/mylib". Do you mean we add an
> option that controls whether to avoid unnecessary uploading? Maybe we could
> filter by name and file size.
> I think this is a good suggestion, and we would not need to introduce a new
> config option "-ypl".
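A sketch of the name-and-size filter mentioned above (illustrative Python, not the actual YarnClusterDescriptor code; a real implementation would compare Hadoop FileStatus entries):

```python
# Sketch: skip uploading a local file when the pre-uploaded directory already
# holds a file with the same name and size.
def needs_upload(local_name, local_size, remote_index):
    """remote_index maps file name -> size of the pre-uploaded copy."""
    return remote_index.get(local_name) != local_size

remote_index = {"flink-dist_2.11-1.9.1.jar": 100_000_000}
skip = not needs_upload("flink-dist_2.11-1.9.1.jar", 100_000_000, remote_index)
must = needs_upload("myjob.jar", 4_096, remote_index)
```

Name plus size is only a heuristic; a checksum or modification-time comparison would be safer against files that differ while sharing a size.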
>
> For #2, for flink-dist, #1 could already solve the problem. We do not need
> to support remote schemes; it would confuse users if we only supported
> HDFS and not S3, OSS, etc.
>
>
> Best,
> Yang
>
> tison <wander4...@gmail.com> wrote on Fri, Apr 17, 2020 at 8:05 PM:
>
>> Hi Yang,
>>
>> I agree that these two pieces of work would benefit from a single
>> assignee. My concerns are as below:
>>
>> 1. Both shared libs & remote flink dist/libs are remote ship files. I
>> don't think we should implement multiple code paths/configurations.
>> 2. So, for concept clarification, there are
>>   (1) an option to disable shipping local libs
>>   (2) flink-dist support for multiple schemes, at least "hdfs://" as discussed
>>   (3) an option for registering remote ship files with path & visibility.
>> I think the new configuration system helps here.
>>
>> the reason we have to special-case (2) instead of including it in (3)
>> is that when shipping flink-dist to the TM container, we specifically
>> detect flink-dist. Of course, we could merge it into the general ship
>> files and validate that the ship files finally contain flink-dist, which
>> is an alternative.
>>
>> The *most important* difference is between (1) and (3): we don't have an
>> option for only remote libs. Does this clarification satisfy your proposal?
>>
>> Best,
>> tison.
>>
>>
>> Till Rohrmann <trohrm...@apache.org> wrote on Fri, Apr 17, 2020 at 7:49 PM:
>>
>>> Hi Yang,
>>>
>>> from what I understand, it sounds reasonable to me. Could you sync with
>>> Tison on FLINK-14964 on how to proceed? I'm not super deep into these
>>> issues, but they seem to be somewhat related and Tison already did some
>>> implementation work.
>>>
>>> I'd say it would be awesome if we could include this kind of improvement
>>> in the release.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, Apr 16, 2020 at 4:43 AM Yang Wang <danrtsey...@gmail.com> wrote:
>>>
>>>> Hi All, thanks a lot for reviving this discussion.
>>>>
>>>> I think we could unify FLINK-13938 and FLINK-14964, since they have a
>>>> similar purpose: avoiding unnecessary uploading and downloading of jars
>>>> in YARN deployments.
>>>> The difference is that FLINK-13938 aims to support only the flink system
>>>> lib directory, while FLINK-14964 tries to support arbitrary pre-uploaded
>>>> jars (both user and system jars).
>>>>
>>>>
>>>> So I suggest implementing this feature as follows:
>>>> 1. Upload the flink lib directory or user libs to HDFS, e.g.
>>>> "hdfs://hdpdev/flink/release/flink-1.x",
>>>> "hdfs://hdpdev/user/someone/mylib"
>>>> 2. Use the -ypl argument to specify the shared lib; multiple
>>>> directories could be specified
>>>> 3. YarnClusterDescriptor will use the pre-uploaded jars to avoid
>>>> unnecessary uploading, both for system and user jars
>>>> 4. YarnClusterDescriptor needs to set the system jars to PUBLIC
>>>> visibility so that the distributed cache on the YARN NodeManager can be
>>>> reused by multiple applications. This avoids unnecessary downloading,
>>>> especially for "flink-dist-*.jar". For the user shared lib, the
>>>> visibility is still set to the "APPLICATION" level.
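Step 4 above amounts to a per-file visibility rule. A sketch (illustrative Python; the constants mirror YARN's LocalResourceVisibility values, and the prefix match is just one possible way to recognize system jars):

```python
# Sketch: system jars under a shared lib directory get PUBLIC visibility so
# the NodeManager's distributed cache can be shared across applications;
# everything else stays APPLICATION-scoped.
PUBLIC, APPLICATION = "PUBLIC", "APPLICATION"

def visibility_for(path, system_lib_dirs):
    # Treat any file under a configured system lib directory as a system jar.
    if any(path.startswith(d.rstrip("/") + "/") for d in system_lib_dirs):
        return PUBLIC
    return APPLICATION

system_dirs = ["hdfs://hdpdev/flink/release/flink-1.x"]
dist_vis = visibility_for("hdfs://hdpdev/flink/release/flink-1.x/flink-dist.jar", system_dirs)
user_vis = visibility_for("hdfs://hdpdev/user/someone/mylib/udf.jar", system_dirs)
```

PUBLIC resources are cached once per node and shared by all users' containers, which is why they suit immutable release artifacts like flink-dist but not per-user libs.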
>>>>
>>>>
>>>> In our past internal use case, the shared lib helped accelerate
>>>> submission a lot. It also helps reduce the pressure on HDFS when we want
>>>> to launch many applications together.
>>>>
>>>> @tison @Till Rohrmann <trohrm...@apache.org> @Hailu, Andreas
>>>> <andreas.ha...@gs.com> If you think the suggestion makes sense, I will
>>>> try to find some time to work on this and hope it can catch up with the
>>>> release-1.11 cycle.
>>>>
>>>>
>>>> Best,
>>>> Yang
>>>>
>>>> Hailu, Andreas [Engineering] <andreas.ha...@gs.com> wrote on Thu, Apr
>>>> 16, 2020 at 8:47 AM:
>>>>
>>>>> Okay, I’ll continue to watch the JIRAs. Thanks for the update, Till.
>>>>>
>>>>>
>>>>>
>>>>> *// *ah
>>>>>
>>>>>
>>>>>
>>>>> *From:* Till Rohrmann <trohrm...@apache.org>
>>>>> *Sent:* Wednesday, April 15, 2020 10:51 AM
>>>>> *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>
>>>>> *Cc:* Yang Wang <danrtsey...@gmail.com>; tison <wander4...@gmail.com>;
>>>>> user@flink.apache.org
>>>>> *Subject:* Re: Flink Conf "yarn.flink-dist-jar" Question
>>>>>
>>>>>
>>>>>
>>>>> Hi Andreas,
>>>>>
>>>>>
>>>>>
>>>>> it looks as if FLINK-13938 and FLINK-14964 won't make it into the
>>>>> 1.10.1 release because the community is about to start the release 
>>>>> process.
>>>>> Since FLINK-13938 is a new feature it will be shipped with a major 
>>>>> release.
>>>>> There is still a bit of time until the 1.11 feature freeze and if Yang 
>>>>> Wang
>>>>> has time to finish this PR, then we could ship it.
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Till
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Apr 15, 2020 at 3:23 PM Hailu, Andreas [Engineering] <
>>>>> andreas.ha...@gs.com> wrote:
>>>>>
>>>>> Yang, Tison,
>>>>>
>>>>>
>>>>>
>>>>> Do we know when some solution for 13938 and 14964 will arrive? Do you
>>>>> think it will be in a 1.10.x version?
>>>>>
>>>>>
>>>>>
>>>>> *// *ah
>>>>>
>>>>>
>>>>>
>>>>> *From:* Hailu, Andreas [Engineering]
>>>>> *Sent:* Friday, March 20, 2020 9:19 AM
>>>>> *To:* 'Yang Wang' <danrtsey...@gmail.com>
>>>>> *Cc:* tison <wander4...@gmail.com>; user@flink.apache.org
>>>>> *Subject:* RE: Flink Conf "yarn.flink-dist-jar" Question
>>>>>
>>>>>
>>>>>
>>>>> Hi Yang,
>>>>>
>>>>>
>>>>>
>>>>> This is good to know. As a stopgap measure until a solution between
>>>>> 13938 and 14964 arrives, we can automate the application staging directory
>>>>> cleanup from our client should the process fail. It’s not ideal, but will
>>>>> at least begin to manage our users’ quota. I’ll continue to watch the two
>>>>> tickets. Thank you.
>>>>>
>>>>>
>>>>>
>>>>> *// *ah
>>>>>
>>>>>
>>>>>
>>>>> *From:* Yang Wang <danrtsey...@gmail.com>
>>>>> *Sent:* Monday, March 16, 2020 9:37 PM
>>>>> *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>
>>>>> *Cc:* tison <wander4...@gmail.com>; user@flink.apache.org
>>>>> *Subject:* Re: Flink Conf "yarn.flink-dist-jar" Question
>>>>>
>>>>>
>>>>>
>>>>> Hi Hailu,
>>>>>
>>>>>
>>>>>
>>>>> Sorry for the late response. If the Flink cluster (e.g. the Yarn
>>>>> application) is stopped directly by `yarn application -kill`, then the
>>>>> staging directory will be left behind, since the jobmanager does not
>>>>> get any chance to clean up the staging directory. It may also happen
>>>>> when the jobmanager crashes and reaches the Yarn attempt limit.
>>>>>
>>>>>
>>>>>
>>>>> For FLINK-13938, yes, it is trying to use the Yarn public cache to
>>>>> accelerate the container launch.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> Yang
>>>>>
>>>>>
>>>>>
>>>>> Hailu, Andreas <andreas.ha...@gs.com> wrote on Tue, Mar 10, 2020 at 4:38 AM:
>>>>>
>>>>> Also may I ask what causes these application ID directories to be left
>>>>> behind? Is it a job failure, or can they persist even if the application
>>>>> succeeds? I’d like to know so that I can implement my own cleanup in the
>>>>> interim to prevent exceeding user disk space quotas.
>>>>>
>>>>>
>>>>>
>>>>> *// *ah
>>>>>
>>>>>
>>>>>
>>>>> *From:* Hailu, Andreas [Engineering]
>>>>> *Sent:* Monday, March 9, 2020 1:20 PM
>>>>> *To:* 'Yang Wang' <danrtsey...@gmail.com>
>>>>> *Cc:* tison <wander4...@gmail.com>; user@flink.apache.org
>>>>> *Subject:* RE: Flink Conf "yarn.flink-dist-jar" Question
>>>>>
>>>>>
>>>>>
>>>>> Hi Yang,
>>>>>
>>>>>
>>>>>
>>>>> Yes, a combination of these two would be very helpful for us. We have
>>>>> a single shaded binary which we use to run all of the jobs on our YARN
>>>>> cluster. If we could designate a single location in HDFS for that as well,
>>>>> we could also greatly benefit from FLINK-13938.
>>>>>
>>>>>
>>>>>
>>>>> It sounds like a general public cache solution is what’s being called
>>>>> for?
>>>>>
>>>>>
>>>>>
>>>>> *// *ah
>>>>>
>>>>>
>>>>>
>>>>> *From:* Yang Wang <danrtsey...@gmail.com>
>>>>> *Sent:* Sunday, March 8, 2020 10:52 PM
>>>>> *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>
>>>>> *Cc:* tison <wander4...@gmail.com>; user@flink.apache.org
>>>>> *Subject:* Re: Flink Conf "yarn.flink-dist-jar" Question
>>>>>
>>>>>
>>>>>
>>>>> Hi Hailu, tison,
>>>>>
>>>>>
>>>>>
>>>>> I created a very similar ticket before to accelerate Flink submission
>>>>> on Yarn [1]. However, we did not reach a consensus in the PR. Maybe
>>>>> it's time to revive the discussion and try to find a common solution
>>>>> for both tickets [1][2].
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> [1]. https://issues.apache.org/jira/browse/FLINK-13938
>>>>>
>>>>> [2]. https://issues.apache.org/jira/browse/FLINK-14964
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> Yang
>>>>>
>>>>>
>>>>>
>>>>> Hailu, Andreas <andreas.ha...@gs.com> wrote on Sat, Mar 7, 2020 at 11:21 AM:
>>>>>
>>>>> Hi Tison, thanks for the reply. I’ve replied to the ticket. I’ll be
>>>>> watching it as well.
>>>>>
>>>>>
>>>>>
>>>>> *// *ah
>>>>>
>>>>>
>>>>>
>>>>> *From:* tison <wander4...@gmail.com>
>>>>> *Sent:* Friday, March 6, 2020 1:40 PM
>>>>> *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>
>>>>> *Cc:* user@flink.apache.org
>>>>> *Subject:* Re: Flink Conf "yarn.flink-dist-jar" Question
>>>>>
>>>>>
>>>>>
>>>>> FLINK-13938 seems a bit different from your requirement. The one that
>>>>> totally matches is FLINK-14964. I'd appreciate it if you can share your
>>>>> opinion on the JIRA ticket.
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> tison.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> tison <wander4...@gmail.com> wrote on Sat, Mar 7, 2020 at 2:35 AM:
>>>>>
>>>>> Yes, your requirement is exactly what the community has taken into
>>>>> consideration. We currently have an open JIRA ticket for this specific
>>>>> feature [1], and work on loosening the constraint of the flink-jar
>>>>> scheme to support DFS locations should happen.
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> tison.
>>>>>
>>>>>
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/FLINK-13938
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Hailu, Andreas <andreas.ha...@gs.com> wrote on Sat, Mar 7, 2020 at 2:03 AM:
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>
>>>>> We noticed that every time an application runs, it uploads the
>>>>> flink-dist artifact to the /user/<user>/.flink HDFS directory. This
>>>>> causes a user disk space quota issue, as we submit thousands of apps to
>>>>> our cluster an hour. We had a similar problem with our Spark
>>>>> applications, where it uploaded the Spark Assembly package for every
>>>>> app. Spark provides an argument to use a location in HDFS for
>>>>> applications to leverage so they don't need to upload it for every run,
>>>>> and that was our solution (see the "spark.yarn.jar" configuration if
>>>>> interested.)
>>>>> don’t need to upload them for every run, and that was our solution (see
>>>>> “spark.yarn.jar” configuration if interested.)
>>>>>
>>>>>
>>>>>
>>>>> Looking at the Resource Orchestration Frameworks page, I see there
>>>>> might be a similar concept in the "yarn.flink-dist-jar" configuration
>>>>> option. I wanted to place the flink-dist package we're using in a
>>>>> location in HDFS and configure our jobs to point to it, e.g.
>>>>>
>>>>>
>>>>>
>>>>> yarn.flink-dist-jar:
>>>>> hdfs:////user/delp/.flink/flink-dist_2.11-1.9.1.jar
>>>>>
>>>>>
>>>>>
>>>>> Am I correct that this is what I'm looking for? I gave this a try with
>>>>> some jobs today, and based on what I'm seeing in the
>>>>> launch_container.sh in our YARN application, it still looks like it's
>>>>> being uploaded:
>>>>> uploaded:
>>>>>
>>>>>
>>>>>
>>>>> export
>>>>> _FLINK_JAR_PATH="hdfs://d279536/user/delp/.flink/application_1583031705852_117863/flink-dist_2.11-1.9.1.jar"
>>>>>
>>>>>
>>>>>
>>>>> How can I confirm? Or is this perhaps not the config I'm looking for?
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> Andreas
>>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>>
>>>>> Your Personal Data: We may collect and process information about you
>>>>> that may be subject to data protection laws. For more information about
>>>>> how we use and disclose your personal data, how we protect your
>>>>> information, our legal basis to use your information, your rights and
>>>>> who you can contact, please refer to: www.gs.com/privacy-notices
>>>>>
>>>>
