Hi Til,
Sorry to resurface an ancient question, but is there a working example anywhere
of setting a custom restart strategy?
Asking because I’ve been wandering through the Flink 1.9 code base for a while,
and the restart strategy implementation is…pretty tangled.
From what I’ve been able to figure out, you have to provide a factory class,
something like this:
Configuration config = new Configuration();
config.setString(ConfigConstants.RESTART_STRATEGY,
MyRestartStrategyFactory.class.getCanonicalName());
StreamExecutionEnvironment env =
StreamExecutionEnvironment.createLocalEnvironment(4, config);
That factory class should extend RestartStrategyFactory, but it also needs to
implement a static method that looks like:
public static MyRestartStrategyFactory createFactory(Configuration config) {
return new MyRestartStrategyFactory();
}
I wasn’t able to find any documentation that mentioned this particular method
being a requirement.
And also the documentation at
https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#fault-tolerance
<https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#fault-tolerance>
doesn’t mention you can set a custom class name for the restart-strategy.
Thanks,
— Ken
> On Nov 22, 2018, at 8:18 AM, Till Rohrmann <[email protected]> wrote:
>
> Hi Kasif,
>
> I think in this situation it is best if you defined your own custom
> RestartStrategy by specifying a class which has a `RestartStrategyFactory
> createFactory(Configuration configuration)` method as `restart-strategy:
> MyRestartStrategyFactoryFactory` in `flink-conf.yaml`.
>
> Cheers,
> Till
>
> On Thu, Nov 22, 2018 at 7:18 AM Ali, Kasif <[email protected]
> <mailto:[email protected]>> wrote:
> Hello,
>
>
>
> Looking at existing restart strategies they are kind of generic. We have a
> requirement to restart the job only in case of specific exception/issues.
>
> What would be the best way to have a re start strategy which is based on few
> rules like looking at particular type of exception or some extra condition
> checks which are application specific.?
>
>
>
> Just a background on one specific issue which invoked this requirement is
> slots not getting released when the job finishes. In our applications, we
> keep track of jobs submitted with the amount of parallelism allotted to it.
> Once the job finishes we assume that the slots are free and try to submit
> next set of jobs which at times fail with error “not enough slots available”.
>
>
>
> So we think a job re start can solve this issue but we only want to re start
> only if this particular situation is encountered.
>
>
>
> Please let us know If there are better ways to solve this problem other than
> re start strategy.
>
>
>
> Thanks,
>
> Kasif
>
>
>
>
>
> Your Personal Data: We may collect and process information about you that may
> be subject to data protection laws. For more information about how we use and
> disclose your personal data, how we protect your information, our legal basis
> to use your information, your rights and who you can contact, please refer
> to: www.gs.com/privacy-notices <http://www.gs.com/privacy-notices>
--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr