Re: Flink restart strategy on specific exception

Ken Krugler Tue, 12 May 2020 16:09:22 -0700

Hi Til,

Sorry to resurface an ancient question, but is there a working example anywhere 
of setting a custom restart strategy?


Asking because I’ve been wandering through the Flink 1.9 code base for a while, 
and the restart strategy implementation is…pretty tangled.

From what I’ve been able to figure out, you have to provide a factory class, 
something like this:

        Configuration config = new Configuration();
        config.setString(ConfigConstants.RESTART_STRATEGY, 
MyRestartStrategyFactory.class.getCanonicalName());
        StreamExecutionEnvironment env = 
StreamExecutionEnvironment.createLocalEnvironment(4, config);

That factory class should extend RestartStrategyFactory, but it also needs to 
implement a static method that looks like:

    public static MyRestartStrategyFactory createFactory(Configuration config) {
        return new MyRestartStrategyFactory();
    }

I wasn’t able to find any documentation that mentioned this particular method 
being a requirement.

And also the documentation at 
https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#fault-tolerance
 
<https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#fault-tolerance>
 doesn’t mention you can set a custom class name for the restart-strategy.

Thanks,

— Ken


> On Nov 22, 2018, at 8:18 AM, Till Rohrmann <[email protected]> wrote:
> 
> Hi Kasif,
> 
> I think in this situation it is best if you defined your own custom 
> RestartStrategy by specifying a class which has a `RestartStrategyFactory 
> createFactory(Configuration configuration)` method as `restart-strategy: 
> MyRestartStrategyFactoryFactory` in `flink-conf.yaml`.
> 
> Cheers,
> Till
> 
> On Thu, Nov 22, 2018 at 7:18 AM Ali, Kasif <[email protected] 
> <mailto:[email protected]>> wrote:
> Hello,
> 
>  
> 
> Looking at existing restart strategies they are kind of generic. We have a 
> requirement to restart the job only in case of specific exception/issues.
> 
> What would be the best way to have a re start strategy which is based on few 
> rules like looking at particular type of exception or some extra condition 
> checks which are application specific.?
> 
>  
> 
> Just a background on one specific issue which invoked this requirement is 
> slots not getting released when the job finishes. In our applications, we 
> keep track of jobs submitted with the amount of parallelism allotted to it.  
> Once the job finishes we assume that the slots are free and try to submit 
> next set of jobs which at times fail with error  “not enough slots available”.
> 
>  
> 
> So we think a job re start can solve this issue but we only want to re start 
> only if this particular situation is encountered.
> 
>  
> 
> Please let us know If there are better ways to solve this problem other than 
> re start strategy.
> 
>  
> 
> Thanks,
> 
> Kasif
> 
>  
> 
> 
> 
> Your Personal Data: We may collect and process information about you that may 
> be subject to data protection laws. For more information about how we use and 
> disclose your personal data, how we protect your information, our legal basis 
> to use your information, your rights and who you can contact, please refer 
> to: www.gs.com/privacy-notices <http://www.gs.com/privacy-notices>

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr

Re: Flink restart strategy on specific exception

Reply via email to