Hi Dongwon,

This should work, but it could also interfere with Flink itself exiting in case of a fatal error.
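If changing the client is an option, removing the System.exit() call from user code avoids the interception question entirely. Below is a minimal Java sketch of that idea, under stated assumptions: ModelClient, ModelFetchException and parseModels are hypothetical names (this is not the actual com.skt.apm.http.HTTPClient), and the JSON handling is deliberately naive so that the example needs no libraries. The only point is that the error path throws instead of exiting the JVM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the user's HTTP client; all names are illustrative.
class ModelClient {

    /** Unchecked exception signalling that the job cannot be set up. */
    static class ModelFetchException extends RuntimeException {
        ModelFetchException(String message) {
            super(message);
        }
    }

    /**
     * Expects a JSON array of strings such as ["a","b"].
     * The parsing is intentionally simplistic (assumption: no commas or
     * escaped quotes inside the items).
     */
    static List<String> parseModels(String body) {
        String trimmed = body == null ? "" : body.trim();
        if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            // Previously: log the error and call System.exit(1), which
            // terminates the whole JVM the code runs in.
            // Now: throw, so only the caller of this code fails.
            throw new ModelFetchException("Expected a JSON array but got: " + trimmed);
        }
        List<String> models = new ArrayList<>();
        String inner = trimmed.substring(1, trimmed.length() - 1).trim();
        if (inner.isEmpty()) {
            return models;
        }
        for (String item : inner.split(",")) {
            models.add(item.trim().replace("\"", ""));
        }
        return models;
    }

    public static void main(String[] args) {
        // An error payload shaped like the one in the log: an object, not an array.
        String errorBody = "{\"code\":\"GB0001\"}";
        try {
            parseModels(errorBody);
        } catch (ModelFetchException e) {
            System.err.println("Job setup failed: " + e.getMessage());
        }
    }
}
```

With this shape, a wrong parameter should surface as a failed submission rather than as a JVM exit, and Flink's own handling of fatal errors stays untouched.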
Regards,
Roman

On Fri, Dec 6, 2019 at 2:54 AM Dongwon Kim <eastcirc...@gmail.com> wrote:

> FYI, we've launched a session cluster where multiple jobs are managed by a
> job manager. If that happens, all the other jobs also fail because the job
> manager is shut down and all the task managers get into chaos (failing to
> connect to the job manager).
>
> I just searched for a way to prevent System.exit() calls from terminating
> JVMs and found [1]. Could it be a possible solution to the problem?
>
> [1]
> https://stackoverflow.com/questions/5549720/how-to-prevent-calls-to-system-exit-from-terminating-the-jvm
>
> Best,
> - Dongwon
>
> On Fri, Dec 6, 2019 at 10:39 AM Dongwon Kim <eastcirc...@gmail.com> wrote:
>
>> Hi Robert and Roman,
>>
>> Thank you for taking a look at this.
>>
>>> what is your main() method / client doing when it's receiving wrong
>>> program parameters? Does it call System.exit(), or something like that?
>>
>> I just found that our HTTP client is programmed to call System.exit(1). I
>> should advise against calling System.exit() in Flink applications.
>>
>> p.s. Just out of curiosity, is there no way for the web app to intercept
>> System.exit() and prevent the job manager from being shut down?
>>
>> Best,
>>
>> - Dongwon
>>
>> On Fri, Dec 6, 2019 at 3:59 AM Robert Metzger <rmetz...@apache.org>
>> wrote:
>>
>>> Hi Dongwon,
>>>
>>> what is your main() method / client doing when it's receiving wrong
>>> program parameters? Does it call System.exit(), or something like that?
>>>
>>> By the way, the http address from the error message is
>>> publicly available. Not sure if this is internal data or not.
>>>
>>> On Thu, Dec 5, 2019 at 6:32 PM Khachatryan Roman <
>>> khachatryan.ro...@gmail.com> wrote:
>>>
>>>> Hi Dongwon,
>>>>
>>>> I wasn't able to reproduce your problem with Flink JobManager 1.9.1
>>>> with various kinds of errors in the job.
>>>> I suggest you try it on a fresh Flink installation without any other
>>>> jobs submitted.
>>>>
>>>> Regards,
>>>> Roman
>>>>
>>>> On Thu, Dec 5, 2019 at 3:48 PM Dongwon Kim <eastcirc...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Roman,
>>>>>
>>>>> We're using the latest version 1.9.1 and those two lines are all I've
>>>>> seen after executing the job on the web ui.
>>>>>
>>>>> Best,
>>>>>
>>>>> Dongwon
>>>>>
>>>>> On Thu, Dec 5, 2019 at 11:36 PM r_khachatryan <
>>>>> khachatryan.ro...@gmail.com> wrote:
>>>>>
>>>>>> Hi Dongwon,
>>>>>>
>>>>>> Could you please provide the Flink version you are running and the
>>>>>> job manager logs?
>>>>>>
>>>>>> Regards,
>>>>>> Roman
>>>>>>
>>>>>> eastcirclek wrote
>>>>>> > Hi,
>>>>>> >
>>>>>> > I tried to run a program by uploading a jar on Flink UI. When I
>>>>>> > intentionally enter a wrong parameter to my program, JobManager dies.
>>>>>> > Below are all the log messages I can get from JobManager; JobManager
>>>>>> > dies as soon as it prints the second line:
>>>>>> >
>>>>>> >> 2019-12-05 04:47:58,623 WARN
>>>>>> >> org.apache.flink.runtime.webmonitor.handlers.JarRunHandler -
>>>>>> >> Configuring the job submission via query parameters is deprecated.
>>>>>> >> Please migrate to submitting a JSON request instead.
>>>>>> >>
>>>>>> >> 2019-12-05 04:47:59,133 ERROR com.skt.apm.http.HTTPClient - Cannot
>>>>>> >> connect:
>>>>>> >> http://52.141.38.11:8380/api/spec/poc_asset_model_01/model/imbalance/models:
>>>>>> >> com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot
>>>>>> >> deserialize instance of `java.util.ArrayList` out of START_OBJECT token
>>>>>> >> at [Source:
>>>>>> >> (String)"{"code":"GB0001","resource":"msg.comm.unknown.error","details":"NullPointerException: "}";
>>>>>> >> line: 1, column: 1]
>>>>>> >>
>>>>>> >> 2019-12-05 04:47:59,166 INFO org.apache.flink.runtime.blob.BlobServer -
>>>>>> >> Stopped BLOB server at 0.0.0.0:6124
>>>>>> >
>>>>>> > The second line is obviously from my program and it shouldn't cause
>>>>>> > JobManager to be shut down. Is it intended behavior?
>>>>>> >
>>>>>> > Best,
>>>>>> >
>>>>>> > Dongwon
>>>>>>
>>>>>> --
>>>>>> Sent from:
>>>>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/