Thanks Robert. But no, rest.bind-port is not set to 35485 in the configuration. Other jobs use different ports, so the port is being chosen dynamically.
#==============================================================================
# Rest & web frontend
#==============================================================================

# The port to which the REST client connects to. If rest.bind-port has
# not been specified, then the server will bind to this port as well.
#
#rest.port: 8081

# The address to which the REST client will connect to
#
#rest.address: 0.0.0.0

# Port range for the REST and web server to bind to.
#
#rest.bind-port: 8080-8090

# The address that the REST & web server binds to
#
#rest.bind-address: 0.0.0.0

# Flag to specify whether job submission is enabled from the web-based
# runtime monitor. Uncomment to disable.

#web.submit.enable: false

On Wed, Sep 22, 2021 at 11:46 AM Curt Buechter <tricksho...@gmail.com> wrote:

> Hi,
> I'm getting an error that happens randomly when starting a flink
> application.
>
> For context, this is running in YARN on AWS. This application is one that
> converts from the Table API to the Stream API, so two flink
> applications/jobmanagers are trying to start up. I think what happens is
> that the rest api port is chosen, and is the same for both of the flink
> apps. If YARN chooses two different instances for the two task managers,
> they each work fine and start their rest api on the same port on their own
> respective machine. But, if YARN chooses the same instance for both job
> managers, they both try to start up the rest api on the same port on the
> same machine, and I get the error.
>
> Here is the error:
>
> 2021-09-22 15:47:27,724 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Could not start cluster entrypoint YarnJobClusterEntrypoint.
> org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint YarnJobClusterEntrypoint.
>     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:212) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:600) [flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:99) [flink-dist_2.12-1.13.2.jar:1.13.2]
> Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
>     at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:275) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:250) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:189) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_282]
>     at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_282]
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) ~[hadoop-common-3.2.1-amzn-3.jar:?]
>     at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:186) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     ... 2 more
> Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 35485
>     at org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:234) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:172) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:250) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:189) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_282]
>     at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_282]
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) ~[hadoop-common-3.2.1-amzn-3.jar:?]
>     at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:186) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>     ... 2 more
>
> And, here is part of the log from the other job manager, which successfully
> started its rest api on the same port, just a few seconds earlier:
>
> 2021-09-22 15:47:20,690 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Rest endpoint listening at ip-10-1-2-137.ec2.internal:35485
> 2021-09-22 15:47:20,691 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - http://ip-10-1-2-137.ec2.internal:35485 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
> 2021-09-22 15:47:20,692 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Web frontend listening at http://ip-10-1-2-137.ec2.internal:35485.
>
> Do you know of any configuration that would assist with this? I thought about
> rest.bind-port, but the rest port already seems to be chosen dynamically. My
> config file has that setting commented out.
>
> Thanks
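
For illustration only, this is roughly what uncommenting the rest.bind-port setting mentioned above could look like. The range 50100-50200 is an arbitrary assumption, not a value taken from this thread. The Flink documentation describes rest.bind-port as accepting a single port, a list of ports, or a range, and suggests a range when several REST servers may run on the same machine. Whether this avoids the collision in this particular setup is untested.

```yaml
# Hypothetical flink-conf.yaml fragment; the range below is an assumed example.
# With a range, a job manager that finds the first port taken can try the next one.
rest.bind-port: 50100-50200
```

The range would need at least as many ports as the number of job managers that YARN might place on one instance.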