What do you think about reverting this change (FLINK-8696), since the resulting failure is really hard for users to debug? One problem with reverting is that people may by now rely on the second argument being the hostname.
An alternative could be to filter out `cluster` and `local` if they appear as the second argument. This could, however, lead to problems if a user wants to set the hostname to either `local` or `cluster` via jobmanager.sh.

Cheers,
Till

On Wed, Sep 26, 2018 at 11:24 AM Till Rohrmann <trohrm...@apache.org> wrote:

> Yes, that would be a good idea. I think it should go into the release
> notes. Will add it.
>
> On Wed, Sep 26, 2018 at 10:24 AM Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Should we add a warning to the release announcements?
>>
>> Fabian
>>
>> On Wed, Sep 26, 2018 at 10:22 AM Robert Metzger <rmetz...@apache.org> wrote:
>>
>>> Hey Jamie,
>>>
>>> we've been facing the same issue with dA Platform when running Flink
>>> 1.6.1.
>>> I assume a lot of people will be affected by this.
>>>
>>> On Tue, Sep 25, 2018 at 11:18 PM Till Rohrmann <trohrm...@apache.org>
>>> wrote:
>>>
>>>> Hi Jamie,
>>>>
>>>> thanks for the update on how to fix the problem. This is very helpful
>>>> for the rest of the community.
>>>>
>>>> The change that removed the execution mode parameter (FLINK-8696) from
>>>> the start-up scripts was actually released with Flink 1.5.0. With that
>>>> change, the host name became the 2nd parameter. When the start-up
>>>> scripts are called with the old syntax, the execution mode parameter is
>>>> interpreted as the hostname. This host name option was, however, not
>>>> properly evaluated until we fixed it with Flink 1.5.4. Therefore, the
>>>> problem only surfaced now.
>>>>
>>>> We definitely need to treat the start-up scripts as a stable API as
>>>> well. So far, we don't have good tooling which ensures that we don't
>>>> introduce breaking changes. In the future we need to be more careful!
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Tue, Sep 25, 2018 at 8:54 PM Jamie Grier <jgr...@lyft.com> wrote:
>>>>
>>>>> Update on this:
>>>>>
>>>>> The issue was the command being used to start the jobmanager:
>>>>> `jobmanager.sh start-foreground cluster`.
>>>>> This was a command left over in our automation that used to be the
>>>>> correct way to start the JM. However, in Flink 1.5.4 that second
>>>>> parameter, `cluster`, is now interpreted as the hostname for the
>>>>> jobmanager to bind to.
>>>>>
>>>>> The solution was just to remove `cluster` from that command.
>>>>>
>>>>> On Tue, Sep 25, 2018 at 10:15 AM Jamie Grier <jgr...@lyft.com> wrote:
>>>>>
>>>>>> Anybody else seen this and know the solution? We're dead in the
>>>>>> water with Flink 1.5.4.
>>>>>>
>>>>>> On Sun, Sep 23, 2018 at 11:46 PM alex <ek.rei...@gmail.com> wrote:
>>>>>>
>>>>>>> We started to see the same errors after upgrading to Flink 1.6.0
>>>>>>> from 1.4.2. We have one JM and 5 TMs on Kubernetes. The JM is
>>>>>>> running in HA mode. The TaskManagers sometimes lose their connection
>>>>>>> to the JM and log the following error, like the one you have:
>>>>>>>
>>>>>>> *2018-09-19 12:36:40,687 INFO
>>>>>>> org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not
>>>>>>> resolve ResourceManager address
>>>>>>> akka.tcp://flink@flink-jobmanager:50002/user/resourcemanager,
>>>>>>> retrying in 10000 ms: Ask timed out on
>>>>>>> [ActorSelection[Anchor(akka.tcp://flink@flink-jobmanager:50002/),
>>>>>>> Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent
>>>>>>> message of type "akka.actor.Identify"..*
>>>>>>>
>>>>>>> Once a TM starts to report "Could not resolve ResourceManager", the
>>>>>>> problem does not resolve itself until I restart the TM pod.
>>>>>>>
>>>>>>> *Here is the content of our flink-conf.yaml:*
>>>>>>> blob.server.port: 6124
>>>>>>> jobmanager.rpc.address: flink-jobmanager
>>>>>>> jobmanager.rpc.port: 6123
>>>>>>> jobmanager.heap.mb: 4096
>>>>>>> jobmanager.web.history: 20
>>>>>>> jobmanager.archive.fs.dir: s3://our_path
>>>>>>> taskmanager.rpc.port: 6121
>>>>>>> taskmanager.heap.mb: 16384
>>>>>>> taskmanager.numberOfTaskSlots: 10
>>>>>>> taskmanager.log.path: /opt/flink/log/output.log
>>>>>>> web.log.path: /opt/flink/log/output.log
>>>>>>> state.checkpoints.num-retained: 3
>>>>>>> metrics.reporters: prom
>>>>>>> metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
>>>>>>>
>>>>>>> high-availability: zookeeper
>>>>>>> high-availability.jobmanager.port: 50002
>>>>>>> high-availability.zookeeper.quorum: zookeeper_instance_list
>>>>>>> high-availability.zookeeper.path.root: /flink
>>>>>>> high-availability.cluster-id: profileservice
>>>>>>> high-availability.storageDir: s3://our_path
>>>>>>>
>>>>>>> Any help will be greatly appreciated!
>>>>>>>
>>>>>>> --
>>>>>>> Sent from:
>>>>>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
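
For illustration, the filtering alternative raised at the top of this thread could be modeled roughly as follows. This is only a sketch in Python, not the actual jobmanager.sh logic: the function name `parse_start_args`, the `StartArgs` type, and the `filter_legacy_modes` flag are all hypothetical, and the only behavior taken from the thread is that the second argument is now read as the hostname and that `cluster` and `local` are the legacy execution modes.

```python
from typing import List, NamedTuple, Optional

# Hypothetical model of the positional arguments discussed in this thread.
# It does NOT reproduce the real jobmanager.sh; names are illustrative only.
LEGACY_EXECUTION_MODES = {"cluster", "local"}


class StartArgs(NamedTuple):
    action: str
    host: Optional[str]
    dropped_legacy_mode: Optional[str]


def parse_start_args(argv: List[str], filter_legacy_modes: bool = True) -> StartArgs:
    """Read [action, host?]; optionally drop a legacy mode found in 2nd position."""
    if not argv:
        raise ValueError("expected at least an action such as 'start-foreground'")
    action = argv[0]
    rest = list(argv[1:])
    dropped = None
    # The proposed filter: treat `cluster`/`local` in second position as a
    # leftover execution mode instead of a hostname.
    if filter_legacy_modes and rest and rest[0] in LEGACY_EXECUTION_MODES:
        dropped = rest.pop(0)
    host = rest[0] if rest else None
    return StartArgs(action, host, dropped)


# Without the filter, the old syntax turns `cluster` into the hostname.
unfiltered = parse_start_args(["start-foreground", "cluster"], filter_legacy_modes=False)

# With the filter, the leftover mode is dropped and no hostname is set.
filtered = parse_start_args(["start-foreground", "cluster"])

# The drawback mentioned above: a host that really is named `local` is lost.
lost_host = parse_start_args(["start-foreground", "local"])
```

In this model `unfiltered.host` is `"cluster"`, which is the failure Jamie hit, while `filtered.host` is `None`. The last call shows the cost of the filter: a user who genuinely wants the hostname `local` can no longer pass it positionally, so a real implementation would probably want to log a warning whenever an argument is dropped.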