Hi Hayden,

I tried to reproduce the problem you described by following the HA setup instructions in the documentation [1]. The instructions worked for me: start-cluster.sh started two JobManagers on my local machine (my conf/masters file contained two localhost entries).

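For reference, a conf/masters file with two localhost entries would look roughly like the sketch below. The web UI ports 8081 and 8082 are an assumption taken from the example in the HA documentation, not necessarily the exact values from my test:

```
localhost:8081
localhost:8082
```

Each line has the form host:webui-port, and start-cluster.sh starts one JobManager per line.
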
The bash scripts tend to be a bit fragile, especially when it comes to handling spaces in variables and quotes. What kind of environment are you running on (I'm on macOS), and are you trying to start the JMs on localhost or on remote machines?

Best, Fabian

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html#configuration

2017-10-16 11:53 GMT+02:00 Marchant, Hayden <hayden.march...@citi.com>:

> I am attempting to run Flink 1.3.2 in HA mode with zookeeper.
>
> When I run the start-cluster.sh, the job manager is not started, even
> though the task manager is started. When I delved into this, I saw that
> the command:
>
> ssh -n $FLINK_SSH_OPTS $master -- "nohup /bin/bash -l \"${FLINK_BIN_DIR}/jobmanager.sh\" start cluster ${master} ${webuiport} &"
>
> is not actually running anything on the host. i.e. I do not see "Starting
> jobmanager daemon on host ....."
>
> Only when I remove ALL quotes, do I see it working. i.e. if I run:
>
> ssh -n $FLINK_SSH_OPTS $master -- nohup /bin/bash -l ${FLINK_BIN_DIR}/jobmanager.sh start cluster ${master} ${webuiport} &
>
> I see that it manages to run the job manager - I see " Starting jobmanager
> daemon on host.....".
>
> Did anyone else experience a similar problem? Any elegant workarounds
> without having to change source code?
>
> Thanks,
> Hayden Marchant