Hi Nick,

It works when I set --host explicitly. I was misled by the word “only” in 
the docs and did not try it. Thanks for the help.

Thanks,
Shakir
From: "Martin, Nick" <nick.mar...@ngc.com>
Date: Tuesday, June 18, 2019 at 6:31 PM
To: "PoolakkalMukkath, Shakir" <shakir_poolakkalmukk...@comcast.com>, Till 
Rohrmann <trohrm...@apache.org>, John Smith <java.dev....@gmail.com>
Cc: user <user@flink.apache.org>
Subject: RE: [EXTERNAL] Re: How to restart/recover on reboot?

jobmanager.sh takes an optional argument for the hostname to bind to, and 
start-cluster.sh uses it. If you leave it blank, the script will use whatever 
is in flink-conf.yaml (localhost is the default value that ships with Flink).
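
As a rough illustration of the point above, here is a minimal sketch that 
builds the start command with an explicit host. It assumes the hostname is the 
positional argument right after `start`, which is my reading of the script's 
usage text in Flink releases from around this time; check your own copy of 
jobmanager.sh. The helper name `jobmanager_start_cmd`, the `/opt/flink` default 
and the use of `hostname -f` are illustrative assumptions, not part of Flink.

```shell
#!/usr/bin/env bash
# Sketch only: FLINK_HOME default and the `hostname -f` fallback are assumptions.
FLINK_HOME="${FLINK_HOME:-/opt/flink}"

# Print the command that starts a JobManager bound to the given host.
# Falls back to this machine's fully qualified name when no host is passed.
jobmanager_start_cmd() {
    local host="${1:-$(hostname -f)}"
    printf '%s/bin/jobmanager.sh start %s\n' "$FLINK_HOME" "$host"
}

# Example: print the command for review instead of running it.
jobmanager_start_cmd "jm1.example.com"
```

Printing the command first makes it easy to confirm which address each node 
would bind to before anything is started.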

The dockerized version of Flink runs pretty much the way you’re trying to 
operate (i.e. each node starts itself), so its entrypoint script is probably a 
good source of information about how to set it up.

From: PoolakkalMukkath, Shakir [mailto:shakir_poolakkalmukk...@comcast.com]
Sent: Tuesday, June 18, 2019 2:15 PM
To: Till Rohrmann <trohrm...@apache.org>; John Smith <java.dev....@gmail.com>
Cc: user <user@flink.apache.org>
Subject: EXT :Re: [EXTERNAL] Re: How to restart/recover on reboot?

Hi Till, John,

I do agree with the issue John mentioned and have the same problem.

We can only start a standalone HA cluster with the ./start-cluster.sh script. 
Then, when there are failures, we can restart those components individually by 
calling jobmanager.sh/taskmanager.sh. This works great.

But, as John mentioned, if we want to bring the cluster up initially by 
running jobmanager.sh on each JobManager node, it does not work. It binds to 
localhost and does not form the HA cluster.

Thanks,
Shakir

From: Till Rohrmann <trohrm...@apache.org>
Date: Tuesday, June 18, 2019 at 4:23 PM
To: John Smith <java.dev....@gmail.com>
Cc: user <user@flink.apache.org>
Subject: [EXTERNAL] Re: How to restart/recover on reboot?

I guess it should work if you installed a systemd service which simply calls 
`jobmanager.sh start` or `taskmanager.sh start`.
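
A minimal sketch of such a unit, generated from a small shell helper, might 
look like the following. Everything specific here is an assumption for 
illustration: the helper name `flink_jobmanager_unit`, the `/opt/flink` install 
path, the `flink` user and the restart settings. It uses `start-foreground` 
rather than `start` so that systemd can supervise the process directly with 
`Type=simple`, and it passes the host explicitly because, as discussed 
elsewhere in this thread, the script otherwise falls back to the value in 
flink-conf.yaml.

```shell
#!/usr/bin/env bash
# Sketch only: the install path, user name and unit layout are assumptions.

# Print a systemd unit for a Flink JobManager bound to the given host.
flink_jobmanager_unit() {
    local flink_home="$1" host="$2"
    cat <<EOF
[Unit]
Description=Apache Flink JobManager
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=flink
ExecStart=${flink_home}/bin/jobmanager.sh start-foreground ${host}
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
}

# Example: print the unit for review; once it looks right, redirect it to
# /etc/systemd/system/flink-jobmanager.service.
flink_jobmanager_unit /opt/flink "$(hostname -f)"
```

After installing the file, `systemctl daemon-reload` followed by 
`systemctl enable --now flink-jobmanager` would start it and bring it back on 
reboot. A TaskManager unit could follow the same pattern with 
`taskmanager.sh start-foreground`.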

Cheers,
Till

On Tue, Jun 18, 2019 at 4:29 PM John Smith <java.dev....@gmail.com> wrote:
Yes, that is understood. But I don't see why we cannot call jobmanager.sh and 
taskmanager.sh to build the cluster and have them run as systemd units.

I looked at start-cluster.sh and all it does is SSH and call jobmanager.sh, 
which then cascades to taskmanager.sh. I just have to pinpoint what's missing 
to get the systemd service working. In fact, calling jobmanager.sh as a systemd 
service does see the shared masters, slaves and flink-conf.yaml, but it binds 
to localhost.

Maybe one way to do it would be to bootstrap the cluster with 
./start-cluster.sh and then install systemd services for jobmanager.sh and 
taskmanager.sh.

Like I said, I don't want to have some process in place to remind admins they 
need to manually start a node every time they patch or a host goes down for 
whatever reason.

On Tue, 18 Jun 2019 at 04:31, Till Rohrmann <trohrm...@apache.org> wrote:
When a single machine fails you should rather call `taskmanager.sh 
start`/`jobmanager.sh start` to start a single process. `start-cluster.sh` will 
start multiple processes on different machines.

Cheers,
Till

On Mon, Jun 17, 2019 at 4:30 PM John Smith <java.dev....@gmail.com> wrote:
Well some reasons, machine reboots/maintenance etc... Host/VM crashes and 
restarts. And same goes for the job manager. I don't want/need to have to 
document/remember some start process for sys admins/devops.

So far I have looked at ./start-cluster.sh and all it seems to do is SSH into 
all the specified nodes and start the processes using the jobmanager and 
taskmanager scripts. I don't see anything special in any of the sh scripts.
I configured passwordless SSH through Terraform and all of that works great. 
It is only when I try the manual start through systemd that I run into 
trouble. I may be missing something...

On Mon, 17 Jun 2019 at 09:41, Till Rohrmann <trohrm...@apache.org> wrote:
Hi John,

I don't have much experience with setting Flink up via systemd services. Why 
do you want to do it like that?

1. In standalone mode, Flink won't automatically restart TaskManagers. This 
only works on Yarn and Mesos atm.
2. In case of a lost TaskManager, you should run `taskmanager.sh start`. This 
script simply starts a new TaskManager process.
3. I guess you could use systemd to bring up a Flink TaskManager process on 
start up.

Cheers,
Till

On Fri, Jun 14, 2019 at 5:56 PM John Smith <java.dev....@gmail.com> wrote:
I looked into start-cluster.sh and I don't see anything special. So 
technically it should be as easy as installing systemd services to run 
jobmanager.sh and taskmanager.sh respectively?

On Wed, 12 Jun 2019 at 13:02, John Smith <java.dev....@gmail.com> wrote:
The installation instructions do not indicate how to create systemd services.

1- When task nodes fail, will the job leader detect this and ssh and restart 
the task node? From my testing it doesn't seem like it.
2- How do we recover a lost node? Do we simply go back to the master node and 
run start-cluster.sh and the script is smart enough to figure out what is 
missing?
3- Or do we need to create systemd services, and if so, which command should 
the service run?
