Whitelist seems to be the best option right now. I will try that.

 

Thanks

 

From: Jeremy Olexa [mailto:jol...@spscommerce.com]
Sent: Wednesday, December 30, 2015 17:22
To: user@mesos.apache.org
Subject: Re: make slaves not getting tasks anymore

 

Hi Mike,

 

Yes, there is another way besides the maintenance primitives, which aren't fully 
complete yet (IMO). If you don't want any more jobs scheduled on a host, you can 
remove that host from the whitelist on the masters. You might have to engineer 
this a bit for your setup, but this is what we do:

 

1) All slaves are discovered and explicitly added to the whitelist

2) On demand (by the operator), a node is REMOVED from the whitelist for some 
time; currently we add the node back after a timeout of 1 hour

3) Wait for jobs to finish on that node, or send SIGUSR1 to the mesos-slave 
process to force job termination
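The drain/undrain steps above could be sketched roughly like this (a minimal sketch; the whitelist path and hostnames are hypothetical, and it assumes the masters run with `--whitelist=file:///etc/mesos/whitelist`, a newline-separated list of agent hostnames that the master re-reads periodically):

```shell
#!/bin/sh
# Hypothetical helpers for draining a node via the master whitelist.

# Remove a node from the whitelist so it stops receiving offers.
drain() {
  node="$1"; whitelist="$2"
  sed -i "/^${node}\$/d" "$whitelist"
}

# Add the node back once maintenance is done (step 2 above re-adds it
# after a 1-hour timeout).
undrain() {
  node="$1"; whitelist="$2"
  grep -qx "$node" "$whitelist" || echo "$node" >> "$whitelist"
}
```

After draining, you would still wait for running tasks to finish (or send SIGUSR1 to the mesos-slave process, per step 3) before doing maintenance.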

 

Of course, there is also satellite, which does all this for you :) 
https://github.com/twosigma/satellite/

 

Hope that helps,
-Jeremy




From: Mike Michel <mike.mic...@mmbash.de>
Sent: Wednesday, December 30, 2015 5:43 AM
To: user@mesos.apache.org
Subject: make slaves not getting tasks anymore 

 

Hi,

 

I need to update slaves from time to time and am looking for a way to take them 
out of the cluster without killing the running tasks. I need to wait until all 
tasks are done, and during this time no new tasks should be started on that 
slave. My first idea was to set a constraint "status:online" for every task I 
start, then change the slave's attribute to "offline" and restart the slave 
process while the executor keeps running the tasks. But it seems that if you 
change the attributes of a slave, it cannot reconnect to the cluster without an 
rm -rf /tmp first, which kills all tasks.
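For reference, the attribute approach described above would look roughly like this (a sketch with hypothetical master address and work dir; note the caveat that changing --attributes makes the agent's checkpointed state mismatch, so it refuses to re-register until the metadata is wiped, which discards running tasks):

```shell
# Start the agent with an attribute that frameworks can match in
# placement constraints (hypothetical values).
mesos-slave --master=zk://zk1:2181/mesos \
            --work_dir=/var/lib/mesos \
            --attributes="status:online"

# After changing --attributes, the agent's checkpointed info no longer
# matches and it will not re-register until the metadata is removed --
# which is exactly what kills the running tasks:
rm -rf /var/lib/mesos/meta/slaves/latest
```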

 

The maintenance mode also doesn't seem to be an option:

 

"When maintenance is triggered by the operator, all agents on the machine are 
told to shutdown. These agents are subsequently removed from the master which 
causes tasks to be updated as TASK_LOST. Any agents from machines in 
maintenance are also prevented from registering with the master."

 

Is there another way?

 

 

Cheers

 

Mike
