No problem. Glad you figured it out.

@vinodkone

> On Jan 23, 2017, at 8:38 AM, Vova Shelgunov <vvs...@gmail.com> wrote:
> 
> Yes, it works. Sorry for the trouble; the first time I looked at the logs,
> I did not notice that failover_timeout was zero.
> 
> 2017-01-23 19:27 GMT+03:00 Vova Shelgunov <vvs...@gmail.com>:
>> Logs from mesos master:
>> 
>> I0123 15:53:44.523613     7 http.cpp:391] HTTP POST for 
>> /master/api/v1/scheduler from 172.18.0.1:58864 with User-Agent='AHC/2.0'
>> I0123 15:53:44.524159     7 master.cpp:4827] Processing ACKNOWLEDGE call 
>> ac9a6e5e-67b3-490a-930f-0024eab734b4 for task 10336 of framework 
>> 3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005 (Test HTTP Framework) on agent 
>> 16c100c1-13fe-47b8-a2a0-aed9bafbbf8c-S0
>> I0123 15:53:44.524849     7 master.cpp:7744] Removing task 10336 with 
>> resources cpus(*):0.1; mem(*):32 of framework 
>> 3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005 on agent 
>> 16c100c1-13fe-47b8-a2a0-aed9bafbbf8c-S0 at slave(1)@172.18.0.3:5051 
>> (mesos-slave)
>> I0123 15:53:44.529033     7 master.cpp:1297] Framework 
>> 3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005 (Test HTTP Framework) disconnected
>> I0123 15:53:44.529636     7 master.cpp:2902] Disconnecting framework 
>> 3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005 (Test HTTP Framework)
>> I0123 15:53:44.529974     7 master.cpp:2926] Deactivating framework 
>> 3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005 (Test HTTP Framework)
>> I0123 15:53:44.530299     7 master.cpp:1310] Giving framework 
>> 3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005 (Test HTTP Framework) 0ns to 
>> failover
>> I0123 15:53:44.530594     7 hierarchical.cpp:386] Deactivated framework 
>> 3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005
>> I0123 15:53:44.531962     7 master.cpp:6369] Framework failover timeout, 
>> removing framework 3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005 (Test HTTP 
>> Framework)
>> I0123 15:53:44.534992     7 master.cpp:7103] Removing framework 
>> 3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005 (Test HTTP Framework)
>> 
>> It seems the failover timeout is set to zero for the framework.
>> 
>> It may be a bug in my code that shows up when the framework loses its
>> connection to the master multiple times (I see that I do not pass the
>> failover_timeout value when reconnecting).
>> I will check whether fixing that solves my issue.
>> 
>> Thanks
>> 
>> 2017-01-23 19:05 GMT+03:00 Vova Shelgunov <vvs...@gmail.com>:
>>> Hi,
>>> 
>>> I have run into a very strange situation with my framework, which talks to
>>> the Mesos master via the Scheduler HTTP API:
>>> 
>>> Sometimes my framework stops receiving heartbeats and task updates
>>> from the master.
>>> I read the Network partitions section of the Mesos documentation
>>> (http://mesos.apache.org/documentation/latest/scheduler-http-api/), and I
>>> see that if a framework does not receive heartbeats within some time, it
>>> should reconnect to the master.
>>> 
>>> I have written a heartbeat monitor that reconnects if no heartbeats have
>>> arrived in the last n seconds, but after reconnecting I always receive an
>>> ERROR from the Mesos master saying that my framework has been removed.
>>> 
>>> Why is this happening?
>>> 
>>> Regards,
>>> Uladzimir
>> 
> 
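
For readers who land on this thread later, here is a minimal Python sketch of the fix discussed: a heartbeat monitor that decides when to reconnect, plus a helper that builds the SUBSCRIBE body with failover_timeout set on every subscription, including re-subscriptions. It assumes the JSON encoding of the v1 scheduler API; the field names (type, subscribe, framework_info, failover_timeout, id, framework_id) follow the Mesos Call and FrameworkInfo messages as I understand them, so verify them against scheduler.proto and mesos.proto for your version. The names build_subscribe_call, HeartbeatMonitor, FAILOVER_TIMEOUT_SECONDS and missed_allowed, the one-week timeout, and the "two missed heartbeats" rule are hypothetical choices for illustration. No HTTP is performed here.

```python
import json
import time
from typing import Callable, Optional

# Hypothetical value: how long the master should keep the framework
# (and its tasks) after it disconnects. FrameworkInfo.failover_timeout
# defaults to 0 seconds when omitted.
FAILOVER_TIMEOUT_SECONDS = 7 * 24 * 3600.0  # one week


def build_subscribe_call(user: str, name: str,
                         framework_id: Optional[str],
                         failover_timeout: float = FAILOVER_TIMEOUT_SECONDS) -> str:
    """Build the JSON body of a v1 SUBSCRIBE call (hypothetical helper).

    failover_timeout is included on every call, not only the first one.
    """
    framework_info = {
        "user": user,
        "name": name,
        "failover_timeout": failover_timeout,
    }
    call = {"type": "SUBSCRIBE",
            "subscribe": {"framework_info": framework_info}}
    if framework_id is not None:
        # Re-subscribing: reuse the ID received in the SUBSCRIBED event.
        framework_info["id"] = {"value": framework_id}
        call["framework_id"] = {"value": framework_id}
    return json.dumps(call)


class HeartbeatMonitor:
    """Signals a reconnect when the master has been silent for too long."""

    def __init__(self, interval_seconds: float, missed_allowed: int = 2,
                 clock: Callable[[], float] = time.monotonic):
        self.interval = interval_seconds      # heartbeat interval from SUBSCRIBED
        self.missed_allowed = missed_allowed  # heartbeats we tolerate missing
        self.clock = clock                    # injectable for testing
        self.last_seen = clock()

    def on_event(self) -> None:
        # Any event from the master (HEARTBEAT, UPDATE, ...) proves liveness.
        self.last_seen = self.clock()

    def should_reconnect(self) -> bool:
        elapsed = self.clock() - self.last_seen
        return elapsed > self.interval * (self.missed_allowed + 1)


if __name__ == "__main__":
    now = [0.0]
    monitor = HeartbeatMonitor(15.0, clock=lambda: now[0])
    now[0] = 30.0
    print(monitor.should_reconnect())  # False: still within 3 intervals
    now[0] = 46.0
    print(monitor.should_reconnect())  # True: more than 45 s of silence
    print(build_subscribe_call("foo", "Test HTTP Framework",
                               "3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005"))
```

The key point is that the reconnect path calls the same builder as the initial subscription, so the timeout cannot be dropped on the second or later SUBSCRIBE.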
