Re: [DISCUSS] SEP-3: Heart-beat mechanism between JobCoordinator and all running containers

2017-05-03 Thread Navina Ramesh (Apache)
Abhishek, Thanks for clarifying and updating the SEP. Cheers! Navina On Wed, May 3, 2017 at 8:20 PM, Jagadish Venkatraman wrote: > Navina, >> The ContainerHeartbeatMonitor and the ContainerHeartbeatClient are both internal APIs and have concrete implementations. > More specificall

Re: [DISCUSS] SEP-3: Heart-beat mechanism between JobCoordinator and all running containers

2017-05-03 Thread Navina Ramesh
Abhishek, Thanks for clarifying and updating the SEP. Cheers! Navina On Wed, May 3, 2017 at 8:20 PM, Jagadish Venkatraman wrote: > Navina, >> The ContainerHeartbeatMonitor and the ContainerHeartbeatClient are both internal APIs and have concrete implementations. > More specificall

Re: [DISCUSS] SEP-3: Heart-beat mechanism between JobCoordinator and all running containers

2017-05-03 Thread Jagadish Venkatraman
Navina, >> The ContainerHeartbeatMonitor and the ContainerHeartbeatClient are both internal APIs and have concrete implementations. More specifically, both of these are purely internal implementation classes (and have nothing to do with any pluggable public API that we expose). Best, Jagad

Re: [DISCUSS] SEP-3: Heart-beat mechanism between JobCoordinator and all running containers

2017-05-03 Thread Abhishek Shivanna
Hey Navina, Thank you for reviewing the SEP. > Are you planning on exposing this monitor class as a public API? What is the significance of doing so? Sorry for the confusion caused by having implementation details under "public interfaces". The ContainerHeartbeatMonitor and the ContainerHeartbeatClient

Re: [DISCUSS] SEP-3: Heart-beat mechanism between JobCoordinator and all running containers

2017-05-03 Thread Navina Ramesh (Apache)
Hi Abhishek, I checked your latest proposal in SEP and it looks good to me. QQ: > A new ContainerHeartbeatMonitor class that accepts a ContainerHeartbeatClient (which has the business logic to make heartbeat checks on the JC endpoint) and a callback. Are you planning on exposing this monitor clas
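
The monitor/client/callback arrangement quoted above can be illustrated with a minimal sketch. Only the two class names come from the thread; the method names (requestHeartbeat, checkOnce, start, stop), the use of Runnable as the callback type, and the choice to treat a client exception as a failed heartbeat are all assumptions made for illustration, not the design in the SEP.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical interface: the thread names the class but not its methods.
interface ContainerHeartbeatClient {
  // Returns true if the JobCoordinator endpoint reports this container as valid.
  boolean requestHeartbeat();
}

class ContainerHeartbeatMonitor {
  private final ContainerHeartbeatClient client;
  private final Runnable onExpired; // callback run when a heartbeat check fails
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "heartbeat-monitor");
        t.setDaemon(true); // do not keep the JVM alive just for the monitor
        return t;
      });

  ContainerHeartbeatMonitor(ContainerHeartbeatClient client, Runnable onExpired) {
    this.client = client;
    this.onExpired = onExpired;
  }

  // Performs a single check; runs the callback and returns false on failure.
  // Assumption: an exception from the client counts as a failed heartbeat.
  boolean checkOnce() {
    boolean alive;
    try {
      alive = client.requestHeartbeat();
    } catch (RuntimeException e) {
      alive = false;
    }
    if (!alive) {
      onExpired.run();
    }
    return alive;
  }

  // Checks periodically and stops scheduling after the first failed check.
  void start(long periodMs) {
    scheduler.scheduleAtFixedRate(() -> {
      if (!checkOnce()) {
        scheduler.shutdown();
      }
    }, periodMs, periodMs, TimeUnit.MILLISECONDS);
  }

  void stop() {
    scheduler.shutdownNow();
  }
}
```

In this sketch a container-side runner would construct the monitor with a client pointed at the JC endpoint and a callback that shuts the container down, then call start with a configured interval.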

Re: [DISCUSS] SEP-3: Heart-beat mechanism between JobCoordinator and all running containers

2017-05-03 Thread Abhishek Shivanna
Hey Jagadish, Thank you for taking the time to review the design. I agree with moving the heartbeat into the LocalContainerRunner instead of fitting it into the SamzaContainer. I will update the SEP with the new design changes. I also agree with the changes to the configuration and choosing suit

Re: [DISCUSS] SEP-3: Heart-beat mechanism between JobCoordinator and all running containers

2017-04-26 Thread Jagadish Venkatraman
Hi Abhishek, Heartbeat between the AM and container has been a long-awaited Samza feature. It will go a long way in ensuring our reliability! +1 for this SEP. *High level comments:* Currently, the only use-case for the heartbeat mechanism seems to be when running Samza on YARN. IMHO, it makes se

[DISCUSS] SEP-3: Heart-beat mechanism between JobCoordinator and all running containers

2017-04-24 Thread Abhishek Shivanna
Hi Everyone, In order to fix the issue of orphaned/leaky containers seen when the YARN Node Manager crashes, I have created a SEP discussing the design for implementing a heartbeat between the containers and the job coordinator: https://cwiki.apache.org/confluence/display/SAMZA/SEP-3%3A+Heart-beat