Hi all,
I am working on an integration between Apache Flink (http://flink.apache.org) and Mesos, similar to the way the current hadoop-mesos integration works, using the Java Mesos client. My current idea is that the scheduler will also run a JobManager process (similar to the JobTracker), which will start a number of TaskManager tasks (similar to the TaskTracker) using a custom executor.

I would like some feedback on the following questions:

0. How do I handle HA at the scheduler level? I was thinking of using ZooKeeper-based leader election, maintaining a ZooKeeper connection directly myself. Is there a better way to do this (something that does not require a self-managed ZooKeeper connection)?

1. How do I deal with restarts and task reconciliation? If the scheduler restarts (it currently keeps an in-memory map of running tasks), how do I rediscover tasks and reconcile state? I was thinking of using DiscoveryInfo, but I can't find any reference on how to "query" Mesos for tasks matching the service discovery information. Any suggestions on how to do this?

3. How does one go about testing frameworks? Any suggestions or pointers?

My work-in-progress version is at https://github.com/ankurcha/flink/tree/flink-mesos/flink-mesos

Any help would be much appreciated.

Thanks!
Ankur
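
To make question 1 concrete, here is a minimal sketch of the restart bookkeeping it describes. It assumes the scheduler has persisted its task IDs somewhere durable before restarting and then receives one status update per task. The class TaskReconciler and its methods are hypothetical and are not part of the Mesos API. In a real scheduler the updates would presumably arrive through the statusUpdate callback after a call to SchedulerDriver.reconcileTasks, but that wiring is left out here so the sketch runs with the JDK alone.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper (not a Mesos class): tracks which persisted tasks still await a status. */
final class TaskReconciler {
    // Terminal task state names as used by Mesos; a task in one of these is no longer running.
    private static final Set<String> TERMINAL = new HashSet<>(Arrays.asList(
            "TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST", "TASK_ERROR"));

    private final Map<String, String> known = new HashMap<>(); // taskId -> last reported state
    private final Set<String> pending = new TreeSet<>();       // taskIds with no status yet

    /** Seed the tracker with the task IDs recovered from durable storage after a restart. */
    TaskReconciler(Collection<String> persistedTaskIds) {
        for (String id : persistedTaskIds) {
            known.put(id, "UNKNOWN");
            pending.add(id);
        }
    }

    /** Record a status update; terminal tasks are dropped, all others are kept as live. */
    void onStatusUpdate(String taskId, String state) {
        pending.remove(taskId);
        if (TERMINAL.contains(state)) {
            known.remove(taskId);
        } else {
            known.put(taskId, state); // also picks up tasks that were missing from storage
        }
    }

    /** Task IDs that have not reported since the restart. */
    Set<String> pending() {
        return Collections.unmodifiableSet(pending);
    }

    /** True once every persisted task has reported at least one status. */
    boolean isReconciled() {
        return pending.isEmpty();
    }

    /** Sorted snapshot of the task IDs currently believed to be alive. */
    Set<String> liveTasks() {
        return new TreeSet<>(known.keySet());
    }

    public static void main(String[] args) {
        List<String> persisted = Arrays.asList("tm-1", "tm-2", "tm-3"); // hypothetical task IDs
        TaskReconciler reconciler = new TaskReconciler(persisted);
        reconciler.onStatusUpdate("tm-1", "TASK_RUNNING");
        reconciler.onStatusUpdate("tm-2", "TASK_LOST");
        System.out.println("pending=" + reconciler.pending() + " live=" + reconciler.liveTasks());
    }
}
```

Until isReconciled() returns true, the scheduler would hold off on launching replacement TaskManagers, so that tasks which survived the restart are not started twice.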