Hi all,

I am working on an integration between Apache Flink 
(http://flink.apache.org) and Mesos, similar to the way the current 
hadoop-mesos integration works, using the Java Mesos client.
My current idea is that the scheduler will also run a JobManager process 
(similar to the JobTracker), which will start a number of TaskManager 
tasks (similar to the TaskTracker) using a custom executor.

I would like some feedback on the following questions:

0. How do I handle HA at the scheduler level?
    I was thinking of using ZooKeeper-based leader election by directly 
maintaining a ZooKeeper connection myself. Is there a better way to do this 
(something that does not require me to manage my own ZooKeeper 
connection)?

1. How do I deal with restarts and reconciling tasks?
    If the scheduler restarts (it currently maintains an in-memory map of 
running tasks), how do I rediscover tasks and reconcile state?
    I was thinking of using DiscoveryInfo, but I can't find any reference 
on how to "query" Mesos for tasks matching the service discovery 
information. Any suggestions on how to do this?

2. How does one go about testing frameworks? Any suggestions or pointers?

My work in progress version is at 
https://github.com/ankurcha/flink/tree/flink-mesos/flink-mesos

Any help would be much appreciated.


Thanks!
Ankur
