Hi,

One of the major selling points of HDFS is (was?) that it is possible to
schedule a Hadoop job close to where the data it operates on resides.  I am
not using HDFS, but I was wondering if/how Mesos supports scheduling a job
onto a machine that already has a certain file/dataset locally, as opposed
to a machine that would have to access it over the network or download it
to local disk first.

I was wondering if Mesos attributes could be used for this: I could have an
attribute `datasets` of type `set`, so that node A has {dataset1, dataset17,
dataset3} and node B has {dataset17, dataset5}, and during scheduling I
could decide based on this attribute where to run a task.  However, I was
wondering whether such attributes can be changed dynamically.  Imagine that
node A deletes dataset17 from its local cache and downloads dataset5
instead; I would then like to update the `datasets` attribute dynamically,
but without affecting the jobs that are already running on node A.  Is such
a thing possible?
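
To make the idea concrete, here is a rough sketch of what I have in mind on
the framework side, using the old Python bindings (mesos.interface).  It
assumes the node really does expose a set-valued `datasets` attribute in its
offers; the scheduler class, the dataset name, and the build_task helper are
made-up placeholders, not anything from the Mesos examples:

    from mesos.interface import Scheduler, mesos_pb2

    WANTED_DATASET = "dataset17"  # hypothetical dataset a pending task needs

    class LocalityScheduler(Scheduler):
        def resourceOffers(self, driver, offers):
            for offer in offers:
                # Collect the dataset names this node advertises via its
                # set-valued `datasets` attribute.
                local_datasets = set()
                for attr in offer.attributes:
                    if attr.name == "datasets" and attr.type == mesos_pb2.Value.SET:
                        local_datasets.update(attr.set.item)

                if WANTED_DATASET in local_datasets:
                    # The data is already local: launch the task here.
                    # (build_task stands in for the usual TaskInfo setup.)
                    driver.launchTasks(offer.id, [self.build_task(offer)])
                else:
                    # No local copy of the dataset: decline and wait for an
                    # offer from a better-placed node.
                    driver.declineOffer(offer.id)

The same check could of course live in any existing framework's offer
handling; the point is just that the placement decision would be driven by
whatever the `datasets` attribute says at offer time.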

Is there an approach other than attributes to describe the data that
resides on a node in order to achieve data locality?

Thanks
Tobias
