Hi, one of the major selling points of HDFS is (was?) that it is possible to schedule a Hadoop job close to the data it operates on. I am not using HDFS, but I was wondering if/how Mesos supports scheduling a job onto a machine that already has a certain file/dataset locally, as opposed to scheduling it onto a machine that would have to access it over the network or download it to local disk first.
I was wondering if Mesos attributes could be used for this: I could have an attribute `datasets` of type `set`, so that node A carries {dataset1, dataset17, dataset3} and node B carries {dataset17, dataset5}, and during scheduling I could decide based on this attribute where to run a task.

However, I was wondering whether dynamic changes to such attributes are possible. Imagine that node A deletes dataset17 from its local cache and downloads dataset5 instead; I would then like to update the `datasets` attribute dynamically, but without affecting the jobs that are already running on node A. Is such a thing possible? Is there an approach other than attributes for describing the data that resides on a node in order to achieve data locality?

Thanks
Tobias
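To make the scheduling decision I have in mind concrete, here is a minimal sketch of the matching logic on the framework side. The names (`Offer`, `pick_node`) are hypothetical stand-ins, not the Mesos API; a real framework would read the `datasets` attribute out of the protobuf offers it receives instead of these plain objects.

```python
# Hypothetical sketch: prefer offers from nodes that already hold the
# required dataset locally. `Offer` here is a stand-in, not Mesos's Offer.
from dataclasses import dataclass, field

@dataclass
class Offer:
    hostname: str
    # value of the (hypothetical) `datasets` set attribute on the agent
    datasets: set = field(default_factory=set)

def pick_node(offers, required_dataset):
    """Return (offer, is_local): a data-local offer if one exists,
    otherwise fall back to the first offer (remote fetch)."""
    for offer in offers:
        if required_dataset in offer.datasets:
            return offer, True  # data-local placement
    return (offers[0], False) if offers else (None, False)

offers = [
    Offer("nodeA", {"dataset1", "dataset17", "dataset3"}),
    Offer("nodeB", {"dataset17", "dataset5"}),
]

offer, local = pick_node(offers, "dataset5")
print(offer.hostname, local)  # nodeB True
```

The open question above is exactly the part this sketch glosses over: keeping each agent's `datasets` attribute in sync with its local cache as files come and go, without restarting the agent or its running tasks.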