Hi @tobias, I think a lot of people encounter such problems. I saw in the CSI design document <https://docs.google.com/document/d/125YWqg_5BB5OY9a6M7LZcby5RSqBwo2PZzpVLuxYXh4/edit> (from @jieyu) that Mesos is adding a new component, the resource provider, which may help resolve the data locality problem.
For dynamic attributes, I think this is also doable; we could expose them via HTTP APIs, just like dynamic reservations.

On Wed, Jun 28, 2017 at 8:22 AM, Tobias Pfeiffer <t...@preferred.jp> wrote:
> Hi,
>
> one of the major selling points of HDFS is (was?) that it is possible to
> schedule a Hadoop job close to where the data that it operates on is. I am
> not using HDFS, but I was wondering if/how Mesos supports an approach to
> schedule a job to a machine that has a certain file/dataset already locally
> as opposed to scheduling it to a machine that would have to access it via
> the network or download it to the local disk first.
>
> I was wondering if Mesos attributes could be used: I could have an
> attribute `datasets` of type `set`, and then node A could have {dataset1,
> dataset17, dataset3} and node B could have {dataset17, dataset5}, and during
> scheduling I could decide based on this attribute where to run a task.
> However, I was wondering if dynamic changes of such attributes are
> possible. Imagine that node A deletes dataset17 from the local cache and
> downloads dataset5 instead; then I would like to update the `datasets`
> attribute dynamically, but without affecting the jobs that are running on
> node A. Is such a thing possible?
>
> Is there an approach other than attributes to describe the data that
> resides on a node in order to achieve data locality?
>
> Thanks
> Tobias

--
Best Regards,
Haosdent Huang
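As an illustration of the attribute-based scheduling idea from the quoted mail, here is a minimal sketch of how a framework scheduler might prefer offers from agents that already hold a dataset. It assumes a simplified dict representation of offers; real Mesos offers carry attributes as protobuf messages, and the `datasets` attribute name, helper names, and offer layout here are all hypothetical:

```python
# Sketch: pick an agent whose `datasets` attribute (a set-valued Mesos
# attribute, e.g. "{dataset1,dataset17,dataset3}") already contains the
# dataset a task needs. The offer dicts below are a simplified stand-in
# for real Mesos Offer protobufs.

def parse_set_attribute(attr_value):
    """Split a Mesos set-style attribute value like '{dataset1,dataset3}'."""
    return set(attr_value.strip("{}").split(","))

def pick_offer(offers, needed_dataset):
    """Prefer an offer from an agent that already holds the dataset locally."""
    for offer in offers:
        datasets = parse_set_attribute(offer["attributes"].get("datasets", "{}"))
        if needed_dataset in datasets:
            return offer
    # Fall back to any offer; the task would then fetch the data remotely.
    return offers[0] if offers else None

offers = [
    {"agent": "A", "attributes": {"datasets": "{dataset1,dataset17,dataset3}"}},
    {"agent": "B", "attributes": {"datasets": "{dataset17,dataset5}"}},
]
print(pick_offer(offers, "dataset5")["agent"])  # -> B (holds dataset5 locally)
```

Today agent attributes are set at agent startup, so keeping them in sync with a changing local cache is exactly the gap the dynamic-update proposal above would address.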