[ 
https://issues.apache.org/jira/browse/MESOS-9300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642305#comment-16642305
 ] 

James Peach commented on MESOS-9300:
------------------------------------

macOS has 
[ATTR_DIR_MOUNTSTATUS|https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/getattrlist.2.html#//apple_ref/doc/man/2/getattrlist],
 but AFAIK there's no straightforward equivalent on Linux.

However, it looks like we can detect this on Linux with the [EXDEV rename 
trick|http://blog.schmorp.de/2016-03-03-detecting-a-mount-point.html].
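
A rough sketch of how such a check might look, in Python for brevity. The helper name is_mount_point is hypothetical, and the exact rename arguments are my reading of the approach rather than a copy of the linked post. It relies on rename(2) never succeeding when the final path component is ".", and on Linux comparing the mounts of the two parent directories before it rejects ".", so a directory that sits on a different mount from its parent (bind mounts included) should fail with EXDEV rather than EBUSY.

```python
import errno
import os


def is_mount_point(path):
    """Best-effort check for whether `path` is a mount point (Linux).

    Hypothetical helper. The rename below can never succeed because both
    final components are ".", so the filesystem is not modified; only the
    errno of the failure is inspected.
    """
    try:
        # Old parent is "<path>/..", new parent is "<path>". If they are on
        # different mounts, Linux is expected to fail with EXDEV.
        os.rename(os.path.join(path, "..", "."), os.path.join(path, "."))
    except OSError as e:
        return e.errno == errno.EXDEV
    # Not expected to be reached; treat an unexpected success as "not a mount".
    return False
```

On a typical Linux host this would be expected to return True for a path such as /proc and False for an ordinary subdirectory. Note that "/" itself is reported as False, because its parent resolves to the same mount.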

> XFS isolator can mislabel project IDs on persistence volumes.
> -------------------------------------------------------------
>
>                 Key: MESOS-9300
>                 URL: https://issues.apache.org/jira/browse/MESOS-9300
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>            Reporter: James Peach
>            Assignee: James Peach
>            Priority: Major
>
> What happens here is that we are erroneously applying the sandbox's project 
> ID to the persistent volume.
> First, the filesystem/linux isolator bind mounts the persistent volume into 
> the sandbox:
> {noformat}
> I1003 06:49:21.907644 2812466 linux.cpp:593] Mounting 
> '/srv/mesos/work/volumes/roles/pie.mobius/21cb2eb6-b3e5-46f2-944e-8f6e5db9f07f'
>  to 
> '/srv/mesos/work/slaves/909cff92-8e17-41bf-a251-9b5eb6186c35-S0/frameworks/363e6d80-8c38-46cf-815f-2fbf60a62628-0309/executors/mobius-mloop-1538549013_438156792-v2-shared-volume.pod1.writer-job.0.e93hs3uips2i9_1/runs/9e5770a7-9f78-46dc-9264-3e80be0e40cc/shared'
>  for persistent volume disk(allocated: pie.mobius)(reservations: 
> [(DYNAMIC,pie.mobius,jarvis-principal,\{podInstance: e93hs3uips2i9, pod: 
> pod1, service: 
> mobius-mloop-1538549013_438156792-v2-shared-volume})])[21cb2eb6-b3e5-46f2-944e-8f6e5db9f07f:shared]<SHARED>:1
>  of container 9e5770a7-9f78-46dc-9264-3e80be0e40cc
> {noformat}
> Next, the `disk/xfs` isolator assigns a project ID to the sandbox:
> {noformat}
> I1003 06:49:21.920197 2812452 disk.cpp:402] Assigned project 6806 to 
> '/srv/mesos/work/slaves/909cff92-8e17-41bf-a251-9b5eb6186c35-S0/frameworks/363e6d80-8c38-46cf-815f-2fbf60a62628-0309/executors/mobius-mloop-1538549013_438156792-v2-shared-volume.pod1.writer-job.0.e93hs3uips2i9_1/runs/9e5770a7-9f78-46dc-9264-3e80be0e40cc'
> {noformat}
> Note that when this happens, the isolator recursively applies the project ID 
> to the contents of the sandbox. It doesn't follow symlinks or cross devices 
> when it does this, but on Linux, a bind mount would not trigger either of 
> these conditions.
> Finally, the `disk/xfs` isolator tries to assign a project ID to the 
> persistent volume as it is used by the task:
> {noformat}
> F1003 06:49:21.920577 2812452 disk.cpp:532] Check failed: 
> scheduledProjects.contains(projectId.get()) untracked project ID 6806 for 
> volume ID 21cb2eb6-b3e5-46f2-944e-8f6e5db9f07f on 
> /srv/mesos/work/volumes/roles/pie.mobius/21cb2eb6-b3e5-46f2-944e-8f6e5db9f07f
> {noformat}
> This check fails because, if the persistent volume has a project ID, we 
> expect that it has already been scheduled for reclamation. However, its 
> project ID is the one we assigned to the sandbox. We don't schedule the 
> sandbox for reclamation until cleanup, so (fortunately) the invariant check 
> triggers.
> So, apart from triggering the CHECK, the root cause of this is that we are 
> altering the project ID of the persistent volume, which permanently 
> misattributes the corresponding quota.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
