William Montaz created YARN-11906:
-------------------------------------

             Summary: Nodemanager broken when using ReadWriteDiskValidator
                 Key: YARN-11906
                 URL: https://issues.apache.org/jira/browse/YARN-11906
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 3.3.6
            Reporter: William Montaz


YARN-7300 introduced DiskValidator in LocalDirAllocator. In detail, it 
replaced calls to DiskChecker.checkDir() with DiskValidator.checkStatus().

The problem is that DiskChecker creates missing directories as part of its 
check. BasicDiskValidator reproduces this behavior directly. 
ReadWriteDiskValidator, however, does not create the directory: it checks 
whether the directory exists before calling DiskChecker.checkDir() and fails 
immediately if the directory is absent.
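
To illustrate the difference, here is a minimal, self-contained sketch. It does 
not use the Hadoop classes: DiskCheckSketch, createThenCheck() and 
checkExistingOnly() are hypothetical names that only model the two behaviors 
described above with plain java.nio, under the assumption that the only 
relevant difference is whether a missing directory is created before the 
access check.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DiskCheckSketch {

    // Hypothetical model of the "create, then check" behavior:
    // a missing directory is created before it is checked.
    static void createThenCheck(Path dir) throws IOException {
        Files.createDirectories(dir);
        checkAccess(dir);
    }

    // Hypothetical model of the "exists first" behavior:
    // a missing directory is reported as a failure, never created.
    static void checkExistingOnly(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            throw new IOException(dir + " does not exist");
        }
        checkAccess(dir);
    }

    // Shared access check: must be a readable, writable, executable directory.
    private static void checkAccess(Path dir) throws IOException {
        if (!Files.isDirectory(dir) || !Files.isReadable(dir)
                || !Files.isWritable(dir) || !Files.isExecutable(dir)) {
            throw new IOException(dir + " is not a usable directory");
        }
    }

    public static void main(String[] args) throws IOException {
        // A healthy "disk" whose container sub-directory does not exist yet.
        Path root = Files.createTempDirectory("nm-local-dir");
        Path missing = root.resolve("nmPrivate").resolve("app").resolve("container");

        try {
            checkExistingOnly(missing);
            System.out.println("exists-first check passed");
        } catch (IOException e) {
            System.out.println("exists-first check failed: " + e.getMessage());
        }

        createThenCheck(missing);
        System.out.println("create-then-check passed, directory present: "
                + Files.isDirectory(missing));
    }
}
```

In this sketch the same healthy root directory is rejected by the exists-first 
check and accepted by the create-then-check variant, which is the mismatch 
reported here.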

ContainerLaunch will try to create paths that do not exist yet (for example, 
the launch_container.sh file), expecting DiskChecker to create the missing 
directories. Since YARN-7300, using ReadWriteDiskValidator therefore fails on 
any good disk when the directory does not exist. We end up with exceptions like 
the following at any container launch phase:
{noformat}
Could not find any valid local directory for 
nmPrivate/application_1765167412015_56041/container_e1150_1765167412015_56041_02_000001//launch_container.sh
 with requested size -1 as the max capacity in any directory is 0{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

