[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chen Qingcha updated YARN-7481:
-------------------------------
    Description:
We enhance Hadoop with GPU support for better AI job scheduling.
Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a countable resource.
However, GPU placement is very important to the efficiency of deep learning jobs.
For example, a 2-GPU job running on GPUs {0, 1} could be faster than one running on GPUs {0, 7} if GPUs 0 and 1 are under the same PCI-E switch while GPUs 0 and 7 are not.
We add GPU support to Hadoop 2.7.2 to enable GPU locality scheduling, which supports fine-grained GPU placement.
A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage and locality information in a node (up to 64 GPUs per node). In each bit position, '1' means the corresponding GPU is available and '0' means it is not.

  was:
We enhance Hadoop with GPU support for better AI job scheduling.
Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a countable resource.
However, GPU placement is very important to the efficiency of deep learning jobs.
For example, a 2-GPU job running on GPUs {0, 1} could be faster than one running on GPUs {0, 7} if GPUs 0 and 1 are under the same PCI-E switch while GPUs 0 and 7 are not.
We add GPU support to Hadoop 2.7.2 to enable GPU locality scheduling, which supports fine-grained GPU placement.
A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage and locality information in a node (up to 64 GPUs per node). In each bit position, '1' means the corresponding GPU is available and '0' means it is not.

> Gpu locality support for Better AI scheduling
> ---------------------------------------------
>
>                 Key: YARN-7481
>                 URL: https://issues.apache.org/jira/browse/YARN-7481
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: api, RM, yarn
>    Affects Versions: 2.7.2
>            Reporter: Chen Qingcha
>             Fix For: 2.7.2
>
>         Attachments: GPU locality support for Job scheduling.pdf, hadoop-2.7.2-gpu.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling.
> Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a
> countable resource.
> However, GPU placement is very important to the efficiency of deep learning
> jobs.
> For example, a 2-GPU job running on GPUs {0, 1} could be faster than one
> running on GPUs {0, 7} if GPUs 0 and 1 are under the same PCI-E switch while
> GPUs 0 and 7 are not.
> We add GPU support to Hadoop 2.7.2 to enable GPU locality scheduling,
> which supports fine-grained GPU placement.
> A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage
> and locality information in a node (up to 64 GPUs per node). In each bit
> position, '1' means the corresponding GPU is available and '0' means it is not.
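
The bitmap encoding described in the issue can be illustrated with a minimal Java sketch. It assumes that bit i of a `long` stands for GPU i on the node, with '1' meaning available, as the description states. The class name `GpuBitmapSketch` and all of its methods are hypothetical and are not taken from the attached patch or from the YARN API. The sketch covers only availability bookkeeping; it does not model the PCI-E topology that a locality-aware scheduler would also need.

```java
// Hypothetical illustration of the 64-bit GPU availability bitmap.
// None of these names come from hadoop-2.7.2-gpu.patch or from YARN itself.
final class GpuBitmapSketch {
    private GpuBitmapSketch() {}

    /** Builds a bitmap with one bit set per GPU index (valid indices: 0..63). */
    static long maskOf(int... gpuIndices) {
        long mask = 0L;
        for (int index : gpuIndices) {
            if (index < 0 || index > 63) {
                throw new IllegalArgumentException("GPU index out of range: " + index);
            }
            mask |= 1L << index;
        }
        return mask;
    }

    /** Counts the available GPUs, i.e. the number of bits set to 1. */
    static int countAvailable(long available) {
        return Long.bitCount(available);
    }

    /** Returns true if every GPU in the requested mask is currently available. */
    static boolean canAllocate(long available, long requested) {
        return (available & requested) == requested;
    }

    /** Clears the requested bits, marking those GPUs as in use. */
    static long allocate(long available, long requested) {
        if (!canAllocate(available, requested)) {
            throw new IllegalArgumentException("requested GPUs are not all available");
        }
        return available & ~requested;
    }

    /** Sets the released bits, marking those GPUs as available again. */
    static long release(long available, long released) {
        return available | released;
    }

    public static void main(String[] args) {
        long available = maskOf(0, 1, 7);   // GPUs 0, 1 and 7 are free
        long request = maskOf(0, 1);        // a 2-GPU job asking for GPUs {0, 1}

        System.out.println(canAllocate(available, request));   // prints "true"
        long afterAllocation = allocate(available, request);
        System.out.println(Long.toBinaryString(afterAllocation)); // prints "10000000"
    }
}
```

Because a request is itself a bitmap, a job can name specific GPUs such as {0, 1} rather than just a count, which is what distinguishes this representation from treating GPUs as a countable resource.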