Sotiris Niarchos created YARN-10621:
---------------------------------------

             Summary: GPU management using OpenCL instead of vendor-specific 
solutions
                 Key: YARN-10621
                 URL: https://issues.apache.org/jira/browse/YARN-10621
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: nodemanager, yarn
            Reporter: Sotiris Niarchos


As part of the [E2Data research project|https://e2data.eu/], we at the 
[Institute of Communication and Computer Systems 
(ICCS)|https://www.iccs.gr/en/?noredirect=en_US] of the National Technical 
University of Athens, Greece, have been working on a modified version of 
Hadoop YARN in which the GPU devices that are available in the underlying cluster 
are discovered via a Java wrapper of the OpenCL framework API (namely 
[JOCL|https://github.com/gpu/JOCL]), instead of vendor-specific binaries.

In other words, we have shifted towards *a more uniform and high-level handling 
of GPUs as "OpenCL-enabled" devices*. This way, we manage to *decouple GPU 
discovery/management from vendor-specific technicalities*; every GPU, no matter 
the vendor, looks the same to E2Data YARN (more specifically, to the 
{{NodeManager}} component), provided that the OpenCL runtime and drivers for 
the GPU(s) of interest are installed on the respective node(s) of the cluster.

As a result, we *managed to use GPUs from vendors other than NVIDIA* (NVIDIA 
GPUs being the only ones officially supported, via the {{nvidia-smi}} binary) 
with minimal additional effort, after our initial changes.

Ultimately, our goal is to *unify every processing unit* that YARN can possibly 
utilize (CPU cores, GPUs, FPGAs) *behind a common, simple, high-level 
interface: that of the OpenCL-enabled device*.
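
As a purely illustrative sketch of what such a common interface could look 
like (this is not the E2Data code, and not an existing YARN API), the Java 
below models a device by vendor-independent attributes only. Every name in it 
({{OpenClDeviceSketch}}, {{Kind}}, {{Device}}, {{DeviceDiscoverer}}, 
{{countByKind}}) is hypothetical, and the discoverer is stubbed with 
placeholder data; a real one would presumably query OpenCL through JOCL.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical vendor-neutral device model; not actual YARN or E2Data code. */
final class OpenClDeviceSketch {

    /** Coarse device classes, loosely mirroring the OpenCL device types. */
    enum Kind { CPU, GPU, ACCELERATOR }

    /** Holds only vendor-independent attributes (so, e.g., no temperature). */
    static final class Device {
        final Kind kind;
        final String vendor;
        final String name;
        final long globalMemBytes;

        Device(Kind kind, String vendor, String name, long globalMemBytes) {
            this.kind = kind;
            this.vendor = vendor;
            this.name = name;
            this.globalMemBytes = globalMemBytes;
        }
    }

    /** Anything that can enumerate devices, e.g. an OpenCL-backed implementation. */
    interface DeviceDiscoverer {
        List<Device> discover();
    }

    /** Counts devices per kind; the vendor plays no role in the result. */
    static Map<Kind, Integer> countByKind(DeviceDiscoverer discoverer) {
        Map<Kind, Integer> counts = new EnumMap<>(Kind.class);
        for (Device d : discoverer.discover()) {
            counts.merge(d.kind, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stub discoverer with placeholder data (assumption: no real hardware is queried).
        DeviceDiscoverer stub = () -> List.of(
                new Device(Kind.GPU, "VendorA", "example-gpu-0", 8L << 30),
                new Device(Kind.GPU, "VendorB", "example-gpu-1", 4L << 30),
                new Device(Kind.CPU, "VendorC", "example-cpu-0", 16L << 30));
        System.out.println(countByKind(stub));
    }
}
```

The point of the design is that the consumer ({{countByKind}} here) never 
branches on the vendor, so supporting a new vendor would only require a 
discoverer that can see its devices.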

The only drawback of our approach is that vendor-specific info regarding the 
GPUs is lost (e.g. temperature). We believe, however, that the lost information 
is not necessary for YARN; everything that Hadoop needs in order to discover 
and handle GPU devices is provided by OpenCL.

For the time being, this is just a proposal and a prompt for discussion; the 
modified version is a work in progress. We consider community feedback 
regarding the core concept (and the fact that it may constitute a paradigm 
shift for YARN) crucial before attaching any patch file and diving into more 
(technical) details.



