Repository: mesos
Updated Branches:
  refs/heads/master 207b3c0fa -> f6089bdf8
Added documentation for Nvidia GPU support.

Review: https://reviews.apache.org/r/46220/

Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/f6089bdf
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/f6089bdf
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/f6089bdf

Branch: refs/heads/master
Commit: f6089bdf848c5cbbacdd10228bc1d3b28a59a594
Parents: 207b3c0
Author: Kevin Klues <klue...@gmail.com>
Authored: Thu Sep 8 10:06:29 2016 +0200
Committer: Vinod Kone <vinodk...@gmail.com>
Committed: Thu Sep 8 10:07:20 2016 +0200

----------------------------------------------------------------------
 docs/gpu-support.md | 365 +++++++++++++++++++++++++++++++++++++++++++++++
 docs/home.md        |   1 +
 2 files changed, 366 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/mesos/blob/f6089bdf/docs/gpu-support.md
----------------------------------------------------------------------
diff --git a/docs/gpu-support.md b/docs/gpu-support.md
new file mode 100644
index 0000000..3c20f66

---
title: Apache Mesos - Nvidia GPU Support
layout: documentation
---

# Nvidia GPU Support

Mesos 1.0.0 added first-class support for Nvidia GPUs.

## Overview
Getting up and running with GPU support in Mesos is fairly
straightforward once you know the steps necessary to make it work as
expected. On the agent side, this means setting the flags necessary
to enumerate GPUs and advertise them to the Mesos master. On the
framework side, it means setting the framework capability that tells
the Mesos master to include GPUs in the resource offers it sends to
that framework. So long as both of these requirements are met,
accepting offers that contain GPUs and launching tasks that consume
them is just as straightforward as launching a traditional task that
only consumes CPUs, memory, and disk.

To that end, Mesos exposes GPUs as a simple `SCALAR` resource in the
same way it always has for CPUs, memory, and disk. That is, a
resource offer such as the following is now possible:

    cpus:8; mem:1024; disk:65536; gpus:4;

However, unlike CPUs, memory, and disk, *only* whole numbers of GPUs
can be requested. If a fractional amount is requested, launching the
task will result in a `TASK_ERROR`.

At the time of this writing, Nvidia GPU support is only available for
tasks launched through the Mesos containerizer (i.e., no support
exists for launching GPU-capable tasks through the Docker
containerizer). That said, the Mesos containerizer now supports
running Docker containers natively, so this limitation should not
affect the vast majority of users.

Moreover, we mimic the support provided by [nvidia-docker](
https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver) to
automatically mount the proper Nvidia drivers and tools directly into
your Docker container. This means you can easily test your
GPU-enabled Docker containers locally and deploy them to Mesos with
the assurance that they will work without modification.

In the following sections we walk through all of the flags and
framework capabilities necessary to enable Nvidia GPU support in
Mesos. We then show how to set up and run a small test cluster that
launches tasks both with and without Docker containers. Finally, we
conclude with a step-by-step guide for installing any necessary
Nvidia GPU drivers on your machine.
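As a quick illustration of the whole-number constraint described
above, the sketch below launches one task that requests a whole GPU
and one that requests a fractional GPU (it assumes a running cluster
like the one set up later in this guide; the task names are
arbitrary):

    # Requesting a whole GPU: the task runs normally.
    $ mesos-execute \
        --master=127.0.0.1:5050 \
        --name=gpu-whole \
        --command="nvidia-smi" \
        --framework_capabilities="GPU_RESOURCES" \
        --resources="gpus:1"

    # Requesting a fractional GPU: the task fails with `TASK_ERROR`.
    $ mesos-execute \
        --master=127.0.0.1:5050 \
        --name=gpu-fractional \
        --command="nvidia-smi" \
        --framework_capabilities="GPU_RESOURCES" \
        --resources="gpus:0.5"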
## Agent Flags
The following isolation flags are required to enable Nvidia GPU
support on an agent:

    --isolation="cgroups/devices,gpu/nvidia"

The `cgroups/devices` flag tells the agent to restrict access to a
specific set of devices for each task that it launches (i.e., a
subset of all devices listed in `/dev`). When used in conjunction
with the `gpu/nvidia` flag, the `cgroups/devices` flag allows us to
grant and revoke access to specific GPUs on a per-task basis.

By default, all GPUs on an agent are automatically discovered and
sent to the Mesos master as part of its resource offer. However, it
may sometimes be necessary to restrict access to only a subset of the
GPUs available on an agent. This is useful, for example, if you want
to exclude a specific GPU device because an unwanted Nvidia graphics
card is listed alongside a more powerful set of GPUs. When this is
required, the following two additional agent flags can be used
together:

    --nvidia_gpu_devices="<list_of_gpu_ids>"

    --resources="gpus:<num_gpus>"

For the `--nvidia_gpu_devices` flag, you need to provide a
comma-separated list of GPU ids, as determined by running
`nvidia-smi` on the host where the agent is to be launched ([see
below](#external-dependencies) for instructions on what external
dependencies must be installed on these hosts to run this command).
Example output from running `nvidia-smi` on a machine with four GPUs
can be seen below:

    +------------------------------------------------------+
    | NVIDIA-SMI 352.79     Driver Version: 352.79         |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla M60           Off  | 0000:04:00.0     Off |                    0 |
    | N/A   34C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla M60           Off  | 0000:05:00.0     Off |                    0 |
    | N/A   35C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   2  Tesla M60           Off  | 0000:83:00.0     Off |                    0 |
    | N/A   38C    P0    40W / 150W |     34MiB /  7679MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   3  Tesla M60           Off  | 0000:84:00.0     Off |                    0 |
    | N/A   34C    P0    39W / 150W |     34MiB /  7679MiB |     97%      Default |
    +-------------------------------+----------------------+----------------------+

The GPU `id` to choose can be seen in the far-left column of each
row. Any subset of these `ids` can be listed in the
`--nvidia_gpu_devices` flag (i.e., all of the following values of
this flag are valid):

    --nvidia_gpu_devices="0"
    --nvidia_gpu_devices="0,1"
    --nvidia_gpu_devices="0,1,2"
    --nvidia_gpu_devices="0,1,2,3"
    --nvidia_gpu_devices="0,2,3"
    --nvidia_gpu_devices="3,1"
    etc...

For the `--resources=gpus:<num_gpus>` flag, the value passed to
`<num_gpus>` must equal the number of GPUs listed in
`--nvidia_gpu_devices`. If these numbers do not match, launching the
agent will fail. This can sometimes be a source of confusion, so it
is important to emphasize it here for clarity.
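To tie these flags together, here is a sketch of a complete agent
invocation that exposes only GPUs `0` and `1` to Mesos (the master
address and work directory are placeholders; adjust them for your
setup):

    $ mesos-agent \
        --master=127.0.0.1:5050 \
        --work_dir=/var/lib/mesos \
        --isolation="cgroups/devices,gpu/nvidia" \
        --nvidia_gpu_devices="0,1" \
        --resources="gpus:2"

Note that `gpus:2` matches the two devices listed in
`--nvidia_gpu_devices`; changing one value without the other will
cause the agent to fail at startup, as described above.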
## Framework Capabilities
Once you launch an agent with the flags above, GPU resources will be
advertised to the Mesos master alongside all of the traditional
resources such as CPUs, memory, and disk. However, the master will
only forward offers that contain GPUs to frameworks that have
explicitly enabled the `GPU_RESOURCES` framework capability.

Requiring frameworks to opt in to the `GPU_RESOURCES` capability
keeps legacy frameworks from accidentally consuming the non-GPU
resources on GPU-capable machines (and thus blocking your GPU jobs
from running). This matters little if all of your nodes have GPUs,
but in a mixed-node environment it can be a real problem.

An example of setting this capability in a C++ based framework can be
seen below:

    // Advertise the GPU_RESOURCES capability so that the master will
    // include GPUs in the offers it sends to this framework.
    FrameworkInfo framework;
    framework.add_capabilities()->set_type(
        FrameworkInfo::Capability::GPU_RESOURCES);

    GpuScheduler scheduler;

    driver = new MesosSchedulerDriver(
        &scheduler,
        framework,
        "127.0.0.1:5050");

    driver->run();


## Minimal GPU Capable Cluster
In this section we walk through two examples of launching a
GPU-capable cluster and running tasks on it. The first example
demonstrates the minimal setup required to run a command that
consumes GPUs on a GPU-capable agent. The second example demonstrates
the setup necessary to launch a Docker container that does the same.

**Note**: Both of these examples assume you have installed the
external dependencies required for Nvidia GPU support on Mesos.
Please see [below](#external-dependencies) for more information.

### Minimal Setup Without Support for Docker Containers
The commands below show a minimal example of bringing up a
GPU-capable Mesos cluster on `localhost` and executing a task on it.
The required agent flags are set as described above, and the
`mesos-execute` command has been told to enable the `GPU_RESOURCES`
framework capability so it can receive offers containing GPU
resources.

    $ mesos-master \
        --ip=127.0.0.1 \
        --work_dir=/var/lib/mesos

    $ mesos-agent \
        --master=127.0.0.1:5050 \
        --work_dir=/var/lib/mesos \
        --isolation="cgroups/devices,gpu/nvidia"

    $ mesos-execute \
        --master=127.0.0.1:5050 \
        --name=gpu-test \
        --command="nvidia-smi" \
        --framework_capabilities="GPU_RESOURCES" \
        --resources="gpus:1"

If all goes well, you should see something like the following in the
`stdout` of your task:

    +------------------------------------------------------+
    | NVIDIA-SMI 352.79     Driver Version: 352.79         |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla M60           Off  | 0000:04:00.0     Off |                    0 |
    | N/A   34C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
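Before moving on to Docker containers, you may want to confirm that
the agent's GPUs actually registered with the master. One way to
check (a rough sketch; it assumes the master from the example above
and uses `grep` instead of a proper JSON parser) is to query the
master's `/state` endpoint and look for a non-zero `gpus` value in
the advertised resources:

    $ curl -s http://127.0.0.1:5050/state | grep -o '"gpus":[0-9.]*'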
### Minimal Setup With Support for Docker Containers
The commands below show a minimal example of bringing up a
GPU-capable Mesos cluster on `localhost` and running a Docker
container on it. The required agent flags are set as described above,
and the `mesos-execute` command has been told to enable the
`GPU_RESOURCES` framework capability so it can receive offers
containing GPU resources. Additionally, the flags required to enable
support for Docker containers (as described
[here](container-image.md)) have been set as well.

    $ mesos-master \
        --ip=127.0.0.1 \
        --work_dir=/var/lib/mesos

    $ mesos-agent \
        --master=127.0.0.1:5050 \
        --work_dir=/var/lib/mesos \
        --image_providers=docker \
        --executor_environment_variables="{}" \
        --isolation="docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia"

    $ mesos-execute \
        --master=127.0.0.1:5050 \
        --name=gpu-test \
        --docker_image=nvidia/cuda \
        --command="nvidia-smi" \
        --framework_capabilities="GPU_RESOURCES" \
        --resources="gpus:1"

If all goes well, you should see something like the following in the
`stdout` of your task:

    +------------------------------------------------------+
    | NVIDIA-SMI 352.79     Driver Version: 352.79         |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla M60           Off  | 0000:04:00.0     Off |                    0 |
    | N/A   34C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

<a name="external-dependencies"></a>
## External Dependencies

Any host running a Mesos agent with Nvidia GPU support **MUST** have
a valid Nvidia kernel driver installed. It is also *highly*
recommended to install the corresponding user-level libraries and
tools available as part of the Nvidia CUDA toolkit. Many jobs that
use Nvidia GPUs rely on CUDA, and not including it will severely
limit the kind of GPU-aware jobs you can run on Mesos.

### Installing the Required Tools

The Nvidia kernel driver can be downloaded at the link below. Make
sure to choose the proper GPU model, operating system, and version of
the CUDA toolkit you plan to install on your host:

    http://www.nvidia.com/Download/index.aspx

Unfortunately, most Linux distributions come preinstalled with an
open source video driver called `Nouveau`, which conflicts with the
Nvidia driver we are trying to install. The following guides may
prove useful in walking you through the process of uninstalling
`Nouveau` before installing the Nvidia driver on `CentOS` or
`Ubuntu`:

    http://www.dedoimedo.com/computers/centos-7-nvidia.html
    http://www.allaboutlinux.eu/remove-nouveau-and-install-nvidia-driver-in-ubuntu-15-04/

After installing the Nvidia kernel driver, you can follow the
instructions in the link below to install the Nvidia CUDA toolkit:

    http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/

In addition to the steps listed in the link above, it is *highly*
recommended to add CUDA's `lib` directory to your `ldcache` so that
tasks launched by Mesos will know where these libraries exist and can
link against them properly:

    sudo bash -c "cat > /etc/ld.so.conf.d/cuda-lib64.conf << EOF
    /usr/local/cuda/lib64
    EOF"

    sudo ldconfig

If you choose **not** to add CUDA's `lib` directory to your
`ldcache`, you **MUST** add it to the `LD_LIBRARY_PATH` of every task
that requires it.

**Note:** This is *not* the recommended method. You have been warned.
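If you do take the `LD_LIBRARY_PATH` route anyway, one option is to
set the variable as part of the task's command itself. The sketch
below assumes the CUDA toolkit was installed under the default
`/usr/local/cuda` prefix and uses a hypothetical `./my-cuda-app`
binary already present in the task's sandbox:

    $ mesos-execute \
        --master=127.0.0.1:5050 \
        --name=cuda-app \
        --command="LD_LIBRARY_PATH=/usr/local/cuda/lib64 ./my-cuda-app" \
        --framework_capabilities="GPU_RESOURCES" \
        --resources="gpus:1"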
### Verifying the Installation

Once the kernel driver has been installed, you can make sure
everything is working by trying to run the bundled `nvidia-smi` tool:

    nvidia-smi

You should see output similar to the following:

    Thu Apr 14 11:58:17 2016
    +------------------------------------------------------+
    | NVIDIA-SMI 352.79     Driver Version: 352.79         |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla M60           Off  | 0000:04:00.0     Off |                    0 |
    | N/A   34C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla M60           Off  | 0000:05:00.0     Off |                    0 |
    | N/A   35C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   2  Tesla M60           Off  | 0000:83:00.0     Off |                    0 |
    | N/A   38C    P0    38W / 150W |     34MiB /  7679MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   3  Tesla M60           Off  | 0000:84:00.0     Off |                    0 |
    | N/A   34C    P0    38W / 150W |     34MiB /  7679MiB |     99%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID  Type  Process name                               Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

To verify your CUDA installation, it is recommended to go through the
instructions at the link below:

    http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/#install-samples

Finally, you should get a developer to run Mesos's Nvidia GPU-related
unit tests on your machine to ensure that everything passes (as
described below).

### Running Mesos Unit Tests

At the time of this writing, the following Nvidia GPU-specific unit
tests exist in Mesos:

    DockerTest.ROOT_DOCKER_NVIDIA_GPU_DeviceAllow
    DockerTest.ROOT_DOCKER_NVIDIA_GPU_InspectDevices
    NvidiaGpuTest.ROOT_CGROUPS_NVIDIA_GPU_VerifyDeviceAccess
    NvidiaGpuTest.ROOT_INTERNET_CURL_CGROUPS_NVIDIA_GPU_NvidiaDockerImage
    NvidiaGpuTest.ROOT_CGROUPS_NVIDIA_GPU_FractionalResources
    NvidiaGpuTest.NVIDIA_GPU_Discovery
    NvidiaGpuTest.ROOT_CGROUPS_NVIDIA_GPU_FlagValidation
    NvidiaGpuTest.NVIDIA_GPU_Allocator
    NvidiaGpuTest.ROOT_NVIDIA_GPU_VolumeCreation
    NvidiaGpuTest.ROOT_NVIDIA_GPU_VolumeShouldInject

The capitalized words following the `.` in each test name specify the
filters that apply when running the unit tests. In our case the
relevant filters are `ROOT`, `CGROUPS`, and `NVIDIA_GPU`. This means
that these tests must be run as `root` on Linux machines with
`cgroups` support that have Nvidia GPUs installed on them. The check
for Nvidia GPUs simply looks for the Nvidia System Management
Interface binary (`nvidia-smi`) on the machine where the tests are
being run. This binary should already be installed if the
instructions above have been followed correctly.
So long as these filters are satisfied, you can run the following to
execute these unit tests:

    [mesos]$ GTEST_FILTER="" make -j check
    [mesos]$ sudo bin/mesos-tests.sh --gtest_filter="*NVIDIA_GPU*"

The first command builds the test binaries without actually running
any tests (the empty `GTEST_FILTER` filters them all out); the second
then runs just the Nvidia GPU-related tests as `root`.

----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/mesos/blob/f6089bdf/docs/home.md
----------------------------------------------------------------------
diff --git a/docs/home.md b/docs/home.md
index c8aeaef..ad59eb1 100644
--- a/docs/home.md
+++ b/docs/home.md
@@ -44,6 +44,7 @@ layout: documentation
 * [Networking](networking.md)
 * [Container Network Interface (CNI)](cni.md)
 * [Port Mapping Isolator](port-mapping-isolator.md)
+* [Nvidia GPU Support](gpu-support.md) for how to run Mesos with Nvidia GPU support.
 * [Oversubscription](oversubscription.md) for how to configure Mesos to take advantage of unused resources to launch "best-effort" tasks.
 * [Persistent Volume](persistent-volume.md) for how to allow tasks to access persistent storage resources.
 * [Multiple Disks](multiple-disk.md) for how to to allow tasks to use multiple isolated disk resources.