Upgraded to use jcuda8 (from the maven repo)

Closes #291
Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/be4eaaf2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/be4eaaf2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/be4eaaf2

Branch: refs/heads/gh-pages
Commit: be4eaaf2a9b27d0a611cedb8b1d53e9a0a6a9296
Parents: fd96a3e
Author: Nakul Jindal <naku...@gmail.com>
Authored: Fri Mar 3 18:11:45 2017 -0800
Committer: Nakul Jindal <naku...@gmail.com>
Committed: Fri Mar 3 18:11:46 2017 -0800

----------------------------------------------------------------------
 devdocs/gpu-backend.md | 61 +++++++++++++++++++--------------------
 1 file changed, 26 insertions(+), 35 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/be4eaaf2/devdocs/gpu-backend.md
----------------------------------------------------------------------
diff --git a/devdocs/gpu-backend.md b/devdocs/gpu-backend.md
index c6f66d6..40311c7 100644
--- a/devdocs/gpu-backend.md
+++ b/devdocs/gpu-backend.md
@@ -19,52 +19,43 @@ limitations under the License.
 
 # Initial prototype for GPU backend
 
-A GPU backend implements two important abstract classes:
+The GPU backend implements two important abstract classes:
 1. `org.apache.sysml.runtime.controlprogram.context.GPUContext`
 2. `org.apache.sysml.runtime.controlprogram.context.GPUObject`
 
-The GPUContext is responsible for GPU memory management and initialization/destruction of Cuda handles.
+The `GPUContext` is responsible for GPU memory management and initialization/destruction of CUDA handles.
+Currently, an active instance of the `GPUContext` class is made available globally and is used to store handles
+of the allocated blocks on the GPU. A count is kept per block for the number of instructions that need it.
+When the count is 0, the block may be evicted on a call to `GPUObject.evict()`.
-A GPUObject (like RDDObject and BroadcastObject) is stored in CacheableData object. It gets call-backs from SystemML's bufferpool on following methods
+A `GPUObject` (like RDDObject and BroadcastObject) is stored in a CacheableData object. It gets callbacks from SystemML's bufferpool on the following methods:
 1. void acquireDeviceRead()
-2. void acquireDenseDeviceModify(int numElemsToAllocate)
-3. void acquireHostRead()
-4. void acquireHostModify()
-5. void release(boolean isGPUCopyModified)
+2. void acquireDeviceModifyDense()
+3. void acquireDeviceModifySparse()
+4. void acquireHostRead()
+5. void acquireHostModify()
+6. void releaseInput()
+7. void releaseOutput()
 
-## JCudaContext:
-The current prototype supports Nvidia's CUDA libraries using JCuda wrapper. The implementation for the above classes can be found in:
-1. `org.apache.sysml.runtime.controlprogram.context.JCudaContext`
-2. `org.apache.sysml.runtime.controlprogram.context.JCudaObject`
+Sparse matrices on the GPU are represented in `CSR` format. In the SystemML runtime, they are represented in `MCSR` (modified `CSR`) format.
+A conversion cost is incurred when sparse matrices are sent back and forth between host and device memory.
 
-### Setup instructions for JCudaContext:
+Concrete classes `JCudaContext` and `JCudaObject` (which extend `GPUContext` and `GPUObject` respectively) contain references to `org.jcuda.*`.
 
-1. Follow the instructions from `https://developer.nvidia.com/cuda-downloads` and install CUDA 7.5.
-2. Follow the instructions from `https://developer.nvidia.com/cudnn` and install CuDNN v4.
-3. Download install JCuda binaries version 0.7.5b and JCudnn version 0.7.5. Easiest option would be to use mavenized jcuda:
-```python
-git clone https://github.com/MysterionRise/mavenized-jcuda.git
-mvn -Djcuda.version=0.7.5b -Djcudnn.version=0.7.5 clean package
-CURR_DIR=`pwd`
-JCUDA_PATH=$CURR_DIR"/target/lib/"
-JAR_PATH="."
-for j in `ls $JCUDA_PATH/*.jar`
-do
-  JAR_PATH=$JAR_PATH":"$j
-done
-export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JCUDA_PATH
-```
+The `LibMatrixCUDA` class contains methods to invoke CUDA libraries (where available) and to invoke custom kernels.
+Runtime classes (that extend `GPUInstruction`) redirect calls to functions in this class.
+Some functions in `LibMatrixCUDA` need finer control over GPU memory management primitives. These are provided by `JCudaObject`.
+
+### Setup instructions:
 
-Note for Windows users:
-* CuDNN v4 is available to download: `http://developer.download.nvidia.com/compute/redist/cudnn/v4/cudnn-7.0-win-x64-v4.0-prod.zip`
-* If above steps doesn't work for JCuda, copy the DLLs into C:\lib (or /lib) directory.
+1. Follow the instructions from `https://developer.nvidia.com/cuda-downloads` and install CUDA 8.0.
+2. Follow the instructions from `https://developer.nvidia.com/cudnn` and install CuDNN v5.1.
 
-To use SystemML's GPU backend,
+To use SystemML's GPU backend when using the jar or uber-jar:
 1. Add JCuda's jar into the classpath.
-2. Include CUDA, CuDNN and JCuda's libraries in LD_LIBRARY_PATH (or using -Djava.library.path).
-3. Use `-gpu` flag.
+2. Use the `-gpu` flag.
 
 For example: to use GPU backend in standalone mode:
-```python
-java -classpath $JAR_PATH:systemml-0.10.0-incubating-SNAPSHOT-standalone.jar org.apache.sysml.api.DMLScript -f MyDML.dml -gpu -exec singlenode ...
+```bash
+java -classpath $JAR_PATH:systemml-0.14.0-incubating-SNAPSHOT-standalone.jar org.apache.sysml.api.DMLScript -f MyDML.dml -gpu -exec singlenode ...
 ```
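The updated doc leaves `$JAR_PATH` in the example invocation undefined (the old mavenized-jcuda loop that built it was removed). A minimal sketch of assembling it, assuming the JCuda 8 jars have been collected into a local directory (the `target/lib` default and the `JCUDA_PATH` variable name are assumptions carried over from the removed instructions, not part of the new doc):

```shell
# Point JCUDA_PATH at the directory holding the JCuda 8 jars
# (e.g. copied there from the local Maven repository); default is assumed.
JCUDA_PATH="${JCUDA_PATH:-target/lib}"

# Start the classpath with the current directory, then append each jar.
JAR_PATH="."
for j in "$JCUDA_PATH"/*.jar; do
  [ -e "$j" ] || continue   # skip the unexpanded pattern when no jars exist
  JAR_PATH="$JAR_PATH:$j"
done
echo "$JAR_PATH"
```

The resulting `$JAR_PATH` is what the `java -classpath $JAR_PATH:systemml-...-standalone.jar ...` example above expects.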