Re: Single-Machine Topology-aware Communication

2018-06-25 Thread Carl Yang
I added a few more figures showing how I arrived at the MXNET_KVSTORE_GPUARRAY_BOUND value [Figures 7(b) and 7(c)]. I ran a microbenchmark measuring runtime in seconds against the size of messages sent through MXNet's KVStore. Figure 7(b) shows a crossover point around 1M. Beyond this point, multi-

Re: Single-Machine Topology-aware Communication

2018-06-25 Thread Pedro Larroy
Nice design document. Where does the default value of 10M for MXNET_KVSTORE_GPUARRAY_BOUND come from? Do you generate a tree for each GPU? Pedro. On Mon, Jun 18, 2018 at 2:30 PM Carl Yang wrote: > Hi, > > Currently, we have two methods for single-machine communication: > parameter server an

Single-Machine Topology-aware Communication

2018-06-18 Thread Carl Yang
Hi, Currently, we have two methods for single-machine communication: parameter server and NCCL ring reduction. Both of these methods have some downsides. Parameter server does not differentiate between NVLink connections and PCI-E, so it ends up using the higher latency and slower PCI-E connection
