I added a few more figures showing how I got the
MXNET_KVSTORE_GPUARRAY_BOUND value [Figures 7(b) and 7(c)]. I
performed a microbenchmark measuring runtime in seconds vs. message
size sent using MXNet's KVStore. Figure 7(b) shows a crossover point
around 1M. Beyond this point, multi-
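For reference, the general shape of such a crossover microbenchmark can be sketched in plain Python. The real measurements timed KVStore push/pull on GPU ndarrays per communication method; `fake_allreduce` below is only a stand-in, and the sizes are illustrative:

```python
import time

def bench(fn, payload, repeats=5):
    """Best-of-N wall-clock time for fn(payload), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(payload)
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in for one timed communication step on a message of n elements;
# the actual benchmark timed mx.kv push/pull for each method instead.
def fake_allreduce(data):
    return sum(data)

for n in (10_000, 100_000, 1_000_000):
    t = bench(fake_allreduce, list(range(n)))
    print(f"{n:>9} elements: {t:.6f} s")
```

Plotting one such runtime-vs-size curve per method and finding where the curves intersect gives the crossover point used to pick the bound.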
Nice design document. Where does the default value of
MXNET_KVSTORE_GPUARRAY_BOUND (10M) come from?
Do you generate a tree for each GPU?
Pedro.
On Mon, Jun 18, 2018 at 2:30 PM Carl Yang wrote:
> Hi,
>
> Currently, we have two methods for single-machine communication:
> parameter server and NCCL ring reduction ...
Hi,
Currently, we have two methods for single-machine communication:
parameter server and NCCL ring reduction. Both of these methods have
some downsides. Parameter server does not differentiate between NVLink
connections and PCI-E, so it ends up using the higher latency and
slower PCI-E connection
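A minimal sketch of how a size bound like MXNET_KVSTORE_GPUARRAY_BOUND can gate which reduction path an array takes. The strategy names and the dispatch function are illustrative, not MXNet internals; only the environment-variable name and its 10M default come from the thread:

```python
import os

# Default taken from the design discussion; overridable via the environment.
GPUARRAY_BOUND = int(os.environ.get("MXNET_KVSTORE_GPUARRAY_BOUND", 10_000_000))

def choose_reduce_path(num_elements):
    """Pick a communication strategy based on array size (illustrative).

    For small arrays latency dominates, so a cheaper scheme wins; for
    large arrays bandwidth dominates, favoring the multi-tree approach.
    """
    if num_elements < GPUARRAY_BOUND:
        return "single-tree"
    return "multi-tree"
```

The bound is per-array: each gradient array is dispatched independently, so a model mixes both paths depending on its layer sizes.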