preethamam commented on issue #12982: URL: https://github.com/apache/incubator-mxnet/issues/12982#issuecomment-1227472233
> [This issue in PyTorch](https://github.com/pytorch/pytorch/issues/1637) may be relevant. Specifically, [this comment](https://github.com/pytorch/pytorch/issues/1637#issuecomment-338268158) about either disabling IOMMU or changing to software IOMMU. I've had issues with the Threadripper requiring `iommu=soft`; I'd recommend giving that a shot.

I have the same issue. I have 4 Nvidia RTX 2080 Ti GPUs, and whenever I use `nn.DataParallel`, the system crashes a few seconds after the data is pushed to the GPUs. This happens on both Linux and Windows 11.

I tried `NCCL_P2P_DISABLE=1`, as well as `iommu=soft` and `iommu=off`, but none of them helped. I also went through the comments and tried their suggestions; none worked for me.

My system:

CPU: AMD Threadripper 2950X
RAM: 128 GB
GPUs: 4 Nvidia RTX 2080 Ti

Any help is much appreciated!
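For anyone retrying these workarounds on Linux, it can be worth confirming that the settings actually reached the running system before ruling them out. The following is only a minimal sketch: the helper names `iommu_setting` and `report` are hypothetical, it assumes the kernel command line is readable at `/proc/cmdline`, and it only sees `NCCL_P2P_DISABLE` if the variable is set in the environment of the process that runs it.

```python
import os
from typing import Optional


def iommu_setting(cmdline: str) -> Optional[str]:
    """Return the value of the last `iommu=` token in a kernel command line.

    Returns None when no such token is present. Tokens such as
    `amd_iommu=on` are deliberately not matched.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("iommu="):
            value = token.split("=", 1)[1]
    return value


def report(cmdline_path: str = "/proc/cmdline") -> dict:
    """Collect the two settings mentioned above as seen by this process."""
    try:
        with open(cmdline_path) as f:
            cmdline = f.read()
    except OSError:
        # Not on Linux, or the file is unreadable: treat as "not set".
        cmdline = ""
    return {
        "iommu": iommu_setting(cmdline),
        "NCCL_P2P_DISABLE": os.environ.get("NCCL_P2P_DISABLE"),
    }


if __name__ == "__main__":
    print(report())
```

Running this in the same shell that launches the training script shows whether `iommu=soft` (or `iommu=off`) is on the booted kernel's command line and whether `NCCL_P2P_DISABLE` is visible to child processes. A `None` for either entry means the setting did not take effect in that session.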