arcadiaphy commented on issue #14522: mx.nd.Custom conflicts with memory
management
URL:
https://github.com/apache/incubator-mxnet/issues/14522#issuecomment-478337910
@anirudh2290 I have started a PR for 1, but the really tricky one is 2.
arcadiaphy commented on issue #14522: mx.nd.Custom conflicts with memory
management
URL:
https://github.com/apache/incubator-mxnet/issues/14522#issuecomment-478198605
@larroy Yes, two different issues here.
Reproduced the gemm bug too, but I'm focusing on the exception handling
arcadiaphy commented on issue #14522: mx.nd.Custom conflicts with memory
management
URL:
https://github.com/apache/incubator-mxnet/issues/14522#issuecomment-477984656
@wkcn @anirudh2290 As I've mentioned before, adding an on_complete
callback to CustomOperation is not a good design,
arcadiaphy commented on issue #14522: mx.nd.Custom conflicts with memory
management
URL:
https://github.com/apache/incubator-mxnet/issues/14522#issuecomment-477870929
@anirudh2290 @wkcn @YutingZhang Finally figured out the reason:
Normally, when an exception is thrown in spawn
arcadiaphy commented on issue #14522: mx.nd.Custom conflicts with memory
management
URL:
https://github.com/apache/incubator-mxnet/issues/14522#issuecomment-477095499
@wkcn I don't think it's related to OOM; it looks more like LOG(FATAL) gets stuck.
arcadiaphy commented on issue #14522: mx.nd.Custom conflicts with memory
management
URL:
https://github.com/apache/incubator-mxnet/issues/14522#issuecomment-477077127
```
MXNET_ENGINE_TYPE=ThreadedEngine gdb python \
  -ex 'b src/storage/pooled_storage_manager.h:145' -ex 'r reproduce.py'