[GitHub] [incubator-tvm] t-vi commented on pull request #5600: [TOPI] Improve CUDA softmax scheduling

2020-06-04 Thread GitBox
t-vi commented on pull request #5600: URL: https://github.com/apache/incubator-tvm/pull/5600#issuecomment-638823562 @wpan11nv Thanks for your offer to help. I submitted the clean-up #5726 and then in #5727 I added ROCm warp reductions. One of the things I did was to avoid assuming a fixed w

[GitHub] [incubator-tvm] t-vi commented on pull request #5600: [TOPI] Improve CUDA softmax scheduling

2020-06-03 Thread GitBox
t-vi commented on pull request #5600: URL: https://github.com/apache/incubator-tvm/pull/5600#issuecomment-638622419 I'm adding shfl intrinsics to the rocm bits (using `tvm.intrin.rule.rocm.tvm_warp_shuffle /-up/-down` definitions). I'm currently seeing a funny effect where I get a `tvm_t
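For readers unfamiliar with shuffle-based warp reductions, here is a minimal Python simulation of the tree reduction that shuffle-down intrinsics (such as the `tvm_warp_shuffle_down` rule mentioned above) are typically used for. It is only an illustration and not TVM's implementation. The helper names `shuffle_down` and `warp_reduce_sum` are hypothetical, and the lane width is a parameter because NVIDIA warps have 32 lanes while AMD GCN wavefronts have 64.

```python
def shuffle_down(values, delta, width):
    """Simulate a shuffle-down: lane i reads the value held by lane i + delta.

    Lanes whose source lane would fall outside the warp keep their own value,
    mirroring the usual out-of-range behaviour of shuffle-down intrinsics.
    """
    return [values[i + delta] if i + delta < width else values[i] for i in range(width)]


def warp_reduce_sum(values, width):
    """Tree reduction built on shuffle-down; lane 0 ends up holding the total.

    `width` is passed in rather than hard-coded to 32, so the same logic
    covers 64-lane wavefronts as well as 32-lane warps.
    """
    assert len(values) == width and width & (width - 1) == 0, "width must be a power of two"
    vals = list(values)
    offset = width // 2
    while offset > 0:
        # Each lane adds the value from the lane `offset` positions above it.
        shifted = shuffle_down(vals, offset, width)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]


print(warp_reduce_sum(list(range(32)), 32))  # 496
print(warp_reduce_sum(list(range(64)), 64))  # 2016
```

Only lane 0's final value is meaningful; the other lanes hold partial sums, which is why a schedule that hard-codes the lane count would break when moved between 32- and 64-lane hardware.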

[GitHub] [incubator-tvm] t-vi commented on pull request #5600: [TOPI] Improve CUDA softmax scheduling

2020-06-03 Thread GitBox
t-vi commented on pull request #5600: URL: https://github.com/apache/incubator-tvm/pull/5600#issuecomment-638329923 I'll just work on a fix. This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [incubator-tvm] t-vi commented on pull request #5600: [TOPI] Improve CUDA softmax scheduling

2020-06-03 Thread GitBox
t-vi commented on pull request #5600: URL: https://github.com/apache/incubator-tvm/pull/5600#issuecomment-638275567 So ROCm uses the CUDA schedule, but warp reductions don't seem to currently work (so arguably, ROCm would want to be improved). But so before this PR, one could run resnet18

[GitHub] [incubator-tvm] t-vi commented on pull request #5600: [TOPI] Improve CUDA softmax scheduling

2020-06-03 Thread GitBox
t-vi commented on pull request #5600: URL: https://github.com/apache/incubator-tvm/pull/5600#issuecomment-638068589 This broke the ROCm backend.