[GitHub] [incubator-tvm] yongwww commented on pull request #6108: Fix CUDA Compute Function For `get_valid_counts` and `nms`

2020-09-14 Thread GitBox
yongwww commented on pull request #6108: URL: https://github.com/apache/incubator-tvm/pull/6108#issuecomment-692398165 @lsy643 thanks for sharing the results. What I am wondering about is the latency of your change vs. the previous NMS GPU version (even if the output is not identical), and probably the

[GitHub] [incubator-tvm] yongwww commented on pull request #6108: Fix CUDA Compute Function For `get_valid_counts` and `nms`

2020-09-04 Thread GitBox
yongwww commented on pull request #6108: URL: https://github.com/apache/incubator-tvm/pull/6108#issuecomment-687288778 @lsy643 you are right, the auxiliary ops `get_valid_counts` and `strided_slice` are used to help handle TensorFlow's dynamic NonMaximumSuppression. As a to-do task, the CPU an
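The `get_valid_counts` + `strided_slice` pattern mentioned above can be sketched in plain Python. This is a hedged, single-batch reference sketch, not TVM's implementation. It assumes a box layout of `[class_id, score, x1, y1, x2, y2]` and assumes a box counts as valid when its score exceeds `score_threshold` and its class id is non-negative. `get_valid_counts_ref` is a hypothetical name. Valid boxes are compacted to the front and the rest padded with -1, so the output keeps a static shape plus a count. Slicing the first `valid_count` rows then plays the role of `strided_slice` and recovers the dynamic-length result TensorFlow's NMS produces.

```python
from typing import List, Tuple

Box = List[float]  # assumed layout: [class_id, score, x1, y1, x2, y2]


def get_valid_counts_ref(
    boxes: List[Box],
    score_threshold: float,
    id_index: int = 0,
    score_index: int = 1,
) -> Tuple[int, List[Box], List[int]]:
    """Hypothetical single-batch sketch: move valid boxes to the front.

    A box is treated as valid if its score exceeds score_threshold and,
    when id_index >= 0, its class id is non-negative. Remaining slots are
    padded with -1 so the output has the same length as the input.
    """
    elem_len = len(boxes[0]) if boxes else 0
    out_boxes: List[Box] = []
    out_indices: List[int] = []
    for j, box in enumerate(boxes):
        if box[score_index] > score_threshold and (id_index < 0 or box[id_index] >= 0):
            out_boxes.append(list(box))
            out_indices.append(j)  # remember the original anchor index
    valid_count = len(out_boxes)
    pad = len(boxes) - valid_count
    out_boxes.extend([[-1.0] * elem_len for _ in range(pad)])
    out_indices.extend([-1] * pad)
    return valid_count, out_boxes, out_indices


boxes = [
    [0, 0.9, 0, 0, 1, 1],   # valid
    [1, 0.1, 0, 0, 1, 1],   # score too low
    [-1, 0.8, 0, 0, 1, 1],  # negative class id
    [2, 0.7, 0, 0, 1, 1],   # valid
]
count, out, idx = get_valid_counts_ref(boxes, score_threshold=0.5)
# Stand-in for the follow-up strided_slice: keep only the first `count` rows.
valid = out[:count]
print(count, idx)  # 2 [0, 3, -1, -1]
```

Keeping the padded static-shape tensor plus a separate count is what makes the trailing slice necessary: the slice is the only step whose output length depends on the data.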

[GitHub] [incubator-tvm] yongwww commented on pull request #6108: Fix CUDA Compute Function For `get_valid_counts` and `nms`

2020-08-23 Thread GitBox
yongwww commented on pull request #6108: URL: https://github.com/apache/incubator-tvm/pull/6108#issuecomment-678908663 @lsy643 the `rearrange_indices_out` part you updated looks good to me. Currently I am concerned about the thread-related change, since the change might cause some performa
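The `rearrange_indices_out` step and the thread concern are linked. Compacting kept (non-negative) indices to the front of a row is simple when one thread walks the row. Splitting the work across threads requires each element to know its output slot in advance. The minimal sketch below assumes that compaction is the step's role and uses pure Python as a stand-in for a CUDA kernel; both function names are hypothetical. It contrasts a sequential compaction with a prefix-sum (scan) formulation in which every element's write is independent. It is not necessarily what the PR's CUDA code does.

```python
from itertools import accumulate
from typing import List, Tuple


def compact_sequential(indices: List[int]) -> Tuple[List[int], int]:
    """One thread per row: walk the row once, writing valid entries forward."""
    out = [-1] * len(indices)
    valid = 0
    for idx in indices:
        if idx >= 0:
            out[valid] = idx
            valid += 1
    return out, valid


def compact_scan(indices: List[int]) -> Tuple[List[int], int]:
    """Scan-based: an exclusive prefix sum over validity flags gives each
    valid element its output slot, so each write could be its own thread."""
    flags = [1 if idx >= 0 else 0 for idx in indices]
    # Exclusive scan: slot of element j = number of valid elements before j.
    positions = [0] + list(accumulate(flags))[:-1]
    out = [-1] * len(indices)
    for j, idx in enumerate(indices):  # iterations are independent of each other
        if flags[j]:
            out[positions[j]] = idx
    return out, sum(flags)


print(compact_sequential([3, -1, 0, -1, 5]))  # ([3, 0, 5, -1, -1], 3)
print(compact_scan([3, -1, 0, -1, 5]))        # ([3, 0, 5, -1, -1], 3)
```

The scan version does more total work but exposes parallelism. Whether that pays off on a GPU depends on row length and batch size, which is why benchmarking a threading change is worthwhile.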

[GitHub] [incubator-tvm] yongwww commented on pull request #6108: Fix CUDA Compute Function For `get_valid_counts` and `nms`

2020-07-25 Thread GitBox
yongwww commented on pull request #6108: URL: https://github.com/apache/incubator-tvm/pull/6108#issuecomment-663828876 @lsy643 Regarding the thread change, could you please benchmark the performance before and after your change and share the numbers?
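A before/after comparison like the one requested can be run with a small timing harness. The sketch below is a generic pure-Python helper (`benchmark`, `before`, and `after` are hypothetical names). It performs warm-up runs and then reports median and minimum wall-clock latency. For GPU kernels the timer must stop only after the device has finished. In TVM, the module's `time_evaluator` is usually the better tool because it handles that synchronization.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 3, repeat: int = 20) -> Dict[str, float]:
    """Time `fn` after warm-up runs; return median and minimum latency in ms."""
    for _ in range(warmup):  # absorb one-time costs (caches, lazy init)
        fn()
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return {"median_ms": statistics.median(samples), "min_ms": min(samples)}


# Hypothetical stand-ins for the "before" and "after" versions of an op.
def before() -> list:
    return sorted(range(10_000), reverse=True)


def after() -> list:
    return list(range(9_999, -1, -1))


for name, fn in (("before", before), ("after", after)):
    stats = benchmark(fn)
    print(f"{name}: median {stats['median_ms']:.3f} ms, min {stats['min_ms']:.3f} ms")
```

Reporting the median alongside the minimum helps separate a real regression from run-to-run noise.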