yongwww commented on pull request #6108:
URL: https://github.com/apache/incubator-tvm/pull/6108#issuecomment-692398165
@lsy643 thanks for sharing the results. What I am wondering about is the latency
of your change vs. the previous NMS GPU version (even though the output is not identical),
and probably the
yongwww commented on pull request #6108:
URL: https://github.com/apache/incubator-tvm/pull/6108#issuecomment-687288778
@lsy643 you are right, the auxiliary ops `get_valid_count` and `strided_slice`
are used to help handle TensorFlow's dynamic NonMaximumSuppression. As a TODO
task, the cpu an
yongwww commented on pull request #6108:
URL: https://github.com/apache/incubator-tvm/pull/6108#issuecomment-678908663
@lsy643 the `rearrange_indices_out` part you updated looks good to me.
Currently I am concerned about the thread-related change, since the change
might cause some performa
yongwww commented on pull request #6108:
URL: https://github.com/apache/incubator-tvm/pull/6108#issuecomment-663828876
@lsy643 Regarding the thread change, could you please benchmark the
performance before and after your change and share the numbers?