comaniac opened a new pull request #6686: URL: https://github.com/apache/incubator-tvm/pull/6686
This PR improves the cache read sketch generation rule in auto-scheduler to support multiple cache reads. Previously, the rule simply gave up if a tensor had multiple consumers, so all consumers read that tensor directly from the host memory, which hurts performance a lot. This PR resolves this limitation.

In addition, this PR also fixes the following bugs/issues:

- Consider `max_threads_per_block` in the cross thread reduction rule condition, so that we can guarantee computation intensive ops won't have a cross thread reduction sketch, which usually doesn't help.
- Add a `.d` suffix to the names of redundant cache read ops to avoid conflicts. Without this change, the Python schedule APIs output by auto-scheduler won't work for multiple cache reads.
- Add a new constructor of `ComputeDAG` that takes a schedule, so that we can enforce stage order consistency.

cc @merrymercy @jcf94 @tqchen
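To illustrate the naming conflict that the `.d` suffix avoids, here is a toy sketch in plain Python. It does not use TVM and is not the code in this PR: the helper `cache_read_names` is hypothetical, the `<tensor>.<scope>` base name is an assumption, and chaining one extra `.d` per redundant cache read is only one possible way to keep the names distinct.

```python
def cache_read_names(tensor_name, scope, num_consumers):
    """Return one cache-read stage name per consumer of a tensor.

    Hypothetical helper for illustration only. The first cache read keeps
    the base name; each redundant one gets an extra ".d" suffix (assumed
    scheme) so that no two stages share a name.
    """
    base = f"{tensor_name}.{scope}"
    return [base + ".d" * i for i in range(num_consumers)]


# Without a suffix, three cache reads of "A" into "shared" would all be
# named "A.shared", and a generated schedule could not tell them apart.
names = cache_read_names("A", "shared", 3)
print(names)  # ['A.shared', 'A.shared.d', 'A.shared.d.d']
```

Since every stage name is unique, a schedule printed as Python code can refer to each cache read stage unambiguously.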