vinx13 commented on PR #16474:
URL: https://github.com/apache/tvm/pull/16474#issuecomment-1915215166
We can move the GPU builder to a larger CPU instance if needed.
masahi commented on PR #16474:
URL: https://github.com/apache/tvm/pull/16474#issuecomment-1912756112
Oof. It's not surprising, since I added a new kernel variant (flash decoding) with yet another large set of explicit template instantiations:
https://github.com/tlc-pack/libflash_attn/pull/9
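(Purely illustrative, not the libflash_attn sources: the kernel name, dtypes, and head dimensions below are invented. It is only a minimal sketch of why a new kernel variant that carries its own list of explicit template instantiations multiplies compile work.)

```cuda
// Illustrative sketch only -- not libflash_attn code. Each explicit
// instantiation below is compiled as a separate kernel, so a new variant
// that repeats a list like this roughly doubles the build work.
#include <cuda_fp16.h>

template <typename T, int kHeadDim>
__global__ void flash_decoding_kernel(const T* q, const T* k_cache,
                                      const T* v_cache, T* out) {
  // kernel body omitted in this sketch
}

// One explicit instantiation per (dtype, head_dim) combination.
template __global__ void flash_decoding_kernel<__half, 64>(
    const __half*, const __half*, const __half*, __half*);
template __global__ void flash_decoding_kernel<__half, 128>(
    const __half*, const __half*, const __half*, __half*);
template __global__ void flash_decoding_kernel<__half, 256>(
    const __half*, const __half*, const __half*, __half*);
```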
vinx13 commented on PR #16474:
URL: https://github.com/apache/tvm/pull/16474#issuecomment-1912754133
@masahi This is probably the case; it didn't happen for this kernel before, though.
masahi commented on PR #16474:
URL: https://github.com/apache/tvm/pull/16474#issuecomment-1912749542
@vinx13 Does this CI failure look like a compilation timeout?
https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-gpu/detail/PR-16474/30/pipeline?
I remember you hit something like this.
masahi commented on PR #16474:
URL: https://github.com/apache/tvm/pull/16474#issuecomment-1912677068
> LGTM! When passing paged kv cache, is there any assumption there? e.g., layout

Yes, I added the shape and dtype requirements as comments.
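(The exact requirements live as comments in the PR; the sketch below is only a hypothetical illustration of the kind of layout and dtype assumptions a paged KV cache interface documents. The struct and field names are invented, not TVM or flash-attention API.)

```cuda
// Hypothetical descriptor -- invented names, shown only to illustrate the
// typical shape/dtype assumptions behind a paged KV cache.
#include <cassert>
#include <cstdint>

struct PagedKVCacheDesc {
  // K/V cache blocks laid out as
  // [num_blocks, page_block_size, num_kv_heads, head_dim], fp16 in this sketch.
  int64_t num_blocks;
  int64_t page_block_size;
  int64_t num_kv_heads;
  int64_t head_dim;
  // block_table: int32 tensor of shape [batch_size, max_blocks_per_seq]
  // mapping a sequence's logical blocks to physical blocks in the cache.
  int64_t batch_size;
  int64_t max_blocks_per_seq;
};

// Example consistency check: a sequence of seq_len tokens must fit in the
// blocks reserved for it in the block table.
inline void check_seq_fits(const PagedKVCacheDesc& d, int64_t seq_len) {
  int64_t blocks_needed = (seq_len + d.page_block_size - 1) / d.page_block_size;
  assert(blocks_needed <= d.max_blocks_per_seq);
}
```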
masahi opened a new pull request, #16474:
URL: https://github.com/apache/tvm/pull/16474
Flash Attention recently added support for loading from a paged KV cache in
https://github.com/Dao-AILab/flash-attention/commit/54e80a3829c6d2337570d01e78ebd9529c02d342.
The support was added to
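(A minimal sketch, not the flash-attention implementation, of the block-table indirection that "paged KV cache" implies: a token's logical position in a sequence is translated through the block table to a physical offset inside fixed-size cache blocks. The layout and function name here are assumptions for illustration only.)

```cuda
// Sketch only (not flash-attention code): translate a logical token position
// into a flat element offset inside a paged K cache laid out as
// [num_blocks, page_block_size, num_kv_heads, head_dim].
#include <cstdint>

inline int64_t paged_kv_offset(const int32_t* block_table,  // [batch, max_blocks_per_seq]
                               int64_t max_blocks_per_seq,
                               int64_t page_block_size,
                               int64_t num_kv_heads,
                               int64_t head_dim,
                               int64_t seq, int64_t pos,
                               int64_t head, int64_t dim) {
  int64_t logical_block  = pos / page_block_size;  // which block of the sequence
  int64_t in_block       = pos % page_block_size;  // offset within that block
  int64_t physical_block = block_table[seq * max_blocks_per_seq + logical_block];
  return ((physical_block * page_block_size + in_block) * num_kv_heads + head)
             * head_dim + dim;
}
```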