[PATCH] D41521: [CUDA] fixes for __shfl_* intrinsics.

2017-12-21 Thread Artem Belevich via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rC321326: [CUDA] More fixes for __shfl_* intrinsics. (authored by tra, committed by ). Changed prior to commit: https://reviews.llvm.org/D41521?vs=127950&id=127962#toc Repository: rC Clang

[PATCH] D41521: [CUDA] fixes for __shfl_* intrinsics.

2017-12-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Added to my todo list. There are a few more gaps that I want to test in order to make sure we don't regress on compatibility with older CUDA versions while changing these wrappers. https://reviews.llvm.org/D41521

[PATCH] D41521: [CUDA] fixes for __shfl_* intrinsics.

2017-12-21 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision. jlebar added a comment. This revision is now accepted and ready to land. Since this is tricky and we've seen it affecting user code, do you think it's a bad idea to add tests to the test-suite? https://reviews.llvm.org/D41521

[PATCH] D41521: [CUDA] fixes for __shfl_* intrinsics.

2017-12-21 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: jlebar. Herald added a subscriber: sanjoy.

- __shfl_{up,down}* uses `unsigned int` for the third parameter.
- added [unsigned] long overloads for non-sync shuffles. Augments r319908 which added long overload for sync shuffles.
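
To illustrate the two points in the summary, here is a minimal host-side C++ sketch that models a down-shuffle across a 32-lane warp. It is not the code from the patch or from Clang's CUDA wrapper headers. The names `model_shfl_down` and `model_shfl_down64` are hypothetical, the `width` parameter is modelled as `unsigned` for simplicity, and building the 64-bit variant from two 32-bit halves is an assumption about one way such an overload can be implemented. The sketch shows the lane offset taken as `unsigned int`, and a wider value shuffled as two 32-bit pieces.

```cpp
#include <array>
#include <cstdint>

// Hypothetical host-side model; a real warp shuffle runs on the GPU.
constexpr unsigned kWarpSize = 32;

// Models a down-shuffle for one lane: the lane reads the value held by
// lane + delta, unless that source falls outside its width-sized segment,
// in which case it keeps its own value. The offset is unsigned, matching
// the third-parameter type described in the summary.
template <typename T>
T model_shfl_down(const std::array<T, kWarpSize>& vals, unsigned lane,
                  unsigned delta, unsigned width = kWarpSize) {
    unsigned segment_end = (lane / width + 1) * width;
    // Compare against the remaining distance to avoid unsigned wraparound.
    return delta < segment_end - lane ? vals[lane + delta] : vals[lane];
}

// Assumed approach for a 64-bit overload: split each value into two
// 32-bit halves, shuffle each half, then recombine.
inline std::uint64_t model_shfl_down64(
    const std::array<std::uint64_t, kWarpSize>& vals, unsigned lane,
    unsigned delta, unsigned width = kWarpSize) {
    std::array<std::uint32_t, kWarpSize> lo{}, hi{};
    for (unsigned i = 0; i < kWarpSize; ++i) {
        lo[i] = static_cast<std::uint32_t>(vals[i]);
        hi[i] = static_cast<std::uint32_t>(vals[i] >> 32);
    }
    std::uint64_t l = model_shfl_down(lo, lane, delta, width);
    std::uint64_t h = model_shfl_down(hi, lane, delta, width);
    return (h << 32) | l;
}
```

In this model, a lane whose source would cross the end of its segment simply keeps its own value, which is the behaviour a test for the non-sync overloads would likely want to pin down.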