The meat of this series is in the second patch, which makes the AArch64 backend
look for shuffle masks that can be turned into EXT instructions, and updates the
vext[q]_* Neon intrinsics to use __builtin_shuffle rather than the current
inline assembler; this then produces the same instructions (unless the midend
can do better).
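
To illustrate the kind of mask being matched, here is a minimal sketch in
plain C. It is not the code from the patch: ext_mask_offset, shuffle_u8 and
ext_u8 are hypothetical names made up for this example, and the sketch assumes
the usual two-input shuffle convention in which indices 0..N-1 select from the
first vector and N..2N-1 from the second. Under that assumption, a mask of N
consecutive indices starting at n selects the same elements as EXT with
element offset n.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not GCC's implementation): if MASK picks NELTS
   consecutive elements out of the concatenation a:b, return the index of
   the first one, which is the EXT element offset.  Otherwise return -1.  */
int ext_mask_offset(const unsigned *mask, size_t nelts)
{
    assert(mask != NULL && nelts > 0);
    unsigned first = mask[0];
    if (first >= nelts)          /* must start inside the first vector */
        return -1;
    for (size_t i = 1; i < nelts; i++)
        if (mask[i] != first + i)
            return -1;
    return (int)first;
}

/* Scalar model of a two-input shuffle: index < nelts selects from a,
   otherwise from b.  */
void shuffle_u8(const uint8_t *a, const uint8_t *b, const unsigned *mask,
                size_t nelts, uint8_t *out)
{
    for (size_t i = 0; i < nelts; i++) {
        unsigned idx = mask[i];
        assert(idx < 2 * nelts);
        out[i] = idx < nelts ? a[idx] : b[idx - nelts];
    }
}

/* Scalar model of EXT: the top (nelts - offset) elements of a followed by
   the bottom offset elements of b.  */
void ext_u8(const uint8_t *a, const uint8_t *b, size_t offset,
            size_t nelts, uint8_t *out)
{
    assert(offset < nelts);
    for (size_t i = 0; i < nelts; i++) {
        size_t pos = offset + i;
        out[i] = pos < nelts ? a[pos] : b[pos - nelts];
    }
}
```

For example, with 8 elements the mask {3, 4, 5, 6, 7, 8, 9, 10} is recognised
with offset 3, and shuffle_u8 with that mask gives the same result as ext_u8
with offset 3. An interleaving mask such as {0, 8, 1, 9, 2, 10, 3, 11} is
rejected.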
Before that, the first patch adds execution and assembler tests of the existing
intrinsics, which then serve as testcases for the second patch.
The third patch reuses the test bodies from the first patch in equivalent tests
on the ARM architecture.
Ok for trunk?
--Alan