Hi,

I've been looking a lot at shuffles in Turboshaft, and thinking about how we 
can generate de-interleaving loads. For AArch64 we have the LD1, LD2, LD3 and 
LD4 instructions, which can load up to 4x128-bit values.
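
For concreteness, here is a scalar C++ sketch of what a de-interleaving LD2 does with 32-bit elements. It only models the semantics; the names (Lane128, Ld2Result, ModelLd2) are made up for illustration and are not V8 or Turboshaft identifiers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration only; none of this is V8 code.
// One 128-bit register viewed as 4 x 32-bit elements.
using Lane128 = std::array<uint32_t, 4>;

struct Ld2Result {
  Lane128 even;  // memory elements 0, 2, 4, 6
  Lane128 odd;   // memory elements 1, 3, 5, 7
};

// Scalar model of `LD2 {v0.4S, v1.4S}, [x0]`: reads 8 consecutive 32-bit
// elements and splits them into two registers by element parity.
inline Ld2Result ModelLd2(const uint32_t* mem) {
  assert(mem != nullptr);
  Ld2Result result{};
  for (std::size_t i = 0; i < 4; ++i) {
    result.even[i] = mem[2 * i];
    result.odd[i] = mem[2 * i + 1];
  }
  return result;
}
```

So a single LD2 replaces two 128-bit loads followed by a pair of shuffles that separate the even and odd elements.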

To simplify my life, I'm currently only considering LD2, which loads 
2x128 bits, so I figure the existing Simd256 operations would be a good fit 
for this. But these are all currently guarded by 
V8_ENABLE_WASM_SIMD256_REVEC. Do they have to be?

I only want to support wide vectors for loads, using Simd256Extract128Lane 
to produce usable values for arithmetic, etc. But are there any parts of 
the pipeline that make assumptions when 'revec' is enabled that would make 
life hard?

And when trying to support LD3 (3x128-bit) and LD4 (4x128-bit), would this 
approach still scale?
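
To illustrate the scaling question, here is a scalar C++ sketch that generalises the "wide load plus 128-bit lane extract" shape to N = 2, 3 or 4. Everything in it (WideValue, ModelDeinterleavingLoad, ExtractLane128) is a hypothetical name for illustration; how such a value would map onto real Turboshaft representations is exactly the open question. Note that the LD3 case is 3x128 = 384 bits wide.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration only; none of this is V8 code.
using Lane128 = std::array<uint32_t, 4>;

// A wide value made of N 128-bit lanes, standing in for the result of LDN.
template <std::size_t N>
struct WideValue {
  static_assert(N >= 2 && N <= 4, "models LD2, LD3 and LD4 only");
  std::array<Lane128, N> lanes;
};

// Scalar model of an N-way de-interleaving load of 32-bit elements:
// register r receives memory elements r, r + N, r + 2N, r + 3N.
template <std::size_t N>
WideValue<N> ModelDeinterleavingLoad(const uint32_t* mem) {
  assert(mem != nullptr);
  WideValue<N> wide{};
  for (std::size_t i = 0; i < 4; ++i) {
    for (std::size_t r = 0; r < N; ++r) {
      wide.lanes[r][i] = mem[N * i + r];
    }
  }
  return wide;
}

// Stand-in for an "extract 128-bit lane" operation on the wide value.
template <std::size_t N>
Lane128 ExtractLane128(const WideValue<N>& wide, std::size_t lane) {
  assert(lane < N);
  return wide.lanes[lane];
}
```

In this model the consumer side looks the same for every N: only the wide value's width and the number of extracts change.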

Thanks,
Sam
