Hi, I've been looking a lot at shuffles in Turboshaft and thinking about how we can generate de-interleaving loads. For AArch64 we have the LD1, LD2, LD3 and LD4 instructions, which can load up to 4x128-bit values.
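
To make "de-interleaving" concrete, here is a minimal scalar sketch of what an LD2 with 32-bit lanes does: it reads eight consecutive elements and sends the even-indexed ones to the first register and the odd-indexed ones to the second. This is only an illustrative model, not V8 code; `Ld2Result` and `Ld2Model` are hypothetical names made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of AArch64 `LD2 {Vt.4S, Vt2.4S}, [Xn]`.
// Hypothetical helper types for illustration only; nothing here is V8 API.
struct Ld2Result {
  std::array<uint32_t, 4> even;  // models the first destination register
  std::array<uint32_t, 4> odd;   // models the second destination register
};

// Reads 8 consecutive 32-bit elements (2x128 bits) and de-interleaves them.
inline Ld2Result Ld2Model(const uint32_t* mem) {
  assert(mem != nullptr);
  Ld2Result result{};
  for (std::size_t i = 0; i < 4; ++i) {
    result.even[i] = mem[2 * i];      // elements 0, 2, 4, 6
    result.odd[i] = mem[2 * i + 1];   // elements 1, 3, 5, 7
  }
  return result;
}
```

The two 128-bit halves of the result are what I would expect to pull out of a single 256-bit load value with lane extraction.
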
To simplify my life, I'm currently only considering LD2, which loads 2x128 bits, so I figure the existing Simd256 operations would be a good fit for this. But these are all currently predicated on V8_ENABLE_WASM_SIMD256_REVEC. Do they have to be? I only want to support wide vectors for loads, using Simd256Extract128Lane to produce usable values for arithmetic, etc., but are there any parts of the pipeline that make assumptions when 'revec' is enabled that would make life hard? And would this approach still scale when trying to support LD3 (3x128-bit) and LD4 (4x128-bit)?

Thanks,
Sam
