And one more thing that will be nicer in a Reducer than in the instruction selector: you don't have to worry about CanCover :o :o :o
Btw, as far as I can tell, there are no corresponding Intel operations for vaddvq (which I guess is what you want to generate), but I think it's still better in a reducer than directly in ISel. Maybe add an #ifdef V8_TARGET_ARCH_ARM64 around the arm64-specific opcodes that you define.

Cheers,
Darius

On Wednesday, June 5, 2024 at 4:56:56 PM UTC+2 Matthias Liedtke wrote:
> Hi,
>
> I quickly synced with Darius:
> 1) In general it makes sense to do the matching on the graph itself (i.e.
> in a reducer) assuming this is a generic pattern for which there might also
> be specialized / optimized instructions on other architectures.
> 2) Intel is working on a re-vectorization pass to replace 128-bit SIMD
> operations with 256-bit SIMD operations. So, if these optimized "add +
> shuffle" operations exist on Intel as well, there would be a clear benefit
> in doing it in a reducer that could then potentially run prior to the
> revectorization (which would require additional modifications to the
> revectorizer).
>
> In general it's advisable to have as few architecture-specific code
> paths in the reducers as possible, so the operations shouldn't be
> overfitting to some arm64-only instructions.
> Still, having some SIMD operations with clear semantics in the graph that
> only exist on some architectures is fine.
>
> I don't think the overhead of pattern matching on the graph is likely to
> be more effort or slower than pattern matching during instruction selection.
> Given the complexity of arm64 and x64 ISel code, I'm happy about anything
> that isn't added on top of that. :)
>
> Cheers,
> Matthias
>
> On Wed, Jun 5, 2024 at 3:59 PM Sam Parker-Haynes <sam.p...@arm.com> wrote:
>
>> Hi,
>>
>> I'd like to add some pattern matching, for Turboshaft, to recognise add +
>> shuffle patterns which correspond to a horizontal pairwise reduction.
>> I've started doing this with wasm::SimdShuffle helpers and then during
>> arm64 instruction selection, but it feels like the pattern matching should
>> be done in a generic place too... So, I was thinking about adding four
>> more kinds (I32x4, I64x2, F32x4 and F64x2 PairwiseReduction) to
>> Simd128UnaryOp and then performing the combining in
>> machine-optimization-reducer.
>>
>> Does this sound reasonable enough? Or is plumbing this into the TS IR
>> likely to be significantly more complicated than backend pattern matching?
>>
>> Thanks,
>> Sam
>>
>> --
>> --
>> v8-dev mailing list
>> v8-...@googlegroups.com
>> http://groups.google.com/group/v8-dev
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "v8-dev" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to v8-dev+un...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/v8-dev/2a9c3fcd-ee78-4877-9587-2ccb3b0a59e6n%40googlegroups.com
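For reference, the "add + shuffle patterns which correspond to a horizontal pairwise reduction" can be modelled in a few lines of scalar C++. This is a minimal sketch under stated assumptions: it uses 32-bit lanes instead of the byte-granular indices of wasm's i8x16.shuffle, and it recognises only one shuffle order (fold the upper half of the live lanes onto the lower half, twice). Shuffle, Add, ReduceViaShuffles, IsHalvingStage and IsI32x4AddReduction are hypothetical names for illustration, not V8 or Turboshaft APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Lanes are uint32_t so that additions wrap, like wasm's i32x4.add.
using I32x4 = std::array<uint32_t, 4>;
// Shuffle mask over the 8 lanes of concat(a, b). Simplification (assumption):
// 32-bit lane indices rather than the byte indices i8x16.shuffle uses.
using Mask = std::array<uint8_t, 4>;

// Scalar model of a two-input shuffle: index < 4 reads a, otherwise b.
I32x4 Shuffle(const I32x4& a, const I32x4& b, const Mask& m) {
  I32x4 r{};
  for (int i = 0; i < 4; ++i) {
    assert(m[i] < 8);
    r[i] = m[i] < 4 ? a[m[i]] : b[m[i] - 4];
  }
  return r;
}

// Lane-wise wrapping add.
I32x4 Add(const I32x4& a, const I32x4& b) {
  I32x4 r{};
  for (int i = 0; i < 4; ++i) r[i] = a[i] + b[i];
  return r;
}

// One common add + shuffle shape: after two halving stages, lane 0 holds the
// sum of all four lanes, the value an arm64 vaddvq_u32 (ADDV) produces.
uint32_t ReduceViaShuffles(const I32x4& x) {
  I32x4 s1 = Add(x, Shuffle(x, x, {2, 3, 2, 3}));     // {x0+x2, x1+x3, ...}
  I32x4 s2 = Add(s1, Shuffle(s1, s1, {1, 1, 1, 1}));  // lane 0 = full sum
  return s2[0];
}

// Hypothetical matcher: true if mask m, applied as shuffle(v, v), moves lanes
// [live/2, live) of v onto lanes [0, live/2). Higher lanes are don't-cares.
bool IsHalvingStage(const Mask& m, int live) {
  for (int i = 0; i < live / 2; ++i) {
    // Both inputs are the same value, so indices j and j + 4 are equivalent.
    if (m[i] % 4 != i + live / 2) return false;
  }
  return true;
}

// Hypothetical: true if two stage masks form the full i32x4 add reduction
// modelled by ReduceViaShuffles (halving order only).
bool IsI32x4AddReduction(const Mask& first, const Mask& second) {
  return IsHalvingStage(first, 4) && IsHalvingStage(second, 2);
}
```

With x = {1, 2, 3, 4}, ReduceViaShuffles returns 10. The matcher deliberately accepts only the halving order; an adjacent-pairs order such as {1, 0, 3, 2} followed by {2, 3, 0, 1} also reduces all four lanes and would need its own case in a real matcher.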