ni...@lysator.liu.se (Niels Möller) writes:
Found the vmlal instruction now. Makes for a cute loop,
.Loop:
	vld1.32		l01[1], [vp]!
	vld1.32		{u00[]}, [up]!
	vaddl.u32	q1, l01, c01
	vmlal.u32	q1, u00, v01	C q1 overlaps with c01
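
For readers less familiar with NEON, here is a rough scalar C sketch of what the vaddl.u32 / vmlal.u32 pair computes in one 32-bit lane. The names lane_addmul and check_no_overflow are made up for illustration and are not from the posted code. The sketch models a single lane only and says nothing about how the loop carries the high half to the next iteration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane (not the posted code):
   vaddl.u32 widens and adds two 32-bit values into a 64-bit lane,
   vmlal.u32 then adds a 32x32->64 bit product to that lane. */
static uint64_t lane_addmul(uint32_t l, uint32_t c, uint32_t u, uint32_t v)
{
    uint64_t acc = (uint64_t)l + c;   /* vaddl.u32: widening add */
    acc += (uint64_t)u * v;           /* vmlal.u32: widening multiply-accumulate */
    return acc;
}

/* With all inputs at their maximum the result is exactly 2^64 - 1,
   so the 64-bit lane never wraps around. */
static void check_no_overflow(void)
{
    uint32_t m = UINT32_MAX;
    assert(lane_addmul(m, m, m, m) == UINT64_MAX);
}
```

The point of the check is that (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so adding both a limb and a carry word to the full product still fits in the 64-bit accumulator.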
Torbjorn Granlund <t...@gmplib.org> writes:
I suspect that some scheduling just might improve performance by a large
factor...
One could hope so. I'm still a bit skeptical of mul and accumulate, at
least for addmul_2, since it seems like a good idea to schedule the
multiplications far in advance.
ni...@lysator.liu.se (Niels Möller) writes:
One could hope so. I'm still a bit skeptical of mul and accumulate, at
least for addmul_2, since it seems like a good idea to schedule the
multiplications far in advance.
Multiply-accumulate can have the drawback to put multiplication in the