Re: [PATCH 27/37] target/i386: Use tcg gvec ops for pmovmskb

2022-09-15 Thread Richard Henderson
On 9/14/22 23:59, Paolo Bonzini wrote:
> On Tue, Sep 13, 2022 at 10:17 AM Richard Henderson wrote:
> > On 9/12/22 00:04, Paolo Bonzini wrote:
> > > +    while (vec_len > 8) {
> > > +        vec_len -= 8;
> > > +        tcg_gen_shli_tl(s->T0, s->T0, 8);
> > > +        tcg_gen_ld8u_tl(t, cpu_env, offsetof(CPUX86State,

Re: [PATCH 27/37] target/i386: Use tcg gvec ops for pmovmskb

2022-09-14 Thread Paolo Bonzini
On Tue, Sep 13, 2022 at 10:17 AM Richard Henderson wrote:
>
> On 9/12/22 00:04, Paolo Bonzini wrote:
> > +    while (vec_len > 8) {
> > +        vec_len -= 8;
> > +        tcg_gen_shli_tl(s->T0, s->T0, 8);
> > +        tcg_gen_ld8u_tl(t, cpu_env, offsetof(CPUX86State,
> > xmm_t0.ZMM_B(vec_len -

Re: [PATCH 27/37] target/i386: Use tcg gvec ops for pmovmskb

2022-09-13 Thread Richard Henderson
On 9/12/22 00:04, Paolo Bonzini wrote:
> +    while (vec_len > 8) {
> +        vec_len -= 8;
> +        tcg_gen_shli_tl(s->T0, s->T0, 8);
> +        tcg_gen_ld8u_tl(t, cpu_env, offsetof(CPUX86State, xmm_t0.ZMM_B(vec_len - 1)));
> +        tcg_gen_or_tl(s->T0, s->T0, t);
> +    }

The shl + or is deposit,
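
For readers of the archive, the equivalence behind the "shl + or is deposit" remark can be checked with a small plain-C model. This is only a sketch, not the code from the patch or the follow-up: deposit_bits, combine_shl_or and combine_deposit are hypothetical names, the initial load of the top byte before the loop is an assumption (the quoted hunk does not show it), and the operand order used for the deposit is one possible reading of the truncated review comment.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a deposit operation: return `base` with
 * bits [ofs, ofs + len) replaced by the low `len` bits of `field`.
 */
static uint64_t deposit_bits(uint64_t base, unsigned ofs, unsigned len,
                             uint64_t field)
{
    assert(len > 0 && len <= 64 && ofs <= 64 - len);
    uint64_t mask = (len == 64 ? ~0ULL : ((1ULL << len) - 1)) << ofs;
    return (base & ~mask) | ((field << ofs) & mask);
}

/* Model of the quoted loop: shift the accumulator, then OR in a byte. */
static uint64_t combine_shl_or(const uint8_t *bytes, unsigned vec_len)
{
    /* Assumed: the accumulator starts from the top byte of the vector. */
    uint64_t acc = bytes[vec_len - 1];

    while (vec_len > 8) {
        vec_len -= 8;
        acc <<= 8;                    /* models tcg_gen_shli_tl(T0, T0, 8) */
        acc |= bytes[vec_len - 1];    /* models the ld8u + tcg_gen_or_tl   */
    }
    return acc;
}

/* Same result with one deposit per step instead of shift + or. */
static uint64_t combine_deposit(const uint8_t *bytes, unsigned vec_len)
{
    uint64_t acc = bytes[vec_len - 1];

    while (vec_len > 8) {
        vec_len -= 8;
        /* Keep the new byte in bits [0, 8), place old acc in bits [8, 64). */
        acc = deposit_bits(bytes[vec_len - 1], 8, 56, acc);
    }
    return acc;
}
```

Because the freshly loaded byte is zero-extended, its upper 56 bits are clear, so depositing the old accumulator at offset 8 yields the same value as shifting left by 8 and ORing the byte in.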

[PATCH 27/37] target/i386: Use tcg gvec ops for pmovmskb

2022-09-11 Thread Paolo Bonzini
From: Richard Henderson

As pmovmskb is used by strlen et al, this is the third highest overhead SSE operation at 0.8%.

Signed-off-by: Richard Henderson
[Reorganize to generate code for any vector size. - Paolo]
Signed-off-by: Paolo Bonzini
---
 target/i386/tcg/emit.c.inc | 65