Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3]

2024-01-08 Thread Jatin Bhateja
On Mon, 8 Jan 2024 10:20:33 GMT, Quan Anh Mai wrote:

>>> Thanks for the updates!
>>>
>>> One more idea: Your AVX2 solution has a lot of cost for converting the mask
>>> to a permutation. Might it make sense to split this off into a separate
>>> vector-node, so that it can float out of a loop

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3]

2024-01-08 Thread Quan Anh Mai
On Mon, 8 Jan 2024 06:06:22 GMT, Jatin Bhateja wrote:

>> Thanks for the updates!
>>
>> One more idea: Your AVX2 solution has a lot of cost for converting the mask
>> to a permutation. Might it make sense to split this off into a separate
>> vector-node, so that it can float out of a loop if

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3]

2024-01-07 Thread Jatin Bhateja
On Fri, 5 Jan 2024 10:02:28 GMT, Emanuel Peter wrote:

> Thanks for the updates!
>
> One more idea: Your AVX2 solution has a lot of cost for converting the mask
> to a permutation. Might it make sense to split this off into a separate
> vector-node, so that it can float out of a loop if the
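
The suggestion quoted above (convert the mask to a permutation in a separate step, so that the conversion can be hoisted when the mask is loop-invariant) can be modeled in plain scalar Java. This is only an illustrative sketch, not the patch's C2 implementation: the class and method names (`CompressPermSketch`, `maskToPermutation`, `permute`) are hypothetical, and using `-1` to mark a zeroed lane is an assumption made for this example.

```java
import java.util.Arrays;

public class CompressPermSketch {
    // Hypothetical helper: turn a lane mask into a compress permutation.
    // perm[k] is the source lane that lands in result lane k;
    // -1 (an assumption of this sketch) marks a lane that must be zeroed.
    static int[] maskToPermutation(boolean[] mask) {
        int[] perm = new int[mask.length];
        Arrays.fill(perm, -1);
        int k = 0;
        for (int lane = 0; lane < mask.length; lane++) {
            if (mask[lane]) {
                perm[k++] = lane;
            }
        }
        return perm;
    }

    // Apply a precomputed permutation to one block of lanes.
    static int[] permute(int[] src, int[] perm) {
        int[] dst = new int[src.length];
        for (int k = 0; k < perm.length; k++) {
            dst[k] = perm[k] >= 0 ? src[perm[k]] : 0;
        }
        return dst;
    }

    public static void main(String[] args) {
        boolean[] mask = {true, false, true, true};
        // The mask does not change inside the loop, so the (costly)
        // mask-to-permutation step is done once, outside of it.
        int[] perm = maskToPermutation(mask);
        int[][] blocks = {{1, 2, 3, 4}, {5, 6, 7, 8}};
        for (int[] block : blocks) {
            System.out.println(Arrays.toString(permute(block, perm)));
        }
        // prints [1, 3, 4, 0] then [5, 7, 8, 0]
    }
}
```

With the two steps separated, only the cheap `permute` call remains in the loop body; whether a compiler can perform this hoisting depends on the mask actually being loop-invariant.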

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3]

2024-01-05 Thread Emanuel Peter
On Fri, 5 Jan 2024 07:08:35 GMT, Jatin Bhateja wrote:

>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2
>> only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2
>> instruction set.
>> These are very frequently used APIs in

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3]

2024-01-04 Thread Jatin Bhateja
> Hi,
>
> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only
> targets.
> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2
> instruction set.
> These are very frequently used APIs in columnar database filter operation.
>
> Implementation uses a
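
For readers unfamiliar with the two operations named in the description, the following scalar Java sketch models what compress and expand compute for one vector of `int` lanes, using a filter predicate of the kind a columnar scan might apply. It is a reference model under stated assumptions, not the patch's code: the class name `CompressExpandSketch`, the helper names, and the sample data are all hypothetical.

```java
import java.util.Arrays;

public class CompressExpandSketch {
    // compress: lanes selected by the mask are packed contiguously into
    // the low lanes of the result; the remaining lanes are zero.
    static int[] compress(int[] src, boolean[] mask) {
        int[] dst = new int[src.length];
        int k = 0;
        for (int lane = 0; lane < src.length; lane++) {
            if (mask[lane]) {
                dst[k++] = src[lane];
            }
        }
        return dst;
    }

    // expand: the low lanes of src are scattered, in order, to the lanes
    // where the mask is set; lanes where the mask is unset are zero.
    static int[] expand(int[] src, boolean[] mask) {
        int[] dst = new int[src.length];
        int k = 0;
        for (int lane = 0; lane < src.length; lane++) {
            if (mask[lane]) {
                dst[lane] = src[k++];
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] column = {5, 42, 7, 99, 13, 64, 1, 80};
        // Filter predicate (hypothetical): keep values >= 40.
        boolean[] mask = new boolean[column.length];
        for (int i = 0; i < column.length; i++) {
            mask[i] = column[i] >= 40;
        }
        int[] packed = compress(column, mask);
        System.out.println(Arrays.toString(packed));
        // prints [42, 99, 64, 80, 0, 0, 0, 0]
        System.out.println(Arrays.toString(expand(packed, mask)));
        // prints [0, 42, 0, 99, 0, 64, 0, 80]
    }
}
```

Expanding a compressed vector with the same mask restores the selected lanes to their original positions and zeroes the rest, which is why the two operations are usually described as a pair.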