On Mon, 8 Jan 2024 10:20:33 GMT, Quan Anh Mai wrote:
>>> Thanks for the updates!
>>>
>>> One more idea: Your AVX2 solution has a lot of cost for converting the mask
>>> to a permutation. Might it make sense to split this off into a separate
>>> vector-node, so that it can float out of a loop
On Mon, 8 Jan 2024 06:06:22 GMT, Jatin Bhateja wrote:
>> Thanks for the updates!
>>
>> One more idea: Your AVX2 solution has a lot of cost for converting the mask
>> to a permutation. Might it make sense to split this off into a separate
>> vector-node, so that it can float out of a loop if
On Fri, 5 Jan 2024 10:02:28 GMT, Emanuel Peter wrote:
> Thanks for the updates!
>
> One more idea: Your AVX2 solution has a lot of cost for converting the mask
> to a permutation. Might it make sense to split this off into a separate
> vector-node, so that it can float out of a loop if the
On Fri, 5 Jan 2024 07:08:35 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> This patch optimizes non-subword vector compress and expand APIs for x86
>> AVX2-only targets.
>> Upcoming E-core Xeons (Sierra Forest) and hybrid CPUs only support the AVX2
>> instruction set.
>> These are very frequently used APIs in
> Hi,
>
> This patch optimizes non-subword vector compress and expand APIs for x86
> AVX2-only targets.
> Upcoming E-core Xeons (Sierra Forest) and hybrid CPUs only support the AVX2
> instruction set.
> These are very frequently used APIs in columnar database filter operations.
>
> Implementation uses a