On Tue, 16 Jan 2024 06:08:28 GMT, Jatin Bhateja wrote:
>> Or would that require too many registers?
>
>> Can the `offset` not be added to `idx_base` before the loop?
>
> The offset needs to be added to each index element; please refer to the API
> specification for details.
>
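For readers following the exchange above, here is a minimal scalar sketch of the per-lane index computation being discussed. It assumes the gather reads element `offset + index_map[map_offset + i]` for lane `i`, which is how I read the Vector API specification for index-mapped loads. The function name `gather_shorts` and all parameter names are hypothetical and do not appear in the patch; the comments only loosely relate each step to the quoted assembler lines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference for a sub-word (short) gather.
// Assumption: lane i reads src[offset + index_map[map_offset + i]].
std::vector<int16_t> gather_shorts(const std::vector<int16_t>& src,
                                   int offset,
                                   const std::vector<int>& index_map,
                                   int map_offset,
                                   int lanes) {
  std::vector<int16_t> dst(lanes);
  for (int i = 0; i < lanes; i++) {
    // Load the 32-bit index for this lane (compare: movl(rtmp, Address(idx_base, i * 4))).
    int idx = index_map[map_offset + i];
    // The offset is applied to the loaded index value (compare: addl(rtmp, offset)).
    idx += offset;
    assert(idx >= 0 && static_cast<std::size_t>(idx) < src.size());
    // Read the 16-bit element into lane i (compare: pinsrw(dst, Address(base, rtmp, times_2), i)).
    dst[i] = src[idx];
  }
  return dst;
}
```

In this sketch the offset adjusts the index values that are loaded, while `index_map` only determines where those indices are read from, so moving the pointer into `index_map` would change which indices are read rather than their values.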
On Tue, 16 Jan 2024 06:08:35 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1634:
>>
>>> 1632: Register offset,
>>> XMMRegister offset_vec, XMMRegister idx_vec,
>>> 1633:
On Tue, 16 Jan 2024 06:08:40 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1627:
>>
>>> 1625: vpsrlvd(dst, dst, xtmp, vlen_enc);
>>> 1626: // Pack double word vector into byte vector.
>>> 1627: vpackI2X(T_BYTE, dst, ones, xtmp, vlen_enc);
>>
>> I
On Tue, 16 Jan 2024 06:08:31 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1757:
>>
>>> 1755: for (int i = 0; i < 4; i++) {
>>> 1756: movl(rtmp, Address(idx_base, i * 4));
>>> 1757: pinsrw(dst, Address(base, rtmp, Address::times_2), i);
>>
>>
On Tue, 16 Jan 2024 06:17:43 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1900:
>>
>>> 1898: vgather8b(elem_ty, xtmp3, base, idx_base, rtmp, vlen_enc);
>>> 1899: } else {
>>> 1900: LP64_ONLY(vgather8b_masked(elem_ty, xtmp3, base, idx_base,
>>>
On Mon, 15 Jan 2024 14:27:43 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request with a new target base due to a
>> merge or a rebase. The pull request now contains 12 commits:
>>
>> - Accelerating masked sub-word gathers for AVX2 targets, this gives
>> additional 1.5-4x
On Mon, 15 Jan 2024 14:36:38 GMT, Emanuel Peter wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1776:
>>
>>> 1774: for (int i = 0; i < 4; i++) {
>>> 1775: movl(rtmp, Address(idx_base, i * 4));
>>> 1776: addl(rtmp, offset);
>>
>> Can the `offset` not be added to
On Mon, 15 Jan 2024 13:49:06 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request with a new target base due to a
>> merge or a rebase. The pull request now contains 12 commits:
>>
>> - Accelerating masked sub-word gathers for AVX2 targets, this gives
>> additional 1.5-4x
On Mon, 15 Jan 2024 14:25:28 GMT, Emanuel Peter wrote:
>> Jatin Bhateja has updated the pull request with a new target base due to a
>> merge or a rebase. The pull request now contains 12 commits:
>>
>> - Accelerating masked sub-word gathers for AVX2 targets, this gives
>> additional 1.5-4x
On Mon, 1 Jan 2024 14:36:06 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:
>>
>> 1) Intrinsify sub-word gather using a hybrid algorithm which initially
>>
> Hi All,
>
> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
> AVX512 features.
>
> Following is the summary of changes:
>
> 1) Intrinsify sub-word gather using a hybrid algorithm which initially
> partially unrolls the scalar loop to accumulate values from gather