Re: RFR: 8318650: Optimized subword gather for x86 targets. [v13]

2024-02-26 Thread Jatin Bhateja
On Mon, 26 Feb 2024 09:36:09 GMT, Emanuel Peter wrote:

>> A 64 bit sub-word SPECIES will hold either 8 byte values or 4 short values,
>> and the algorithm handles both appropriately.
>
> Are you saying that the constraints are too relaxed, but currently no outside
> algorithm would pass something bad?

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v13]

2024-02-26 Thread Jatin Bhateja
On Mon, 26 Feb 2024 09:37:33 GMT, Emanuel Peter wrote:

>> I'll rereview after
>
> So xtmp1...3 and rtmp cannot have more descriptive names?

These are temporary variables and are appropriately named.

PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502587427

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v13]

2024-02-26 Thread Emanuel Peter
On Tue, 20 Feb 2024 08:29:44 GMT, Emanuel Peter wrote:

>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1716:
>>
>>> 1714: XMMRegister xtmp3, Register rtmp,
>>> 1715: Register midx, Register length,

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v13]

2024-02-26 Thread Emanuel Peter
On Sun, 25 Feb 2024 06:23:50 GMT, Jatin Bhateja wrote:

>> src/hotspot/cpu/x86/x86.ad line 4120:
>>
>>> 4118: BasicType elem_bt = Matcher::vector_element_basic_type(this);
>>> 4119: __ lea($tmp$$Register, $mem$$Address);
>>> 4120: __ vgather8b(elem_bt, $dst$$XMMRegister,

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v13]

2024-02-24 Thread Jatin Bhateja
On Tue, 20 Feb 2024 08:36:29 GMT, Emanuel Peter wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Review comments resolutions.
>
> src/hotspot/cpu/x86/x86.ad line 4120:
>
>> 4118: BasicType elem_bt =

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v13]

2024-02-24 Thread Jatin Bhateja
On Tue, 20 Feb 2024 08:04:27 GMT, Emanuel Peter wrote:

>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1584:
>>
>>> 1582: Label *larr[] = {, , , };
>>> 1583: for (int i = 0; i < 4; i++) {
>>> 1584: // dst[i] = mask ? src[index[i]] : 0
>>
>> I like these comments a lot!
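
For readers following the thread, the comment `dst[i] = mask ? src[index[i]] : 0` quoted above can be read as a plain scalar loop. The following is only a minimal Java sketch of those masked-gather semantics for byte lanes. It is not the HotSpot implementation, and the class name `MaskedGatherSketch`, the method `gatherBytes`, and the use of a `boolean[]` for the mask are all hypothetical choices made for illustration.

```java
import java.util.Arrays;

public final class MaskedGatherSketch {
    // Scalar reference for a masked sub-word gather over byte lanes.
    // For each lane i: dst[i] = mask[i] ? src[index[i]] : 0
    // (hypothetical helper, not part of the JDK or of the patch under review)
    static byte[] gatherBytes(byte[] src, int[] index, boolean[] mask) {
        byte[] dst = new byte[index.length];
        for (int i = 0; i < index.length; i++) {
            // Lanes whose mask bit is clear are zeroed and src is not read for them.
            dst[i] = mask[i] ? src[index[i]] : 0;
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] src = {10, 20, 30, 40, 50, 60, 70, 80};
        int[] index = {7, 0, 3, 3};
        boolean[] mask = {true, false, true, true};
        // prints [80, 0, 40, 40]
        System.out.println(Arrays.toString(gatherBytes(src, index, mask)));
    }
}
```

A scalar reference like this is mainly useful as an oracle when checking a vectorized gather: indices may repeat, and masked-off lanes must come out as zero.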

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v13]

2024-02-20 Thread Emanuel Peter
On Wed, 7 Feb 2024 18:38:29 GMT, Jatin Bhateja wrote:

>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v13]

2024-02-20 Thread Emanuel Peter
On Tue, 20 Feb 2024 07:35:28 GMT, Emanuel Peter wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Review comments resolutions.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1584:
>
>> 1582: Label

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v13]

2024-02-07 Thread Sandhya Viswanathan
On Wed, 7 Feb 2024 18:38:29 GMT, Jatin Bhateja wrote:

>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v13]

2024-02-07 Thread Jatin Bhateja
> Hi All,
>
> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
> AVX512 features.
>
> Following is the summary of changes:-
>
> 1) Intrinsify sub-word gather using hybrid algorithm which initially
> partially unrolls scalar loop to accumulate values from gather