ates:
> >
> > foo: movl m(%rip), %eax
> > addl %edi, %eax // 2 bytes
> > subl $1, %eax // 3 bytes
> > cltd
> > idivl %edi
> > ret
> >
> > This discrepancy is caused by the late
3 bytes
> cltd
> idivl %edi
> ret
>
> This discrepancy is caused by the late decision (in peephole2) to split
> an addition with a memory operand, into a load followed by a reg-reg
> addition. This patch improves this situation by adding a peephole2
>
On 6/29/24 3:07 PM, Vineet Gupta wrote:
On 6/29/24 06:44, Jeff Law wrote:
+;; fclass instruction output bitmap
+;; 0 negative infinity
+;; 1 negative normal number.
+;; 2 negative subnormal number.
+;; 3 -0
+;; 4 +0
+;; 5 positive subnormal number.
+;; 6 positive normal number.
This patch fixes the 4 FAILs of gcc.target/i386/pr192464-vrndscaleph.c
with --target_board='unix{-m32}' on RedHat 7.x. The issue is that this
AVX512 test includes the system math.h, and on older systems this provides
inline versions of floor, ceil and rint (for the 387). The work around
Hello All:
This patch determines the unroll factor based on loop register pressure.
The unroll factor is the quotient of the maximum number of registers
available in the loop divided by the number of live registers.
If the number of available registers increases, the unroll factor increases,
whereas the unroll factor decreases if the number of live registers increases.
Loop
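A minimal sketch of the quotient described in the message above, not the patch's actual code. The function name, the parameter names and the clamping to a minimum factor of 1 are assumptions made for illustration only.

```c
/* Hypothetical helper: unroll factor as the quotient of the registers
   available in the loop by the number of live registers.  */
unsigned unroll_factor(unsigned available_regs, unsigned live_regs)
{
    /* Assumption: with nothing live there is no pressure to measure,
       so fall back to "no extra unrolling".  */
    if (live_regs == 0)
        return 1;

    unsigned factor = available_regs / live_regs;

    /* Assumption: never report a factor below 1 (loop left as is).  */
    return factor == 0 ? 1 : factor;
}
```

With 32 available registers, 8 live registers give a factor of 4 and 16 live registers give 2, which matches the stated behaviour: the factor grows with available registers and shrinks as liveness grows.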
Besides VN and copy-prop also CCP and VRP as well as forwprop
propagate out copies and thus it's worthwhile to try to preserve
range and points-to info there when possible.
Note that this also fixes the testcase from PR115701 but that's
because we do not actually intersect info but only copy info
The following restricts copying of points-to info from defs that
might be in regions invoking UB and are never executed.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
PR tree-optimization/115701
* tree-ssanames.cc (maybe_duplicate_ssa_info_at_copy):
Only
The following factors out the code that preserves SSA info of the LHS
of a SSA copy LHS = RHS when LHS is about to be eliminated to RHS.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
PR tree-optimization/115701
* tree-ssanames.h (maybe_duplicate_ssa_info_at_copy):
On Fri, 2024-06-28 at 17:53 -0700, Vineet Gupta wrote:
> + UNSPEC_ISFINITE
> + UNSPEC_ISNORMAL
You don't really need them. The RTL pattern of define_expand has no use
when you expand it via C code and DONE.
i.e. you can just code
(define_expand "isfinite2"
[(match_operand:SI 0
:
__attribute__((noinline))
void test (uint16_t *x, unsigned b, unsigned n)
{
  unsigned a = 0;
  uint16_t *p = x;
  do {
    a = *--p;
    *p = (uint16_t)(a >= b ? a - b : 0);
  } while (--n);
}
Before this patch:
...
.L3:
vle16.v v1,0(a3)
vrsub.vx v5,v2,t1
mv t3,a4
addw a4,a4
Hello All:
This patch determines the unroll factor based on loop register pressure.
The unroll factor is the quotient of the maximum number of registers
available in the loop divided by the number of live registers.
If the number of available registers increases, the unroll factor increases,
whereas the unroll factor decreases if the number of live registers increases.
Loop
r could be improved, or maybe output produced from
`print_rtx_function' isn't right, I don't know.
> The patch is OK for trunk, thanks. I agree that it's a regression
> from 08a692679fb8. Since it's fixing such a hard-to-diagnose wrong
> code bug, and since it seems very safe, I think it
gcc/
* dwarf2codeview.cc (write_lf_modifier): Expand upon comment.
---
gcc/dwarf2codeview.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 5a33b439b14..df53d8bab9d 100644
--- a/gcc/dwarf2codeview.cc
+++
CodeView symbols have to be multiples of four bytes; add an alignment
directive to write_data_symbol to ensure this.
Note that these can be zeroes, so we can rely on GAS to do this for us;
it's only types that need f3, f2, f1 values.
gcc/
* dwarf2codeview.cc (write_data_symbol):
Adds names for the padding magic numbers to enum cv_leaf_type.
gcc/
* dwarf2codeview.cc (enum cv_leaf_type): Add padding constants.
(write_cv_padding): Use names for padding constants.
---
gcc/dwarf2codeview.cc | 11 +++
1 file changed, 7 insertions(+), 4
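The padding scheme mentioned in the messages above (f3, f2, f1 bytes bringing a type record up to a multiple of four bytes) can be sketched roughly as follows. This is an illustration under assumptions, not the code in dwarf2codeview.cc: the names `PAD_BASE` and `pad_type_record` are hypothetical, and the buffer-based interface is chosen only to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: padding bytes are PAD_BASE + n, i.e. 0xf3, 0xf2, 0xf1.  */
enum { PAD_BASE = 0xf0 };

/* Append padding bytes after the first LEN bytes of BUF so that the record
   length becomes a multiple of four.  Returns the padded length.  The caller
   must provide room for up to three extra bytes.  */
size_t pad_type_record(uint8_t *buf, size_t len)
{
    size_t pad = (4 - (len % 4)) % 4;

    /* Each padding byte encodes how many padding bytes remain,
       counting itself: f3, f2, f1.  */
    for (size_t i = pad; i > 0; i--)
        buf[len++] = (uint8_t)(PAD_BASE + i);

    return len;
}
```

A 5-byte record would be extended to 8 bytes with f3, f2, f1 appended, while a record that is already a multiple of four is left unchanged.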
Make everything more gdb-friendly by using an enum for symbol constants
rather than #defines.
gcc/
* dwarf2codeview.cc (S_LDATA32, S_GDATA32, S_COMPILE3): Undefine.
(enum cv_sym_type): Define.
(struct codeview_symbol): Use enum cv_sym_type.
Make everything more gdb-friendly by using an enum for type constants
rather than #defines.
gcc/
* dwarf2codeview.cc (enum cv_leaf_type): Define.
(struct codeview_subtype): Use enum cv_leaf_type.
(struct codeview_custom_type): Use enum cv_leaf_type.
On 6/29/24 06:44, Jeff Law wrote:
>> +;; fclass instruction output bitmap
>> +;; 0 negative infinity
>> +;; 1 negative normal number.
>> +;; 2 negative subnormal number.
>> +;; 3 -0
>> +;; 4 +0
>> +;; 5 positive subnormal number.
>> +;; 6 positive normal number.
>> +;; 7 positive
Probably not entirely fool-proof when using statement
expressions in initializers, but should be good enough.
Bootstrapped and regression tested on x86_64.
c: Diagnose declarations that are used only in their own initializer
[PR115027]
Track the declaration that is currently
addition. This patch improves this situation by adding a peephole2
to recognize consecutive additions and transform them into lea if
profitable.
My first attempt at fixing this was to use a define_insn_and_split:
(define_insn_and_split "*lea3_reg_mem_imm"
[(set (match_opera
This adds missing code for handling error marks.
Bootstrapped and regression tested on x86_64.
c: Fix ICE for incorrect code in comptypes_verify [PR115696]
The new verification code produces an ICE for incorrect code. Add the
same logic as already used in comptypes to to
This fixes an ICE when redeclaring a struct and having
an aligned attribute in one version in C23.
Bootstrapped and regression tested on x86_64.
c: Fix ICE for redeclaration of structs with different alignment [PR114727]
For redeclarations of struct in C23, if one has an
On 6/27/24 3:56 PM, Palmer Dabbelt wrote:
This is really more of a question than a patch.
Looking at PR/115687 I managed to convince myself there's a general
class of problems here: splitting might produce constant subexpressions,
but as far as I can tell there's nothing to eliminate those
using
nullptr so I think it's fine.
I haven't reviewed the patch yet, but this answers the nullptr question:
https://en.cppreference.com/w/cpp/named_req/NullablePointer
(aka Cpp17NullablePointer in the C++ standard).
diff --git a/libstdc++-v3/include/bits/hashtable.h
b/libstdc++-v3/includ
On 6/28/24 6:53 PM, Vineet Gupta wrote:
Currently isfinite and isnormal use float compare instructions with fp
flags save/restored around them. Our perf team complained this could be
costly in uarch. RV Base ISA already has FCLASS.{d,s,h} instruction to
do FP compares w/o disturbing FP
re that the programmer didn't make any
mistakes. This warning catches the bug above, so that the programmer
will be able to fix it and write:
char log_levels[][8] = { "info", "warning", "err" };
This warning already existed as part of -Wc++-compat, but this patch
allo
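A small C sketch of the situation the warning is about. The buggy declaration itself is cut off in the snippet above, so the `[][7]` variant below is an assumption about what it looked like; the array and function names are hypothetical. Note that this is C only: C++ rejects an initializer string that does not fit together with its terminating NUL.

```c
#include <stddef.h>
#include <string.h>

/* Rows of 8 bytes: "warning" (7 characters) plus its terminating NUL fits.  */
char levels_ok[][8] = { "info", "warning", "err" };

/* Assumed buggy form: rows of 7 bytes.  "warning" fills the whole row and
   the terminating NUL is silently dropped, which is valid C.  */
char levels_short[][7] = { "info", "warning", "err" };

/* Returns nonzero if the SIZE-byte row contains a NUL terminator.  */
int row_is_terminated(const char *row, size_t size)
{
    return memchr(row, '\0', size) != NULL;
}
```

Passing `levels_short[1]` to a function that expects a C string would read past the end of the row, which is the kind of mistake the warning is meant to expose.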
On Sat, Jun 29, 2024 at 02:58:48PM GMT, Alejandro Colomar wrote:
> On Sat, Jun 29, 2024 at 02:52:40PM GMT, Alejandro Colomar wrote:
> > @@ -6450,6 +6452,8 @@ name is still supported, but the newer name is more
> > descriptive.)
> > -Wstring-compare
> > -Wtype-limits
> > -Wuninitialized
> >
fix it and write:
>
> char log_levels[][8] = { "info", "warning", "err" };
>
> This warning already existed as part of -Wc++-compat, but this patch
> allows enabling it separately. It is also included in -Wextra, since
> it may not always
This marks structures which include a byte array
as typeless storage.
Bootstrapped and regression tested on x86_64.
c: Add support for byte arrays in C2Y
To get correct aliasing behavior requires that structures and unions
that contain a byte array, i.e. an array of
; vars) - later today.
Fixed as attached,
Iain
0001-jit-Fix-Darwin-bootstrap-after-r15-1699.patch
Description: Binary data
On Fri, Jun 28, 2024 at 11:46:08AM +0200, Georg-Johann Lay wrote:
> On 27.06.24 at 10:51, Stefan Schulze Frielinghaus wrote:
> > On Thu, Jun 27, 2024 at 09:45:32AM +0200, Georg-Johann Lay wrote:
> > > On 24.05.24 at 11:13 On 25.06.24 at 16:03, Paul Koning wrote:
> > > > > On Jun 24, 2024, at
return std::isfinite (x);
}
generating the new seq
.LFB4:
fclass.d a0,fa0
andi a0,a0,126
snez a0,a0
ret
vs.
li a0,1
ret
I have a hunch this requires the pending value range patch from Hao Chen
GUI.
Thx,
-Vineet
[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html
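The fclass.d / andi 126 / snez sequence quoted above can be modelled in portable C to see why the mask 126 tests for finiteness. This is a sketch, not the patch: the function names are hypothetical, bits 0 to 6 follow the bitmap listed in the thread, and bits 7 to 9 (positive infinity, signaling NaN, quiet NaN) are taken from the RISC-V ISA description of FCLASS. Standard C cannot portably tell signaling from quiet NaNs, so the model reports every NaN as quiet.

```c
#include <math.h>

/* Software model of the one-hot FCLASS result for a double.  */
unsigned fclass_model(double x)
{
    int neg = signbit(x) != 0;

    switch (fpclassify(x)) {
    case FP_INFINITE:  return 1u << (neg ? 0 : 7);
    case FP_NORMAL:    return 1u << (neg ? 1 : 6);
    case FP_SUBNORMAL: return 1u << (neg ? 2 : 5);
    case FP_ZERO:      return 1u << (neg ? 3 : 4);
    default:           return 1u << 9; /* assumption: any NaN treated as quiet */
    }
}

/* Mask 126 (0x7e) selects bits 1..6: every finite class, no inf, no NaN.  */
int isfinite_via_fclass(double x)
{
    return (fclass_model(x) & 126u) != 0;
}
```

Because the classification only inspects the operand, no floating-point exception flags are touched, which is the property the thread is after.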
Hi Folks,
> On 28 Jun 2024, at 12:50, Rainer Orth wrote:
>
> David Malcolm writes:
>
>> On Thu, 2024-04-04 at 18:59 -0400, Antoni Boucher wrote:
>>> Hi.
>>> This patch adds a new API to produce an rvalue representing the
>>> alignment of a type.
+1 to any change that reduces the number of fflags accesses.
On Fri, Jun 28, 2024 at 5:54 PM Vineet Gupta wrote:
>
> Currently isfinite and isnormal use float compare instructions with fp
> flags save/restored around them. Our perf team complained this could be
> costly in uarch. RV Base ISA
Currently isfinite and isnormal use float compare instructions with fp
flags save/restored around them. Our perf team complained this could be
costly in uarch. RV Base ISA already has FCLASS.{d,s,h} instruction to
do FP compares w/o disturbing FP exception flags.
Coincidentally, upstream just a few
On Fri, Jun 28, 2024 at 10:00:53PM +0200, Harald Anlauf wrote:
>
> the attached patch fixes an ICE occurring for ALLOCATE with SOURCE
> (or MOLD) of deferred character length in the scalar case, which
> looked obscure because the ICE disappears at -O1 and higher.
>
> The
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
-- >8 --
This DR (https://cplusplus.github.io/CWG/issues/2627.html) says that
even if we are converting from an integer type or unscoped enumeration type
to an integer type that cannot represent all the values of the original
type, it's
urns an expression
> without TREE_SIDE_EFFECTS, which can't be if the involved type is volatile.
>
> This patch relaxes the assert to accept having TREE_THIS_VOLATILE on the
> returned expression.
>
> Successfully tested on x86_64-pc-linux-gnu.
>
> PR c++/60
>
>
}
> === cut here ===
>
> The problem is that get_fndecl_argument_location assumes that it has a
> FUNCTION_DECL in its hands to find the location of the bad argument. It might
> however have a TEMPLATE_DECL if there's a single candidate that cannot be
> instantiated, like here.
>
>
Dear all,
the attached patch fixes an ICE occurring for ALLOCATE with SOURCE
(or MOLD) of deferred character length in the scalar case, which
looked obscure because the ICE disappears at -O1 and higher.
The dump tree suggests that it is a wrong decl for the temporary
source that was e.g
Pushed to trunk.
On Thu, 27 Jun 2024 at 10:01, Jonathan Wakely wrote:
>
> As I commented in the PR, I think it would be nice if the compiler
> accepted C++11 alignof in C++98 mode when -faligned-new is used. But
> even if G++ added that, we'd need Clang to use it, and then wait a few
> releases
Pushed to trunk.
On Thu, 27 Jun 2024 at 10:03, Jonathan Wakely wrote:
>
> I'm planning to push this, although arguably the first change isn't
> worth doing if we can't use it everywhere. If we need to keep the old
> code for EDG, maybe we should just keep using that? The new version
> probably
> > On 6/28/24 6:18 AM, Pengxuan Zheng wrote:
> > > This patch improves GCC’s vectorization of __builtin_popcount for
> > > aarch64 target by adding popcount patterns for vector modes besides
> > > QImode, i.e., HImode, SImode and DImode.
> > >
> >
This patch improves GCC’s vectorization of __builtin_popcount for aarch64 target
by adding popcount patterns for vector modes besides QImode, i.e., HImode,
SImode and DImode.
With this patch, we now generate the following for V8HI:
cnt v1.16b, v0.16b
uaddlp v2.8h, v1.16b
For V4HI, we
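As a rough scalar model of the V8HI sequence quoted above: cnt produces a population count per byte, and uaddlp adds adjacent byte pairs into halfword lanes. The sketch below shows that computation for a single 16-bit lane. The function names are hypothetical and this only illustrates the arithmetic, it is not generated or patch code.

```c
#include <stdint.h>

/* Models what cnt computes for one byte lane: the number of set bits.  */
unsigned byte_popcount(uint8_t b)
{
    unsigned n = 0;
    while (b) {
        n += b & 1u;
        b >>= 1;
    }
    return n;
}

/* Models uaddlp for one halfword lane: the pairwise sum of the two
   per-byte counts gives the popcount of the 16-bit element.  */
unsigned popcount16(uint16_t x)
{
    return byte_popcount((uint8_t)(x & 0xff))
         + byte_popcount((uint8_t)(x >> 8));
}
```

Wider element sizes would presumably need further pairwise additions on top of this step, since each uaddlp doubles the lane width.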
> On 6/28/24 6:18 AM, Pengxuan Zheng wrote:
> > This patch improves GCC’s vectorization of __builtin_popcount for
> > aarch64 target by adding popcount patterns for vector modes besides
> > QImode, i.e., HImode, SImode and DImode.
> >
> > With this patch, we no
Please ignore this patch. I accidentally added unrelated changes. I'll push a
correct version shortly.
Sorry for the noise.
Thanks,
Pengxuan
> This patch improves GCC’s vectorization of __builtin_popcount for aarch64
> target by adding popcount patterns for vector modes besides QImod
Richard Sandiford writes:
> Thomas Schwinge writes:
>> Hi!
>>
>> On 2024-06-27T23:20:18+0200, I wrote:
>>> On 2024-06-27T22:27:21+0200, I wrote:
>>>> On 2024-06-27T18:49:17+0200, I wrote:
>>>>> On 2023-10-24T19:49:10+0100, Richard Sandi
Remove extra assignment, extra temp variable and variable shadowing.
No functional changes intended.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_move): Remove extra
assignment to tmp variable, reuse tmp variable instead of
declaring new temporary variable and remove tmp
On 27/06/2024 17:16, Wilco Dijkstra wrote:
> Hi Richard,
>
>> Doing just this will mean that the register allocator will have to undo a
>> pre/post memory operand that was accepted by the predicate (memory_operand).
>> I think we really need a tighter predicate (lets call it noautoinc_mem_op)
rivai.ai; kito.ch...@gmail.com;
jeffreya...@gmail.com; rdapp@gmail.com
Subject: RE: [PATCH v3] Vect: Support truncate after .SAT_SUB pattern in zip
> -Original Message-
> From: Richard Biener
> Sent: Friday, June 28, 2024 6:39 AM
> To: Li, Pan2
> Cc: gcc-patch
.ch...@gmail.com;
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support imm form for unsigned scalar .SAT_ADD
On Fri, Jun 28, 2024 at 5:44 AM wrote:
>
> From: Pan Li
>
> This patch adds support for the form of unsigned scalar .SAT_ADD
> when one of the operands is
The following starts to handle NULL elements in SLP_TREE_SCALAR_STMTS
with the first candidate being the two-operator nodes where some
lanes are do-not-care and also do not have a scalar stmt computing
the result. I've so far whack-a-moled the vect.exp testsuite.
I do plan to use NULL elements
The following makes sure that for SLP reductions all lanes have
the same STMT_VINFO_REDUC_IDX. Once we move that info and can adjust
it we can implement swapping. It also makes the existing protection
against operand swapping trigger for all stmts participating in a
reduction, not just the
Thomas Schwinge writes:
> Hi!
>
> On 2024-06-27T23:20:18+0200, I wrote:
>> On 2024-06-27T22:27:21+0200, I wrote:
>>> On 2024-06-27T18:49:17+0200, I wrote:
>>>> On 2023-10-24T19:49:10+0100, Richard Sandiford
>>>> wrote:
>>>>>
> -Original Message-
> From: Richard Biener
> Sent: Friday, June 28, 2024 6:39 AM
> To: Li, Pan2
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
> jeffreya...@gmail.com; rdapp@gmail.com; Tamar Christina
>
> Subject: Re: [PATCH v3
On 6/28/24 13:55, Richard Biener wrote:
On Fri, Jun 28, 2024 at 8:43 AM Jørgen Kvalsvik wrote:
Using auto_vec rather than vec means the vectors are released
automatically upon return, to stop the leak. The problem seems to be that
auto_vec is not really move-aware, only the specialization
On Fri, Jun 28, 2024 at 5:44 AM wrote:
>
> From: Pan Li
>
> This patch adds support for the form of unsigned scalar .SAT_ADD
> when one of the operands is an immediate (IMM). For example:
>
> Form IMM:
> #define DEF_SAT_U_ADD_IMM_FMT_1(T) \
>
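The macro body is cut off in the snippet above, so the following is only a hedged illustration of what an unsigned saturating add with an immediate operand computes. The function name and the immediate value 9 are arbitrary assumptions, not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical example: unsigned saturating add of the immediate 9.
   If the wrapped sum is smaller than the input, the addition overflowed,
   so the result saturates at the type's maximum value.  */
uint8_t sat_u_add_imm9_u8(uint8_t x)
{
    uint8_t sum = (uint8_t)(x + 9u);
    return sum >= x ? sum : UINT8_MAX;
}
```

For instance, an input of 250 would wrap to 3 without saturation, so the function returns 255 instead.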
On Wed, Jun 26, 2024 at 4:50 PM Feng Xue OS wrote:
>
> Updated the patch.
>
> For a lane-reducing operation (dot-prod/widen-sum/sad) in a loop reduction, the current
> vectorizer can only handle the pattern if the reduction chain does not
> contain any other operation, no matter the other
Hi!
As part of this:
On 2013-07-26T11:04:33-0400, David Malcolm wrote:
> This patch is the hand-written part of the conversion of passes from
> C structs to C++ classes.
> --- a/gcc/passes.c
> +++ b/gcc/passes.c
..., we did hard-code 'PUSH_INSERT_PASSES_WITHIN(PASS)' to
on has its own input vectype, while reduction
> - PHI records the input vectype with least lanes. */
> - if (lane_reducing)
> -STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in;
>
>enum vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (phi_info);
>ST
On Fri, Jun 28, 2024 at 2:14 PM Thomas Schwinge wrote:
>
> Hi!
>
> On 2013-07-26T11:04:32-0400, David Malcolm wrote:
> > Introduce a new gen-pass-instances.awk script, and use it at build time
> > to make a pass-instances.def from passes.def.
>
> (The script has later been rewritten and
On 2024/6/28 at 8:35 PM, Xi Ruoyao wrote:
On Fri, 2024-06-28 at 20:34 +0800, chenglulu wrote:
On 2024/6/28 at 8:25 PM, Xi Ruoyao wrote:
Hi Richard,
The late combine pass has triggered some FAILs on LoongArch and I'm
investigating. One of them is movcf2gr-via-fr.c. In
315r.postreload:
(insn 22 7 24 2
On Fri, 2024-06-28 at 20:34 +0800, chenglulu wrote:
>
> On 2024/6/28 at 8:25 PM, Xi Ruoyao wrote:
> > Hi Richard,
> >
> > The late combine pass has triggered some FAILs on LoongArch and I'm
> > investigating. One of them is movcf2gr-via-fr.c. In
> > 315r.postreload:
> >
> > (insn 22 7 24 2 (set
On 2024/6/28 at 8:25 PM, Xi Ruoyao wrote:
Hi Richard,
The late combine pass has triggered some FAILs on LoongArch and I'm
investigating. One of them is movcf2gr-via-fr.c. In 315r.postreload:
(insn 22 7 24 2 (set (reg:FCC 32 $f0 [87])
(reg:FCC 64 $fcc0 [87]))
Richard Biener writes:
> On Fri, Jun 28, 2024 at 2:16 PM Richard Biener
> wrote:
>>
>> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
>> wrote:
>> >
>> >
>> >
>> > > On 28.06.2024 at 10:27, Richard Sandiford
>> > > wrote:
>> > >
>> > > Richard Biener writes:
>> > >>> On Fri, Jun 28, 2024
Hi Richard,
The late combine pass has triggered some FAILs on LoongArch and I'm
investigating. One of them is movcf2gr-via-fr.c. In 315r.postreload:
(insn 22 7 24 2 (set (reg:FCC 32 $f0 [87])
(reg:FCC 64 $fcc0 [87]))
"../gcc/gcc/testsuite/gcc.target/loongarch/movcf2gr-via-fr.c":9:12
On Fri, Jun 28, 2024 at 2:16 PM Richard Biener
wrote:
>
> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
> wrote:
> >
> >
> >
> > > On 28.06.2024 at 10:27, Richard Sandiford
> > > wrote:
> > >
> > > Richard Biener writes:
> > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
> > >>>
On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
wrote:
>
>
>
> > On 28.06.2024 at 10:27, Richard Sandiford
> > wrote:
> >
> > Richard Biener writes:
> >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
> >>> wrote:
> >>>
> >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt wrote:
>
>
of 'gcc/passes.def'", see
attached?
Regards
Thomas
>From 072cdf7d9cf86fb2b0553b93365648e153b4376b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Fri, 28 Jun 2024 14:05:04 +0200
Subject: [PATCH] Rewrite usage comment at the top of 'gcc/passes.def'
Since Subversion r201359 (Git c
On Fri, Jun 28, 2024 at 8:43 AM Jørgen Kvalsvik wrote:
>
> Using auto_vec rather than vec means the vectors are released
> automatically upon return, to stop the leak. The problem seems to be that
> auto_vec is not really move-aware, only the specialization
> is.
Indeed.
> This is actually
David Malcolm writes:
> On Thu, 2024-04-04 at 18:59 -0400, Antoni Boucher wrote:
>> Hi.
>> This patch adds a new API to produce an rvalue representing the
>> alignment of a type.
>> Thanks for the review.
>
> Patch looks good to me (but may need the usual A
>
> > Thanks,
> > Uros.
>
> It looks like the patch resolves 3 reported issues.
> Uros, I suggest merging the patch as it is, without minor refactoring, to
> avoid triggering another round of testing, if you agree.
Yes, please go ahead.
Thanks,
Uros.
The following addresses the corner case of an outer loop with an empty
header where we end up asking for the BB of a NULL stmt by
special-casing this case.
Bootstrap and regtest running on x86_64-unknown-linux-gnu, the patch
fixes observed ICEs on GCN.
PR tree-optimization/115652
Thursday, June 27, 2024 8:13 PM
Uros Bizjak wrote:
>
> So, there is no problem having #endif just after else.
>
> Anyway, it's your call, this is not a hill I'm willing to die on. ;)
>
> Thanks,
> Uros.
It looks like the patch resolves 3 reported issues.
Uros, I sugg
Hi,
> -Original Message-
> From: Palmer Dabbelt
> Sent: Thursday, June 27, 2024 10:57 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Palmer Dabbelt
> Subject: [RFC PATCH] cse: Add another CSE pass after split1
>
> This is really more of a question than a patch.
>
On 2024-06-09 04:36 Jeff Law wrote:
>
>
>
>On 6/5/24 8:42 PM, Fei Gao wrote:
>
>>> But let's back up and get a good explanation of what the problem is.
>>> Based on patch 2/2 it looks like we have lost an assignment to the
>>> return register.
>
pattern is the same
> > > > in both cases anyway. This would prevent special-casing from being
> > > > needed
> > > > in `mips_expand_conditional_trap' as well.
> > > >
> > >
> > > I agree. The
When unified shared memory is required, the default memory space should also be
unified.
libgomp/ChangeLog:
* config/linux/allocator.c (linux_memspace_alloc): Check
omp_requires_mask.
(linux_memspace_calloc): Likewise.
(linux_memspace_free): Likewise.
d for avoiding page migrations (in general).
This implementation reuses the "usmpin" allocator (introduced in my previous
patch-set to optimize pinned memory allocation) to solve these issues.
Firstly, all USM memory is allocated from specially memmap'd pages to ensure
that as few pages as po
From: Marcel Vollweiler
This patch handles Unified Shared Memory (USM) in the OpenMP runtime routine
omp_target_is_accessible.
libgomp/ChangeLog:
* target.c (omp_target_is_accessible): Handle unified shared memory.
* testsuite/libgomp.c-c++-common/target-is-accessible-1.c
From: Andrew Stubbs
The AMD GCN runtime must be set to the correct mode for Unified Shared Memory
to work, but this is not always clear at compile and link time due to the split
nature of the offload compilation pipeline.
This patch sets a new attribute on OpenMP offload functions to ensure
From: Hafiz Abid Qadeer
This patch changes calls to malloc/free/calloc/realloc and operator new to
memory allocation functions in libgomp with
allocator=ompx_unified_shared_mem_alloc. This helps existing code to benefit
from the unified shared memory, and is necessary to implement "requires
From: Andrew Stubbs
This adds support for using Cuda Managed Memory with omp_alloc. It will be
used as the underpinnings for "requires unified_shared_memory" in a later
patch.
There are two new predefined allocators, ompx_gnu_unified_shared_mem_alloc and
ompx_gnu_host_mem_a
From: Andrew Stubbs
Ensure that "requires unified_shared_memory" plays nicely with the
-foffload-memory options, and that enabling the option has the same effect as
enabling USM in the code.
Also adds some testcases.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_target): Add
These patches are an evolution of the USM portion of the patches previously
posted in July 2022 (yes, it's taken a while!)
https://patchwork.sourceware.org/project/gcc/list/?series=10748=%2A=both
The pinned memory portion was already posted (and partially approved
already) and must be applied
that I will implement in later patches. There
may be a temporary regression in USM support.
This patch disables the basic stop-gap shared memory so we can introduce
fast Unified Shared Memory using the managed memory APIs in the next patches.
If a device has integrated memory then the patch attempts
On 27.06.24 at 10:51, Stefan Schulze Frielinghaus wrote:
On Thu, Jun 27, 2024 at 09:45:32AM +0200, Georg-Johann Lay wrote:
On 24.05.24 at 11:13 On 25.06.24 at 16:03, Paul Koning wrote:
On Jun 24, 2024, at 1:50 AM, Stefan Schulze Frielinghaus
wrote:
On Mon, Jun 10, 2024 at 07:19:19AM +0200,
ECL in its hands to find the location of the bad argument. It might
however have a TEMPLATE_DECL if there's a single candidate that cannot be
instantiated, like here.
This patch simply defaults to using the FNDECL's location in this case, which
fixes this PR.
Successfully tested on x86_64-pc
Now that the dust has settled on the prange work, we can remove the
hybrid operators. I will push this once tests complete.
gcc/ChangeLog:
* range-op-ptr.cc (class hybrid_and_operator): Remove.
(class hybrid_or_operator): Same.
(class hybrid_min_operator): Same.
Hi Paul,
thanks for the review. I have removed the commented assert and committed as
gcc-15-1704-gaa3599a10ca
What about your pr59104 patch? It is living happily in my dev-branch and making
no problems.
Thanks again and regards,
Andre
On Thu, 27 Jun 2024 07:29:40 +0100
Paul Richard
On Fri, 28 Jun 2024 at 07:53, Maciej Cencora wrote:
>
> But constexpr-ness of bit_cast has additional limitations and e.g. providing
> a union as _Tp would be a hard error. So we have two options:
> - before bitcasting check if type can be bitcast-ed at compile-time,
> - change the 'if
> On 28.06.2024 at 10:27, Richard Sandiford wrote:
>
> Richard Biener writes:
>>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
>>> wrote:
>>>
>>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt wrote:
for the testcase in the PR115406, here is part of the dump.
char
Richard Biener writes:
> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
> wrote:
>>
>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt wrote:
>> >
>> > for the testcase in the PR115406, here is part of the dump.
>> >
>> > char D.4882;
>> > vector(1) _1;
>> > vector(1) signed char _2;
>> >
With some recent optimizations, -O1/-O2/-O3 can achieve almost the same
performance/size with stack load/store. Thus lwm/swm will save/store
fewer callee-saved registers. In fact only $16 is saved with swm.
To be sure that this optimization does exist, let's add 2 more
function calls. So that lwm/swm
Hi,
On Thu, 2024-06-27 at 14:56 -0700, Palmer Dabbelt wrote:
> This is really more of a question than a patch.
>
> Looking at PR/115687 I managed to convince myself there's a general
> class of problems here: splitting might produce constant subexpressions,
> but as far as I c
gcc-patches@gcc.gnu.org; Richard Earnshaw <richard.earns...@arm.com>;
Richard Sandiford <richard.sandif...@arm.com>
Subject: Re: [PATCH] aarch64: Remove RNG and MTE from -mcpu=neoverse-v2
Hi Tamar,
Thanks for going through the docs h
rm broken currently, and the patch
is definitely an improvement there.
> That said, maybe we're fine with this but then I walk back and say
> just unconditionally include sys/types.h ...
It is included unconditionally in other headers, yes.
> Should be David's say as he added this API.
Agr