https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39906
Thiago Macieira changed:
What|Removed |Added
CC||thiago at kde dot org
--- Comment #4 f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117052
--- Comment #3 from Thiago Macieira ---
(In reply to Andrew Pinski from comment #2)
> I am 99% sure there is a dup of this bug.
Yup, that's the same. It didn't show up in the suggested list from bugzilla.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117052
--- Comment #1 from Thiago Macieira ---
MSVC bug report:
https://developercommunity.visualstudio.com/t/MSVC-accepts-declaring-an-instantiation/10764801?port=1025&fsid=6d3b3a58-e142-4839-8dbf-ef5bd094b326
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117052
Bug ID: 117052
Summary: GCC accepts declaring an instantiation of member
template in the wrong scope
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: n
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116854
--- Comment #18 from Thiago Macieira ---
(In reply to Uroš Bizjak from comment #16)
> --quote--
> Note, that clearing the RDRAND CPUID bit does not prevent a processor
> that normally supports the RDRAND instruction from executing it. So any
> c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116854
--- Comment #17 from Thiago Macieira ---
(In reply to comment #8)
> I have to disagree. I specifically stated in the Qt bug that affected users
> were using -march=native and that was being resolved to -march=bdver4, so
> everything is not fine
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116854
--- Comment #6 from Thiago Macieira ---
(In reply to Thiago Macieira from comment #5)
> The argument was that -march=bdver4 should not imply -mrdrnd, the same way
> that we had to fix -march=westmere to -march=haswell not to imply -maes: not
> a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116854
--- Comment #5 from Thiago Macieira ---
(In reply to Andrew Pinski from comment #4)
> > Since the BIOS and/or OS can disable it,
>
> From the way I understand it, even things like avx can be turned on/off too.
> Does that mean gcc should disabl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116854
--- Comment #3 from Thiago Macieira ---
(In reply to Andrew Pinski from comment #2)
> So bdver4 does have RDRND support just buggy bios's cause linux to disable
> it:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95405
--- Comment #11 from Thiago Macieira ---
May also be related to why GCC produces warnings about uninitialised memory -
Bug 100115
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115619
--- Comment #1 from Thiago Macieira ---
Matching Clang bug report: https://github.com/llvm/llvm-project/issues/96512
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115619
Bug ID: 115619
Summary: [ASAN] new-delete-type-mismatch on aligned operator
new
Product: gcc
Version: 14.1.0
Status: UNCONFIRMED
Severity: normal
Pri
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114576
--- Comment #4 from Thiago Macieira ---
(In reply to Jakub Jelinek from comment #3)
> vaesenc etc. instructions can be used even if just -maes -mavx, not just
> -mvaes -mavx512vl.
Correct, that's just VEX-prefixed AESNI instructions.
VAES adde
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114576
Bug ID: 114576
Summary: [13 regression][config/i386] GCC 14/trunk emits
VEX-prefixed AES instruction without AVX enabled
Product: gcc
Version: 14.0
Status: UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088
--- Comment #3 from Thiago Macieira ---
> But __builtin_strlen *does* get optimized when the input is a string literal.
> Not sure about wcslen though.
It appears not to, in the test above. std::char_traits<wchar_t>::length() calls
wcslen() whereas the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088
Bug ID: 114088
Summary: Please provide __builtin_c16slen and __builtin_c32slen
to complement __builtin_wcslen
Product: gcc
Version: unknown
Status: UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113465
--- Comment #6 from Thiago Macieira ---
Mind if I ask you reconsider the decision for inline variables (which all
constexpr ones are)?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54483
--- Comment #13 from Thiago Macieira ---
(In reply to Andrew Pinski from comment #11)
> You still need:
> constexpr float A::val;
In C++11 mode, yes.
C++17 made all static constexpr data members implicitly inline, which changes
the situation. In
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113465
--- Comment #5 from Thiago Macieira ---
> I don't think that's the same. That situation over there is C++11, where the
> constexpr variable is *not* static.
I meant not *inline*.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113465
--- Comment #4 from Thiago Macieira ---
(In reply to Andrew Pinski from comment #3)
> See PR 54483 .
>
> *** This bug has been marked as a duplicate of bug 54483 ***
I don't think that's the same. That situation over there is C++11, where the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113465
Bug ID: 113465
Summary: [mingw-w64] dllexported constexpr (inline) variables
not automatically emitted
Product: gcc
Version: 13.2.1
Status: UNCONFIRMED
Severit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111244
--- Comment #7 from Thiago Macieira ---
(In reply to Costas Argyris from comment #6)
> At this point I just meant embedding it in your example a.out executable
> file, just to check if it will work correctly.
Ah, got it. But that is not the con
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111244
--- Comment #5 from Thiago Macieira ---
(In reply to Jonathan Wakely from comment #3)
> Somebody else will have to fix this, I've already wasted too much of my life
> making std:: filesystem (mostly) work on Windows.
Same here.
(In reply to Co
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111244
--- Comment #2 from Thiago Macieira ---
(In reply to Andrew Pinski from comment #1)
> Except the code page could be tuned via a manifest file even.
> For an example GCC embeds a manifest into its own compiler to work around
> this issue and just
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111244
Bug ID: 111244
Summary: std::filesystem::path encoding mismatches locale on
Windows
Product: gcc
Version: 13.2.1
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=05
Bug ID: 05
Summary: [12/13/14 regression] __attribute__((malloc)) can no
longer name a C++ member function
Product: gcc
Version: 14.0
Status: UNCONFIRMED
S
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110591
Bug ID: 110591
Summary: [i386] (Maybe) Missed optimisation: _cmpccxadd sets
flags
Product: gcc
Version: 13.1.1
Status: UNCONFIRMED
Severity: normal
P
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110184
Bug ID: 110184
Summary: [i386] Missed optimisation: atomic operations should
use PF, ZF and SF
Product: gcc
Version: 13.1.1
Status: UNCONFIRMED
Severity: norma
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109896
--- Comment #7 from Thiago Macieira ---
(In reply to Jonathan Wakely from comment #6)
> With placement-new there's no allocation:
> https://gcc.godbolt.org/z/68e4PaeYz
Is the exception expected there, though?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109896
--- Comment #5 from Thiago Macieira ---
(In reply to Andrew Pinski from comment #4)
> If you are that picky for cycles, these cycles are not going to be a problem
> compared to the dynamic allocation that is just about to happen ..
Yeah, I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109896
--- Comment #3 from Thiago Macieira ---
(In reply to H.J. Lu from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > I suspect the overflow code was added before __builtin_*_overflow were added
> > which is why the generated code is t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106409
--- Comment #8 from Thiago Macieira ---
(In reply to Andrew Pinski from comment #7)
> See PR 58525 also which added that code path.
That explains why it won't call __cxa_throw_bad_array_new_length, but not why
it will call operator new[](-1). M
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106409
--- Comment #6 from Thiago Macieira ---
Suggestion: add a function to libgcc to be called instead of
__cxa_throw_bad_array_new_length when exceptions are disabled. That function
can be a mere two instructions, but it provides two advantages:
* n
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109896
Bug ID: 109896
Summary: Missed optimisation: overflow detection in
multiplication instructions for operator new
Product: gcc
Version: 13.1.1
Status: UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109895
Bug ID: 109895
Summary: -Walloc-size-larger-than complains about code it
generated itself under -flto -fno-exceptions
Product: gcc
Version: 13.1.1
Status: UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99277
--- Comment #21 from Thiago Macieira ---
I understand that. I don't think it's a reason to repeat the policy, though.
Anyway, I don't have any new arguments than when we discussed this two years
ago, so I won't pursue this matter further.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99277
--- Comment #19 from Thiago Macieira ---
(In reply to Jonathan Wakely from comment #18)
> We have not committed to a stable ABI for C++20 yet.
That was my argument when creating this bug report two years ago: if it's
available in the standard he
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99277
--- Comment #17 from Thiago Macieira ---
(In reply to Thomas Rodgers from comment #16)
> The original implementation came from Olivier Giroux and is part of libc++.
> The libc++ implementation also does not use a type that futex or
> ulock_wait/wa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99277
--- Comment #15 from Thiago Macieira ---
> > 5) std::barrier implementation also uses a type that futex(2) can't handle
> barrier still uses a 1-byte enum for the atomic waits.
That can only now be fixed for libstdc++.so.7, then.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99277
Thiago Macieira changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108980
--- Comment #9 from Thiago Macieira ---
Ah, got it. That also explains why I couldn't find anything wrong with my code,
and nothing I did that could likely be it made the warning go away.
Thanks for the quick turnaround.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108980
--- Comment #7 from Thiago Macieira ---
The duplicate "note:" disappeared. But now there's no warning at all on the
same file, with the same options. Was that intended?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108980
--- Comment #6 from Thiago Macieira ---
Testing.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108980
--- Comment #1 from Thiago Macieira ---
GCC 13 (trunk) built today.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108980
Bug ID: 108980
Summary: Warning text missing the warning itself (GCC 13)
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Compone
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108372
Bug ID: 108372
Summary: [12 regression] -E -fdirectives-only crash
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: pr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112
--- Comment #9 from Thiago Macieira ---
I can't be certain for other architectures' performance, but my feeling is that
indeed they would benefit from this. The option that was added as an -m should
be an -f (and match Clang's option).
However,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108216
--- Comment #3 from Thiago Macieira ---
In bug 70644, the pointer to Base was passed to Base's constructor, so the
conversion from the derived type to the virtual base Base happened clearly
before said base was constructed.
In this example here
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104475
--- Comment #19 from Thiago Macieira ---
(In reply to Richard Biener from comment #15)
> Thanks, it's still the same reason - we isolate a nullptr case and end up
> with
>
> __atomic_or_fetch_4 (184B, 64, 0); [tail call]
>
> The path we isolat
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104475
--- Comment #14 from Thiago Macieira ---
Created attachment 54015
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54015&action=edit
qfutureinterface.cpp preprocessed [gcc trunk-20221205]
(In reply to Richard Biener from comment #13)
> Ther
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107456
--- Comment #4 from Thiago Macieira ---
(In reply to Thiago Macieira from comment #3)
> With the Remote Atomic Operations (RAO) of AAND, AOR and AXOR, we can do
> something.
Correcting myself: the RAO instructions don't give us the result back
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107456
Thiago Macieira changed:
What|Removed |Added
CC||thiago at kde dot org
--- Comment #3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106395
Bug ID: 106395
Summary: [10/11 regression] [mingw] "redeclared without
dllimport attribute: previous dllimport ignored" on
C++ friend
Product: gcc
Version: 12.1.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77306
Thiago Macieira changed:
What|Removed |Added
CC||thiago at kde dot org
--- Comment #3 f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106023
Thiago Macieira changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106023
Bug ID: 106023
Summary: Would like to control the ELF visibility of template
explicit instantiations
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: n
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105348
--- Comment #4 from Thiago Macieira ---
One more Qt workaround, for the record:
https://codereview.qt-project.org/c/qt/qtbase/+/413730
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105509
Bug ID: 105509
Summary: [compatibility] f16 suffix not supported in C++ mode -
unable to find numeric literal operator
‘operator""f16’
Product: gcc
Version: 12.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105348
--- Comment #3 from Thiago Macieira ---
I understand. I'm just trying to avoid having to add code for a corner-case.
People don't usually parse empty buffers, so it's usually fine to allow it to
proceed and discover an EOF condition.
Anyway, wo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105348
--- Comment #1 from Thiago Macieira ---
Qt workaround: https://codereview.qt-project.org/c/qt/qtbase/+/407217
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105348
Bug ID: 105348
Summary: Overly aggressive -Warray-bounds after conditional
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Componen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #20 from Thiago Macieira ---
I think there will be cases where the relaxation makes sense and others where
it doesn't because the surrounding code already does it. So I'd like to control
per emission.
If I can't do it per code block
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #18 from Thiago Macieira ---
(In reply to Jakub Jelinek from comment #17)
> _Pragma("GCC target \"relax-cmpxchg-loop\"")
> should do that (ditto target("relax-cmpxchg-loop") attribute).
The attribute is applied to a function. I'm ho
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #16 from Thiago Macieira ---
Can this option be enabled and disabled with a _Pragma?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #14 from Thiago Macieira ---
I'd restrict relaxations to loops emitted by the compiler. All other atomic
operations shouldn't be modified at all, unless the user asks for it. That
includes non-looping atomic operations (like LOCK BTC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104492
Bug ID: 104492
Summary: Bogus dangling pointer warning (dangling pointer to
‘candidates’ may be used [-Werror=dangling-pointer=])
Product: gcc
Version: 12.0
Status: UNCO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104475
Bug ID: 104475
Summary: Wstringop-overflow + atomics incorrect warning on
dynamic object
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104243
--- Comment #7 from Thiago Macieira ---
(In reply to Martin Liška from comment #6)
> Anyway, upstream removed the pure attribute as we suggested:
> https://codereview.qt-project.org/c/qt/qtbase/+/392357
Can we be assured the pure attribute will
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104250
Bug ID: 104250
Summary: [i386] GCC may want to use 32-bit (I)DIV if it can for
64-bit operands
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #10 from Thiago Macieira ---
(In reply to H.J. Lu from comment #9)
> nptl/nptl_setxid.c in glibc has
>
> do
> {
> flags = THREAD_GETMEM (self, cancelhandling);
> newval = THREAD_ATOMIC_CMPXCHG_VAL (self, cance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49001
--- Comment #7 from Thiago Macieira ---
Hack to workaround:
asm(
".macro vmovapd args:vararg\n"
"vmovupd \\args\n"
".endm\n"
".macro vmovaps args:vararg\n"
"vmovups \\args\n"
".endm\n"
".macro vmovdqa args:var
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103774
Thiago Macieira changed:
What|Removed |Added
CC||hjl.tools at gmail dot com
--- Commen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103774
Bug ID: 103774
Summary: [i386] GCC should swap the arguments to certain
functions to generate a single instruction
Product: gcc
Version: 12.0
Status: UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
--- Comment #8 from Thiago Macieira ---
Update again: looks like the issue was the next line I didn't paste, which was
performing _kortestz_mask32_u8 on an __mmask16. The type mismatch was causing
this problem.
If I use the correct _kortestz_ma
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
--- Comment #7 from Thiago Macieira ---
I should add the same is not happening for Char == char, meaning the returned
type is an __mmask32 (unsigned)
vmovdqu8 (%rsi), %ymm2
vmovdqu8 32(%rsi), %ymm3
vpcmpub
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
--- Comment #6 from Thiago Macieira ---
It got worse. Now I'm seeing:
.L807:
vmovdqu16 (%rsi), %ymm2
vmovdqu16 32(%rsi), %ymm3
vpcmpuw $6, %ymm0, %ymm2, %k2
vpcmpuw $6, %ymm0, %ymm3, %k3
kmovw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
--- Comment #5 from Thiago Macieira ---
Maybe this is running afoul of GCC's thinking that a simple register-register
move is free? I've seen it save a constant in an opmask register, but kmov{d,q}
is not free like mov{l,q} is.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750
Bug ID: 103750
Summary: [i386] GCC schedules KMOV instructions that destroys
performance in loop
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: norma
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103066
--- Comment #10 from Thiago Macieira ---
You're right that emitting more penalises those who have done their job and
written proper code.
The problem we're seeing is that such code appears to be the minority. Or,
maybe put differently, the bad
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103090
--- Comment #1 from Thiago Macieira ---
One more:
bool tsign3(std::atomic<int> &i)
{
// any two or more bits, so long as the sign bit is one of them
// (or the compiler doesn't know what's in the variable)
int bits = 1 | signbit;
r
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #2 from Thiago Macieira ---
See also bug 103090 for a few more (restricted) possibilities to replace a
cmpxchg loop with a LOCK RMW operation.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #29 from Thiago Macieira ---
New suggestion in bug 103090
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103090
Bug ID: 103090
Summary: [i386] GCC should use the SF and ZF flags in some
atomic_fetch_op sequences
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: no
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #1 from Thiago Macieira ---
(the assembly doesn't match the source code, but we got your point)
Another possible improvement for the __atomic_fetch_{and,nand,or} functions is
that it can check whether the fetched value is already co
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101583
Thiago Macieira changed:
What|Removed |Added
CC||thiago at kde dot org
--- Comment #10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #26 from Thiago Macieira ---
(In reply to H.J. Lu from comment #25)
> Can you get some performance improvement data on real workloads?
Will ask.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #24 from Thiago Macieira ---
(In reply to H.J. Lu from comment #23)
> I renamed the commit title. The new v3 is the v6 + fixes.
Got it. Still no issues.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #22 from Thiago Macieira ---
(In reply to H.J. Lu from comment #21)
> Created attachment 51559 [details]
> The new v3 patch
>
> The new v3 patch to check invalid mask.
v3? We were already up to v6.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #20 from Thiago Macieira ---
And:
$ cat /tmp/test.cpp
#include <atomic>
bool tbit(std::atomic &i)
{
return i.fetch_xor(CONSTANT, std::memory_order_relaxed) & (CONSTANT);
}
$ ~/dev/gcc/bin/gcc "-DCONSTANT=(1LL<<63)" -S -o - -O2 /tmp/test.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #19 from Thiago Macieira ---
(In reply to H.J. Lu from comment #17)
> Created attachment 51558 [details]
> The v6 patch
>
> Please try this.
Confirmed for all inputs.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #15 from Thiago Macieira ---
Works now for the failing case. Additionally:
bool tbit(std::atomic &i)
{
return i.fetch_and(~CONSTANT, std::memory_order_relaxed) & (CONSTANT);
}
Will properly produce LOCK BTR (CONSTANT=2):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #12 from Thiago Macieira ---
Commit 7e0c0500808d58bca5b8e23cbd474022c32234e4 + your patch.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #11 from Thiago Macieira ---
$ for ((i=0;i<32;++i)); do ~/dev/gcc/bin/gcc "-DCONSTANT=(1<<$i)" -S -o - -O2
/tmp/test.cpp | grep bts; done
lock btsl $0, (%rdi)
lock btsl $1, (%rdi)
lock btsl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #9 from Thiago Macieira ---
Looks like it doesn't work for the sign bit.
$ cat /tmp/test.cpp
#include <atomic>
bool tbit(std::atomic &i)
{
return i.fetch_or(CONSTANT, std::memory_order_relaxed) & CONSTANT;
}
$ ~/dev/gcc/bin/gcc -DCONST
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #8 from Thiago Macieira ---
$ cat /tmp/test.cpp
#include <atomic>
bool tbit(std::atomic &i)
{
return i.fetch_or(1, std::memory_order_relaxed) & 1;
}
$ ~/dev/gcc/bin/gcc -S -o - -O2 /tmp/test.cpp
.file "test.cpp"
.text
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #7 from Thiago Macieira ---
(In reply to H.J. Lu from comment #5)
> Created attachment 51536 [details]
> A patch
>
> Please try this.
Give me an hour (will try v2).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
Bug ID: 102566
Summary: [i386] GCC should emit LOCK BTS for simple
bit-test-and-set operations with std::atomic
Product: gcc
Version: unknown
Status: UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102166
--- Comment #9 from Thiago Macieira ---
> clang defines them as intrinsic because they support AMX register allocation
> (a lot of effort), gcc does not support AMX register allocation for now, and
> defining them as intrinsic + builtin doesn't
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102166
--- Comment #6 from Thiago Macieira ---
> I suggest doing as Clang did and make it an intrinsic.
Or even a __builtin_ia32_markamxtile(); intrinsic, which produces the error if
misused and does add the necessary bits to the .note.gnu.property se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102166
--- Comment #5 from Thiago Macieira ---
(In reply to Hongtao.liu from comment #4)
> Because _tile_loadd is implemented as embedded assembly plus macros, if
> __AMX_TILE__ is removed, no error will be reported if the user does not use
> the -mamx