https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105875
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What              |Removed           |Added
----------------------------------------------------------------------------
           Status            |UNCONFIRMED       |NEW
           Component         |tree-optimization |c
           Ever confirmed    |0                 |1
           Last reconfirmed  |                  |2023-05-17

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
It is the front end that is producing the worse code:

TARGET_EXPR <D.2818, 1>;
TARGET_EXPR <D.2819, (void) (D.2819 = VIEW_CONVERT_EXPR<atomic_bool>(__atomic_load_1 ((const volatile void *) &b, 5)))>;
<D.2821>:;
TARGET_EXPR <D.2820, ((int) TARGET_EXPR <D.2819, (void) (D.2819 = VIEW_CONVERT_EXPR<atomic_bool>(__atomic_load_1 ((const volatile void *) &b, 5)))> ^ D.2818) != 0>;
if (__atomic_compare_exchange_1 ((volatile void *) &b, (void *) &D.2819, (int) VIEW_CONVERT_EXPR<unsigned char>(D.2820), 0, 5, 5))
  {
    goto <D.2822>;
  }
goto <D.2821>;
<D.2822>:;, D.2820;

vs:

TARGET_EXPR <D.2827, (char) __atomic_xor_fetch_1 ((volatile void *) &c, (int) (unsigned char) TARGET_EXPR <D.2826, 1>, 5)>;, D.2827;

So confirmed.

Using __atomic_xor_fetch_1 directly works. That is:

  __atomic_xor_fetch_1 (&b, 1, 5);

produces:

  lock xorb $1, b(%rip)