In a regression from 2.95, gcc 4.1 copies a packed bitfield structure member by member instead of copying it as an aggregate.
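As a rough illustration of why a single aggregate copy should be enough here (a minimal sketch, not part of the original test case): the four bit-fields add up to 16 bits, so with GCC the struct is expected to occupy exactly one 16-bit word, and copying it field by field yields the same bytes as copying it whole. The helper names copy_by_member, copy_as_aggregate and copies_match are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Same layout as the struct in this report: 1 + 1 + 3 + 11 = 16 bits. */
struct __attribute__((packed)) A {
    unsigned short i : 1, l : 1, j : 3, k : 11;
};

/* Assumption: GCC packs these fields into a single unsigned short. */
static_assert(sizeof(struct A) == 2, "struct A should occupy one 16-bit word");

/* Hypothetical helper: copy field by field, roughly what the 4.1 output does. */
struct A copy_by_member(struct A x)
{
    struct A r = {0, 0, 0, 0};
    r.i = x.i;
    r.l = x.l;
    r.j = x.j;
    r.k = x.k;
    return r;
}

/* Hypothetical helper: copy the whole object in one go. */
struct A copy_as_aggregate(struct A x)
{
    struct A r;
    memcpy(&r, &x, sizeof r);
    return r;
}

/* Hypothetical helper: returns 1 when both copies are byte-identical. */
int copies_match(struct A x)
{
    struct A m = copy_by_member(x);
    struct A a = copy_as_aggregate(x);
    return memcmp(&m, &a, sizeof m) == 0;
}
```

Since the fields cover every bit of the object, there is nothing a member-wise copy preserves that a plain 16-bit move would not.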
Given the program:

struct __attribute__((packed)) A
{
  unsigned short i : 1, l : 1, j : 3, k : 11;
};

struct A sA;

struct A retmeA (struct A x)
{
  return x;
}

GCC 4.1 -O3 compiles that as:

        .align 1
.globl retmeA
        .type   retmeA, @function
retmeA:
        .word 0x0
        subl2 $4,%sp
        movl %r1,%r0
        movw 4(%ap),%r2
        rotl $30,%r2,%r3
        bicl2 $-8,%r3
        rotl $31,%r2,%r4
        bicl2 $-2,%r4
        rotl $27,%r2,%r1
        bicl2 $-2048,%r1
        insv %r1,$5,$11,(%r0)
        insv %r3,$2,$3,(%r0)
        insv %r4,$1,$1,(%r0)
        insv %r2,$0,$1,(%r0)
        ret
        .size   retmeA, .-retmeA

Whereas GCC 2.95.3 compiled it to:

.globl retmeA
        .type   retmeA,@function
retmeA:
        .word 0x0
        movw 4(%ap),(%r1)
        ret

-- 
           Summary: Packed bitfield structure copies are very inefficient
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: regression
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: matt at 3am-software dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: x86_64--netbsd
  GCC host triplet: x86_64--netbsd
GCC target triplet: vax--netbsdelf

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21198