I know this sounds like it might be better answered in gcc-help, but if I am right this is a bug report.
I'm using gcc 4.5 branch, rev. 165881 (a week old), on x86-64 Linux. This testcase is derived from a larger program. I have looked at the assembly and was puzzled.

#include <set>
#include <stdio.h>

int main()
{
    static const int array[] = { 1,2,3,4,5,6,7,8,9,10,6 };
    std::set<int> the_set;
    int count = 0;
    for (unsigned i = 0; i < sizeof(array)/sizeof(*array); i++)
    {
        std::pair<std::set<int>::iterator, bool> result =
            the_set.insert(array[i]);
        if (result.second)
            count++;
    }
    printf("%d unique items in array.\n", count);
    return 0;
}

Compiled using g++ -std=c++98 -Os, this produces what looks to me like very inefficient code, particularly this loop in main():

  40076d:  89 d8                  mov    %ebx,%eax
  40076f:  4c 89 e7               mov    %r12,%rdi
  400772:  48 8d 34 85 60 0a 40   lea    0x400a60(,%rax,4),%rsi
  400779:  00
  40077a:  e8 b1 01 00 00         callq  400930 <std::set<int, std::less<int>, std::allocator<int> >::insert(int const&)>
  40077f:  48 89 04 24            mov    %rax,(%rsp)
  400783:  89 54 24 08            mov    %edx,0x8(%rsp)
  400787:  48 89 44 24 40         mov    %rax,0x40(%rsp)
  40078c:  48 8b 44 24 08         mov    0x8(%rsp),%rax
  400791:  3c 01                  cmp    $0x1,%al
  400793:  48 89 44 24 48         mov    %rax,0x48(%rsp)
  400798:  83 dd ff               sbb    $0xffffffffffffffff,%ebp
  40079b:  ff c3                  inc    %ebx
  40079d:  83 fb 0b               cmp    $0xb,%ebx
  4007a0:  75 cb                  jne    40076d <main+0x19>

And the uninlined set::insert():

0000000000400930 <std::set<int, std::less<int>, std::allocator<int> >::insert(int const&)>:
  400930:  48 83 ec 48            sub    $0x48,%rsp
  400934:  e8 5f ff ff ff         callq  400898 <std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int> >::_M_insert_unique(int const&)>
  400939:  89 54 24 18            mov    %edx,0x18(%rsp)
  40093d:  8a 54 24 18            mov    0x18(%rsp),%dl
  400941:  88 54 24 28            mov    %dl,0x28(%rsp)
  400945:  8b 54 24 28            mov    0x28(%rsp),%edx
  400949:  48 83 c4 48            add    $0x48,%rsp
  40094d:  c3                     retq
  40094e:  90                     nop
  40094f:  90                     nop

In the larger program, I got almost these exact results using "-fprofile-use -O3", but since I have replicated it using -Os and without PGO, maybe it will be easier to debug/optimize.
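A possibly smaller testcase, as a sketch only. It assumes the redundant stores come from how the 16-byte std::pair return value (handed back in %rax and %rdx) is handled, rather than from std::set itself. Node, fake_insert, count_unique and kMaxSlots are hypothetical names made up for illustration, and I have not checked that this produces the same code as the std::set version.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for a tree node; not a libstdc++ type.
struct Node { int value; };

// Capacity of the fixed-size table used by this sketch only.
static const std::size_t kMaxSlots = 16;

// Hypothetical stand-in for std::set<int>::insert(): returns a
// pointer/bool pair by value, like the iterator/bool pair in the report.
// noinline (a GCC extension) keeps the call and its return-value
// handling visible in the caller's assembly.
__attribute__((noinline))
std::pair<Node*, bool> fake_insert(Node* slots, std::size_t& used, int value)
{
    // Linear search: report an existing element without inserting.
    for (std::size_t i = 0; i < used; ++i)
        if (slots[i].value == value)
            return std::pair<Node*, bool>(&slots[i], false);

    // Not found: append the value and report a successful insertion.
    slots[used].value = value;
    return std::pair<Node*, bool>(&slots[used++], true);
}

// Same loop shape as main() in the report: only .second is consumed.
int count_unique(const int* array, std::size_t n)
{
    assert(n <= kMaxSlots);  // the sketch has no dynamic storage
    Node slots[kMaxSlots];
    std::size_t used = 0;
    int count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::pair<Node*, bool> result = fake_insert(slots, used, array[i]);
        if (result.second)
            count++;
    }
    return count;
}
```

Calling count_unique() on the array from the original testcase from a small main(), building with the same g++ -std=c++98 -Os flags and disassembling count_unique would show whether the extra stores around the call survive without std::set involved.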
The stack shuffling that has no effect on the result, and the stack frames that are reserved but never otherwise used, look strange to me. Am I reading the code wrong? Should I be using a different version of the compiler? Is this a known bug? Please advise.