I reran the benchmark after the latest changes (r380995). Since I ran the tests on Solaris, I decided to extend the set of implementations compared in this benchmark to also include the Sun libraries: both the native Rogue Wave C++ Standard Library, version 2.1.1, and STLport 4.5.2, which is bundled with the compiler and optionally available under the -library=stlport4 switch.
All tests were compiled at -O2 and without thread safety (i.e., no -mt or similar flag). The results are below. In each row the value 1. indicates the best result (the fastest runtime, measured by the user time of the process) and represents the baseline against which all other timings in that row were calculated.

+---------------+-----------------------------+-------------------+
|               |         Sun C++ 5.7         |     GCC 4.0.2     |
|   function    +---------+---------+---------+---------+---------+
|               | stdcxx  | native  | stlport | stdcxx  | native  |
+===============+=========+=========+=========+=========+=========+
| default ctor  |  1.11   |  1.34   |  3.55   |  1.     |  1.22   |
| char* ctor    |  1.     |  1.83   |  2.36   |  1.05   |  1.67   |
| string ctor   |  1.03   |  1.20   |  1.50   |  1.     |  1.08   |
| sputn         |  1.     |  1.36   |  2.56   |  1.10   |  1.78   |
| insert char   |  1.11   |  1.18   |  2.09   |  1.     |  1.23   |
| insert char*  |  1.     |  1.30   |  2.03   |  1.09   |  1.51   |
| insert string |  1.74   |  1.57   |  2.19   |  1.64   |  1.     |
+---------------+---------+---------+---------+---------+---------+

Martin Sebor wrote:
Martin Sebor wrote:

[...]

I also ran some simple benchmarks:

                 latest   gcc
                 stdcxx   4.0.2
+---------------+-------+-------+
| default ctor  | 1.00  | 1.22  |
| char* ctor    | 1.00  | 1.59  |
| string ctor   | 1.00  | 1.05  |
| insert char   | 1.00  |  .96  |
| insert char*  | 1.00  |  .56  |
| insert string | 1.00  |  .53  |
| sputn         | 1.00  |  .47  |
+---------------+-------+-------+

Clearly there is still some room for improvement. I tweaked the allocation policy used by stringbuf to double the size of the buffer (rather than growing by a factor of 1.6 or so) but that didn't make any difference (which should have been expected). The last number is particularly puzzling because, AFAICT, xsputn() (called by sputn) is optimal. I don't see a significant opportunity for optimization there.

Okay, I now see that it's not quite optimal and understand why. Our implementation uses the generic streambuf::xsputn() which copies the string into the buffer one chunk at a time, calling overflow() to process the contents of the buffer each time it runs out of space. This is optimal for filebuf (which flushes the buffer and starts writing from the beginning) but less so for stringbuf which must reallocate the buffer and copy its contents to the new one every time. This can be optimized by allocating the necessary amount of space ahead of time and simply copying the string into it in one shot. With this optimization in place the new numbers are:

+---------------+-------+-------+
| default ctor  | 1.00  | 1.22  |
| char* ctor    | 1.00  | 1.58  |
| string ctor   | 1.00  | 1.06  |
| insert char   | 1.00  | 1.23  |
| insert char*  | 1.00  | 1.39  |
| insert string | 1.00  |  .60  |
| sputn         | 1.00  | 1.62  |
+---------------+-------+-------+

The string inserter still needs to be optimized but everything else is looking much better.

Martin
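
For readers who want to see the xsputn() idea in isolation, here is a rough sketch of it. This is not the stdcxx code: growing_buf, its presize flag, and the reallocations() counter are hypothetical names made up for this illustration, and the doubling growth policy with a 16-character starting size is just an assumption. With presize off, the class falls back to the generic std::streambuf::xsputn(), which fills whatever space is left and calls overflow() each time it runs out; with presize on, xsputn() grows the buffer once to the final size and copies the string in one shot.

```cpp
#include <cstddef>
#include <cstring>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical illustration class; NOT the stdcxx stringbuf.
class growing_buf : public std::streambuf {
    std::vector<char> buf_;       // backing storage for the put area
    std::size_t reallocs_ = 0;    // how many times the buffer was regrown
    bool presize_;                // true: reserve up front in xsputn()

    // Regrow the buffer to hold at least `need` chars, keeping the contents.
    // Assumed policy: start at 16 chars and double until large enough.
    void grow(std::size_t need) {
        const std::size_t used = size();
        std::size_t cap = buf_.empty() ? 16 : buf_.size();
        while (cap < need)
            cap *= 2;
        buf_.resize(cap);         // may reallocate and copy the old contents
        ++reallocs_;
        setp(buf_.data(), buf_.data() + cap);
        pbump(static_cast<int>(used));
    }

protected:
    // Called when the put area is full: make room for one more character.
    int_type overflow(int_type c) override {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);
        grow(size() + 1);
        *pptr() = traits_type::to_char_type(c);
        pbump(1);
        return c;
    }

    std::streamsize xsputn(const char* s, std::streamsize n) override {
        if (!presize_)
            return std::streambuf::xsputn(s, n);   // generic chunked path
        if (n <= 0)
            return 0;
        // Optimized path: allocate the necessary space ahead of time,
        // then copy the whole string in one shot.
        const std::size_t need = size() + static_cast<std::size_t>(n);
        if (need > buf_.size())
            grow(need);
        std::memcpy(pptr(), s, static_cast<std::size_t>(n));
        pbump(static_cast<int>(n));
        return n;
    }

public:
    explicit growing_buf(bool presize) : presize_(presize) {}

    // Number of characters written so far.
    std::size_t size() const {
        return pptr() ? static_cast<std::size_t>(pptr() - pbase()) : 0;
    }
    std::string str() const {
        return size() ? std::string(pbase(), size()) : std::string();
    }
    std::size_t reallocations() const { return reallocs_; }
};
```

Writing one long string with sputn() into a growing_buf(false) and a growing_buf(true) and comparing reallocations() shows the difference: the presized variant regrows once, while the generic path regrows every time the put area fills up.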
