Yipes! The 32-vs-64-bit thing bites us a lot when people are doing
Linux-to-Solaris comparisons. If you just say "gcc" on Linux on a 64-bit CPU,
you get a 64-bit binary. On Solaris, the default is 32-bit (even on a 64-bit
CPU), unless you use -m64. This was done for maximum portability (32-bit
binaries run on both 32- and 64-bit systems), but it frequently messes up
benchmark comparisons, because we end up comparing apples and oranges....
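
A quick way to confirm which case you're in before comparing numbers (isainfo
is Solaris-only, and ./xml is just the testcase binary from the thread below,
so adjust as needed):

    $ isainfo -b          # address-space width of the native ISA: 32 or 64
    $ gcc -dumpmachine    # default code-generation target for this gcc
    $ file ./xml          # what you actually built: "ELF 32-bit" vs "ELF 64-bit"
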
Mike
johansen at sun.com wrote:
> Part of the problem is that these allocations are very small:
>
> # dtrace -n 'pid$target::malloc:entry { @a["allocsz"] = quantize(arg0); }' -c
> /tmp/xml
>
> allocsz
> value ------------- Distribution ------------- count
> 1 | 0
> 2 | 300000
> 4 |@@@@@ 4700005
> 8 |@@ 1600006
> 16 |@@@@@ 4300015
> 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 24000006
> 64 | 200001
> 128 | 400001
> 256 | 100000
> 512 | 0
> 1024 | 100000
> 2048 | 100000
> 4096 | 0
> 8192 | 100000
> 16384 | 0
>
> After seeing this, I took a look at the exact breakdown of the
> allocation sizes:
>
> # dtrace -n 'pid$target::malloc:entry { @a[arg0] = count(); }' -c /tmp/xml
>
> 12 1
> 96 1
> 200 1
> 21 100000
> 43 100000
> 44 100000
> 51 100000
> 61 100000
> 75 100000
> 88 100000
> 128 100000
> 147 100000
> 181 100000
> 220 100000
> 440 100000
> 1024 100000
> 2048 100000
> 8194 100000
> 8 100001
> 52 100001
> 6 100002
> 36 100004
> 24 100005
> 33 200000
> 4 200001
> 17 200001
> 9 200003
> 3 300000
> 10 300000
> 13 300000
> 14 300000
> 25 300000
> 28 400000
> 11 400001
> 20 700009
> 40 900000
> 5 900001
> 16 2500000
> 7 3500001
> 48 3800001
> 60 18500000
>
> The most frequent malloc call is to allocate 60 bytes. I believe that
> we have a known issue with small mallocs on Solaris. There's a bug open
> for this somewhere; however, I can't find its number at the moment.
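> 
> (If you want a quick feel for how much of this is the default allocator
> itself, you can interpose one of the alternative malloc libraries at run
> time instead of relinking. The library names below are the usual Solaris
> ones; double-check them on your box:)
> 
>   $ LD_PRELOAD=libumem.so.1 ./xml       # slab-based libumem allocator
>   $ LD_PRELOAD=libmtmalloc.so.1 ./xml   # malloc aimed at multi-threaded apps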
>
> Another problem that you may have run into is the 32-bit versus 64-bit
> compilation problem. I was able to shave about 10 seconds off my
> runtime by compiling your testcase as a 64-bit app instead of a 32-bit
> one:
>
> $ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c
> $ file xml
> xml: ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically
> linked, not stripped, no debugging information available
> $ ./xml
> 100000 iter in 22.749836 sec
>
> versus:
>
> $ gcc -m64 -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c
> $ file xml
> xml: ELF 64-bit LSB executable AMD64 Version 1, dynamically
> linked, not stripped, no debugging information available
> $ ./xml
> 100000 iter in 13.785916 sec
>
>
> -j
>
>
> On Wed, Apr 30, 2008 at 06:44:31PM -0400, Matty wrote:
>> On Wed, Apr 30, 2008 at 6:26 PM, David Lutz <David.Lutz at sun.com> wrote:
>>> If your application is single threaded, you could try using the
>>> bsdmalloc library. This is a fast malloc, but it is not multi-thread
>>> safe and will also tend to use more memory than the default
>>> malloc. For a comparison of different malloc libraries, look
>>> at the NOTES section at the end of umem_alloc(3MALLOC).
>>>
>>> I got the following result with your example code:
>>>
>>>
>>> $ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c
>>> $ ./xml
>>> 100000 iter in 21.445672 sec
>>> $
>>> $ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c -lbsdmalloc
>>> $ ./xml
>>> 100000 iter in 12.761969 sec
>>> $
>>>
>>> I got similar results using Sun Studio 12.
>>>
>>> Again, bsdmalloc is not multi-thread safe, so use it with caution.
>> Thanks David. Does anyone happen to know why the memory allocation
>> libraries in Solaris are so much slower than their Linux counterparts? If
>> the various malloc implementations were a second or two slower, I could
>> understand. But they appear to be 10 - 12 seconds slower in our specific
>> test case, which seems kinda odd.
>>
>> Thanks,
>> - Ryan