Part of the problem is that these allocations are very small:
# dtrace -n 'pid$target::malloc:entry { @a["allocsz"] = quantize(arg0); }' -c /tmp/xml

  allocsz
           value  ------------- Distribution ------------- count
               1 |                                         0
               2 |                                         300000
               4 |@@@@@                                    4700005
               8 |@@                                       1600006
              16 |@@@@@                                    4300015
              32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@              24000006
              64 |                                         200001
             128 |                                         400001
             256 |                                         100000
             512 |                                         0
            1024 |                                         100000
            2048 |                                         100000
            4096 |                                         0
            8192 |                                         100000
           16384 |                                         0

After seeing this, I took a look at the exact breakdown of the
allocation sizes:
# dtrace -n 'pid$target::malloc:entry {...@a[arg0] = count();}' -c /tmp/xml
12 1
96 1
200 1
21 100000
43 100000
44 100000
51 100000
61 100000
75 100000
88 100000
128 100000
147 100000
181 100000
220 100000
440 100000
1024 100000
2048 100000
8194 100000
8 100001
52 100001
6 100002
36 100004
24 100005
33 200000
4 200001
17 200001
9 200003
3 300000
10 300000
13 300000
14 300000
25 300000
28 400000
11 400001
20 700009
40 900000
5 900001
16 2500000
7 3500001
48 3800001
60 18500000
The most frequent malloc call is to allocate 60 bytes: 18.5 million of
roughly 35.9 million calls, or just over half. I believe that we have a
known issue with small mallocs on Solaris. There's a bug open for this
somewhere; however, I can't find its number at the moment.
Another factor that you may have run into is 32-bit versus 64-bit
compilation. I was able to shave about 9 seconds off my runtime
(22.7 to 13.8 seconds) by compiling your testcase as a 64-bit app
instead of a 32-bit one:
$ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c
$ file xml
xml: ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically
linked, not stripped, no debugging information available
$ ./xml
100000 iter in 22.749836 sec
versus:
$ gcc -m64 -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c
$ file xml
xml: ELF 64-bit LSB executable AMD64 Version 1, dynamically linked,
not stripped, no debugging information available
$ ./xml
100000 iter in 13.785916 sec
-j
On Wed, Apr 30, 2008 at 06:44:31PM -0400, Matty wrote:
> On Wed, Apr 30, 2008 at 6:26 PM, David Lutz <David.Lutz at sun.com> wrote:
> > If your application is single threaded, you could try using the
> > bsdmalloc library. This is a fast malloc, but it is not multi-thread
> > safe and will also tend to use more memory than the default
> > malloc. For a comparison of different malloc libraries, look
> > at the NOTES section at the end of umem_alloc(3MALLOC).
> >
> > I got the following result with your example code:
> >
> >
> > $ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c
> > $ ./xml
> > 100000 iter in 21.445672 sec
> > $
> > $ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c -lbsdmalloc
> > $ ./xml
> > 100000 iter in 12.761969 sec
> > $
> >
> > I got similar results using Sun Studio 12.
> >
> > Again, bsdmalloc is not multi-thread safe, so use it with caution.
>
> Thanks David. Does anyone happen to know why the memory allocation
> libraries in Solaris are so much slower than their Linux counterparts? If
> the various malloc implementations were a second or two slower, I could
> understand. But they appear to be 10 - 12 seconds slower in our specific
> test case, which seems kinda odd.
>
> Thanks,
> - Ryan
> _______________________________________________
> perf-discuss mailing list
> perf-discuss at opensolaris.org