On Wed, 14 Nov 2012, Chris Rees wrote:

> On 14 Nov 2012 18:49, "Konstantin Belousov" <kostik...@gmail.com> wrote:
>> On Wed, Nov 14, 2012 at 09:28:23AM -0800, David O'Brien wrote:
>>> On Thu, Oct 25, 2012 at 11:18:06PM +0000, Simon J. Gerraty wrote:
>>>> Log:
>>>>   Merge bmake-20121010
>>>
>>> Hi Simon,
>>> I was kicking the tires on this and noticed bmake is dynamically
>>> linked. Can you change it to being statically linked?
>>
>> This issue most recently came up in freebsd-current. See these pieces
>> of the thread:
>>
>> http://lists.freebsd.org/pipermail/freebsd-current/2012-April/033460.html
>> http://lists.freebsd.org/pipermail/freebsd-current/2012-April/033472.html
>> http://lists.freebsd.org/pipermail/freebsd-current/2012-April/033473.html
>>
>> As you can see there, I prefer not to introduce new statically linked
>> binaries into base. If, by some unfortunate turn of events, bmake is
>> changed to be statically linked, please obey WITH_SHARED_TOOLCHAIN.
>
> Or a /rescue/bmake for when speed is a concern would also be acceptable.
Yes, the big rescue executable is probably even better than dynamic linkage
for pessimizing speeds. Sizes on freefall now:
%    text   data     bss     dec    hex  filename
%  130265   1988    9992  142245  22ba5  /bin/sh
% 5256762 133964 2220464 7611190 742336  /rescue/sh
% -r--r--r--  1 root  wheel  3738610 Nov 11 06:48 /usr/lib/libc.a
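(For reference, a sketch of how the listing above can be reproduced,
using only standard size(1) and ls(1):)

%%%
size /bin/sh /rescue/sh
ls -l /usr/lib/libc.a
%%%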
The dynamically linked /bin/sh is deceptively small, although it is
larger than the statically linked /bin/sh in FreeBSD-1 despite gaining
few new features. When executed, it expands to 16.5MB with a 10MB RSS.
I don't know how much of that is malloc bloat that wouldn't need to be
copied on fork, but it is a lot just to map. /rescue/sh starts at 5MB
and expands to 15.5MB with a 9.25MB RSS when executed. So it is
slightly smaller, and its slowness is determined by its non-locality.
Perhaps its non-locality is not as good for pessimization as libc's.
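(The VSZ/RSS figures above can presumably be read off with ps(1); a
minimal sketch, assuming the standard FreeBSD vsz/rss keywords:)

%%%
/bin/sh -c 'ps -o vsz,rss -p $$'
/rescue/sh -c 'ps -o vsz,rss -p $$'
%%%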
I don't use dynamic linkage, of course. /bin/sh is bloated by static
linkage (or rather by libc) in the FreeBSD-~5.2 that I usually run:

   text  data   bss    dec    hex  filename
 649623  8192 64056 721871  b03cf  /bin/sh

but this "only" expands to 864K with a 580K RSS when executed, which
can be forked a little faster than a 10MB RSS. In practice, the
timings for

    time whatever/sh -c 'for i in $(jot 1000 1); do echo -n; done'

are:
freefall /bin/sh:    6.93 real  1.69 user  5.16 sys
freefall /rescue/sh: 6.86 real  1.65 user  5.13 sys
local /bin/sh:       0.21 real  0.01 user  0.18 sys
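(Spelled out, with "whatever" filled in, the first two runs are
presumably:)

%%%
time /bin/sh -c 'for i in $(jot 1000 1); do echo -n; done'
time /rescue/sh -c 'for i in $(jot 1000 1); do echo -n; done'
%%%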
freefall:
    FreeBSD 10.0-CURRENT #4 r242881M: Sun Nov 11 05:30:05 UTC 2012
    r...@freefall.freebsd.org:/usr/obj/usr/src/sys/FREEFALL amd64
    CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz (2666.82-MHz K8-class CPU)
      Origin = "GenuineIntel"  Id = 0x206c2  Family = 0x6  Model = 0x2c  Stepping = 2

local:
    FreeBSD 5.2-CURRENT #4395: Sun Apr 8 12:15:03 EST 2012
    b...@besplex.bde.org:/c/obj/usr/src/sys/compile/BESPLEX.fw
    ...
    CPU: AMD Athlon(tm) 64 Processor 3200+ (2010.05-MHz 686-class CPU)
      Origin = "AuthenticAMD"  Id = 0xf48  Stepping = 8
freefall may be pessimized by INVARIANTS. It is pessimized by /bin/echo
being dynamically linked. Normally, shells use their builtin echo, so
the speed of /bin/echo should be unimportant, but there is some
strangeness in the timing for /bin/echo specifically: changing
'echo -n' to '/bin/rm -f /etc/nonesuch' or to /usr/bin/true reduces the
times on freefall by almost a factor of 2, although rm is larger and
has to do more:
freefall:
   text  data  bss    dec   hex  filename
   2661   540    8   3209   c89  /bin/echo
  11026   884  152  12062  2f1e  /bin/rm
   1420   484    8   1912   778  /usr/bin/true

(All are dynamically linked, to libc only; truss verifies that rm does
a little more work.)
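(The modified loops are presumably the same jot loop with the body
swapped, e.g.:)

%%%
time /bin/sh -c 'for i in $(jot 1000 1); do /bin/rm -f /etc/nonesuch; done'
time /bin/sh -c 'for i in $(jot 1000 1); do /usr/bin/true; done'
%%%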
freefall /bin/sh echo:    6.93 real  1.69 user  5.16 sys
freefall /bin/sh rm:      3.83 real  0.91 user  2.84 sys
freefall /bin/sh true:    3.68 real  0.75 user  2.85 sys
freefall /rescue/sh echo: 6.86 real  1.65 user  5.13 sys
freefall /rescue/sh rm:   3.69 real  0.83 user  2.78 sys
freefall /rescue/sh true: 3.67 real  0.85 user  2.74 sys
local /bin/sh echo:       0.21 real  0.01 user  0.18 sys
local /bin/sh rm:         0.22 real  0.02 user  0.19 sys
local /bin/sh true:       0.18 real  0.01 user  0.17 sys
local:
   text  data   bss     dec    hex  filename
  11926    60   768   12754   31d2  /bin/echo
 380758  6752 61772  449282  6db02  /bin/rm
   1639    40   604    2283    8eb  /usr/bin/true
(All are statically linked. I managed to debloat crtso and libc enough
for /usr/bin/true to be small. The sources for /bin/echo are
excessively optimized for space in the executable -- they have
contortions to avoid using printf -- but this is useless in -current,
since crtso and libc drag in printf anyway.) The null program
int main(){} has size:

freefall (amd64):
   text  data   bss     dec    hex  filename
 316370 12156 55184  383710  5dade  null-static
   1452   484     8    1944    798  null-dynamic

local (i386):
   text  data  bss   dec  hex  filename
   1490    40  604  2134  856  null-static
   1203   208   32  1443  5a3  null-dynamic
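(A sketch of how these binaries might be built; the file names are
taken from the tables above:)

%%%
echo 'int main(){}' > null.c
cc -static -o null-static null.c
cc -o null-dynamic null.c
size null-static null-dynamic
%%%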
Putting this null program in the jot loop gives a truer indication of
the cost of a statically linked shell:

freefall /bin/sh null-static:  6.36 real  1.51 user  4.45 sys
freefall /bin/sh null-dynamic: 3.92 real  0.85 user  2.71 sys
local /bin/sh null-static:     0.18 real  0.00 user  0.18 sys
local /bin/sh null-dynamic:    0.58 real  0.09 user  0.49 sys

The last 2 lines show the expected large cost of dynamic linkage for a
small program (3 times slower), but the freefall lines show strangeness
-- static linkage is almost twice as slow, and almost as slow as
/bin/echo -n. So, to get a truer indication of the cost of a statically
linked shell, test with my favourite small program:
%%%
#include <sys/syscall.h>

	.globl	_start
_start:
	movl	$SYS_sync,%eax
	int	$0x80
	pushl	$0		# only to look like a sync library call (?)
	pushl	$0
	movl	$SYS_exit,%eax
	int	$0x80
%%%
This is my sync.S source file for sync(1) on x86 (it must be built on
i386, using cc -o sync sync.S -nostdlib).
local:
 text  data  bss  dec  hex  filename
   18     0    0   18   12  sync
It does the same amount of error checking as /usr/src/bin/sync.c (none),
which compiles to:
freefall:
   text  data   bss     dec    hex  filename
 316330 12092 55184  383606  5da76  sync-static
   1503   492     8    2003    7d3  sync-dynamic
Putting this in the jot loop gives:
local /bin/sh sync:            0.65 real  0.01 user  0.63 sys

but since sync(2) is a heavyweight syscall and I don't want to exercise
freefall's disks, remove it from the program, so that the program just
does _exit(0).
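(The source is presumably just sync.S minus the two sync instructions;
a sketch, using the same build command as before:)

%%%
cat > syncfree-sync.S <<'EOF'
#include <sys/syscall.h>

	.globl	_start
_start:
	pushl	$0
	pushl	$0
	movl	$SYS_exit,%eax
	int	$0x80
EOF
cc -o syncfree-sync syncfree-sync.S -nostdlib
%%%

This gives: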
 text  data  bss  dec  hex  filename
   11     0    0   11    b  syncfree-sync

freefall /bin/sh syncfree-sync: 0.29 real  0.01 user  0.11 sys
local /bin/sh syncfree-sync:    0.17 real  0.00 user  0.17 sys
This shows that most of freefall's enormous slowness is for execing its
bloated executables, perhaps especially when they are on nfs (oops).
Another test of null-static after copying it to /tmp shows that nfs
makes little difference. However, syncfree-sync is much faster when
copied to /tmp (<= 0.08 seconds real; that test was not run directly,
but the result can be read off from a later test).
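(The /tmp retest presumably has this form:)

%%%
cp null-static /tmp/
time /bin/sh -c 'for i in $(jot 1000 1); do /tmp/null-static; done'
%%%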
Next, try bloating syncfree-sync with padding to the same size as
null-static:
%%%
#include <sys/syscall.h>

	.text
	.globl	_start
_start:
	pushl	$0
	pushl	$0
	movl	$SYS_exit,%eax
	int	$0x80
	.space	316370-11

	.data
	.space	12156

	.bss
	.space	55184
%%%

   text  data   bss     dec    hex  filename
 316370 12156 55184  383710  5dade  bloated-syncfree-sync

freefall /bin/sh bloated-syncfree-sync: 0.08 real  0.00 user  0.08 sys (zfs)
freefall /bin/sh bloated-syncfree-sync: 0.30 real  0.00 user  0.13 sys (nfs)
local /bin/sh bloated-syncfree-sync:    0.21 real  0.00 user  0.21 sys (ffs)
This shows that the kernel is still quite fast, and that the enormous
slowness on freefall is mainly in crtso. I blame malloc() for this.
malloc() first increases the size of a null statically linked program
from ~1K text to 310K text. Then it increases the startup time by a
factor of 50 or so. For small utilities like echo and rm, the increases
are similar. A small utility only needs to allocate about 8K of data
(for stdio buffers). Since execing bloated-syncfree-sync is fast, a
small utility could do this allocation a few thousand times in the time
that crtso now takes to start up. (The 300+K of padding only gives
enough for statically allocating 40 x 8K; expanding the padding by a
factor of 50 might slow down the exec to the current crtso startup
time, but would give 2000 x 8K.) Of course, actually using the
allocated areas will slow down both the statically allocated and the
dynamically allocated cases a lot.
More tests with a large program on small data (put 'cc -c null.c' in
the jot loop, where null.c is int main(){}):
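(That is, presumably:)

%%%
echo 'int main(){}' > null.c
time /bin/sh -c 'for i in $(jot 1000 1); do cc -c null.c; done'
%%%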
freefall /bin/sh clang: 22.53 real   6.35 user  12.15 sys (nfs)
freefall /bin/sh gcc:   35.28 real  13.14 user  17.45 sys (nfs)
local /bin/sh cc:       17.50 real   6.72 user   2.64 sys (ffs)
The crtso slowness seems to be very significant even here. Assume that
it costs 6 seconds per 1000 execs, i.e., 6 ms per exec. clang is
monolithic and does only 1 exec per cc -c; gcc is a small driver
program that execs cc1 and as (it used to exec a separate cpp too), so
gcc does 3 execs per cc -c. The 2 extra execs then cost an extra
2 x 6 = 12 seconds over the 1000 iterations, which accounts almost
exactly for clang being 12.75 seconds faster.
The 'local' time apparently shows a large accounting bug. Actually, it
is because I left a shell loop for testing this running in the
background. All the other 'local' times are not much affected by this,
since the background loop has low priority, and scheduling works out so
that it rarely runs in competition with the tiny programs in the other
tests; but here the cc's compete with it significantly. After fixing
this, and also running the freefall tests on zfs:

freefall /bin/sh clang: 19.69 real   6.74 user  12.82 sys (zfs)
freefall /bin/sh gcc:   28.51 real  12.75 user  15.47 sys (zfs, gcc-4.2.1)
local /bin/sh cc:        8.95 real   6.17 user   2.74 sys (ffs, gcc-3.3.3)
gcc-4.2.1 is only 35% slower than gcc-3.3.3 on larger source files when it
is run locally:
local /bin/sh gcc: 120.1 real 112.4 user 7.4 sys (ffs, gcc-3.3.3 -O1 -S)
local /bin/sh gcc: 164.6 real 155.8 user 8.1 sys (ffs, gcc-3.3.3 -O2 -S)
local /bin/sh gcc: 161.9 real 148.0 user 8.1 sys (ffs, gcc-4.2.1 -O1 -S)
local /bin/sh gcc: 202.4 real 193.6 user 8.0 sys (ffs, gcc-4.2.1 -O2 -S)
Maybe malloc() would be faster with MALLOC_PRODUCTION. I use
/etc/malloc.conf -> aj locally. freefall doesn't have /etc/malloc.conf.
MALLOC_OPTIONS no longer works, and MALLOC_CONF is too large for me to
understand, so I don't know how to turn off non-production features
dynamically.
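(The aj setting uses the old malloc.conf symlink interface; a minimal
sketch of configuring it, assuming root:)

%%%
ln -s aj /etc/malloc.conf
%%%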
Bruce