Jeff Ross wrote:
On Sat, 30 Jan 2010, Bob Beck wrote:
Ooooh, nice one. Obviously ami couldn't get memory mappings and
freaked out.
While not completely necessary, I'd love for you to file that whole
thing with sendbug(1) as a PR so we don't forget it. But for that one
I need to pester krw, art, dlg, and maybe marco about what ami is
doing.
Note that the behaviour you see, with free memory dropping but swap
never being touched, is what I expect: kern.bufcachepercent makes the
buffer cache subordinate to working-set memory between 10 and 90% of
physmem, so the buffer cache will throw away pages before allowing
the system to swap.
Drop it back to 70% and tell me if you still get the same panic,
please. And if you have a fixed test case that reproduces this on
your machine (a load generator for postgres with clients), I'd love
to have a copy in the PR as well.
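
For what it's worth, kern.bufcachepercent is a live sysctl, so it can be
retuned between test runs without a reboot. A minimal C sketch of reading
and setting it through sysctl(3), assuming the KERN_BUFCACHEPERCENT mib
name from <sys/sysctl.h>; error handling is deliberately short:

    /* bufcache.c: print kern.bufcachepercent, and set it when an
     * argument is given (setting requires root). */
    #include <sys/param.h>
    #include <sys/sysctl.h>
    #include <err.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char *argv[])
    {
            int mib[2] = { CTL_KERN, KERN_BUFCACHEPERCENT };
            int cur, val;
            size_t len = sizeof(cur);

            /* read the current value */
            if (sysctl(mib, 2, &cur, &len, NULL, 0) == -1)
                    err(1, "sysctl read");
            printf("kern.bufcachepercent=%d\n", cur);

            /* optionally write a new value, e.g. ./bufcache 70 */
            if (argc > 1) {
                    val = atoi(argv[1]);
                    if (sysctl(mib, 2, NULL, NULL, &val,
                        sizeof(val)) == -1)
                            err(1, "sysctl write");
            }
            return 0;
    }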
70% produces the same panic:
panic: pmap_enter: no pv entries available
Stopped at Debugger+0x4: leave
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
IF RUNNING SMP, USE 'mach ddbcpu <#>' AND 'trace' ON OTHER PROCESSORS, TOO.
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb{0}> trace
Debugger(7149,dfe3ede8,d08edf18,c,0) at Debugger+0x4
panic(d077d740,0,7000,d08ad6e0,ffffffff) at panic+0x55
pmap_enter(d08f3520,e0031000,7149000,3,13) at pmap_enter+0x2e5
_bus_dmamem_map(d0875c40,d505fc44,1,6344,d505fc58,1,dfe3eebc,1) at _bus_dmamem_map+0x9c
ami_allocmem(d49b5800,6344,20,d0753ddc) at ami_allocmem+0x92
ami_mgmt(d49b5800,a1,4,0,0,6344,d49cd000,1) at ami_mgmt+0x268
ami_refresh_sensors(d49b5800,da987028,da987050,80000000,da987028) at ami_refresh_sensors+0x25
sensor_task_work(d49b3d80,0,50,200286) at sensor_task_work+0x1f
workq_thread(d0863100) at workq_thread+0x32
Bad frame pointer: 0xd0a32e78
ddb{0}>
I'll skip the ps for this go-round since it should be pretty much the
same, and go directly to sendbug, including the pgbench script I use
to trigger it.
Thanks!
Jeff
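
The pgbench script itself isn't reproduced in this thread. A hypothetical
stand-in with the same shape, three back-to-back pgbench runs against a
scratch database, might look like the C sketch below; the database name
"bench" and the -c/-t counts are invented for illustration, and only the
three-run structure matches what's described later in the report.

    /* loadgen.c: hypothetical stand-in for the load generator,
     * running three consecutive pgbench runs. "bench", 50 clients,
     * and 1000 transactions per client are illustrative values. */
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
            int run;

            for (run = 1; run <= 3; run++) {
                    printf("starting pgbench run %d\n", run);
                    if (system("pgbench -c 50 -t 1000 bench") != 0) {
                            fprintf(stderr,
                                "pgbench run %d failed\n", run);
                            return 1;
                    }
            }
            return 0;
    }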
Okay, I've been testing. I brought everything up to -current, applied the
ami.c patch sent by David Gwynne as modified by Philip Guenther, and the
patch to bus_dma.c sent by Kenneth Westerback.
I started by setting kern.bufcachepercent=60 and then moving down by 10 after
each panic. Anything 20 or greater triggers the same panic as above.
I then set it to 10 to see what would happen. The load ran okay, but I did
get three "uvm_mapent_alloc: out of static map entries" messages on the
console, which seem to coincide with the end of one of the three pgbench
runs and the start of the next.
So I set it to 11 and got this:
ddb{2}> show panic
malloc: out of space in kmem_map
ddb{2}> trace
Debugger(3fff,c,d488a000,4,4000) at Debugger+0x4
panic(d0752c20,0,4000,0,0) at panic+0x55
malloc(4000,7f,0,da8980b4,70000) at malloc+0x76
vfs_getcwd_scandir(e002eee8,e002eecc,e002eed0,d4e80400,da24f184) at vfs_getcwd_scandir+0x123
vfs_getcwd_common(da898154,0,e002ef10,d4e80400,200,1,da24f184,22) at vfs_getcwd_common+0x1f0
sys___getcwd(da24f184,e002ef68,e002ef58,da24f184) at sys___getcwd+0x62
syscall() at syscall+0x12b
--- syscall (number 304) ---
0x1c028f25:
I reported a similar panic back in December
(http://kerneltrap.org/mailarchive/openbsd-misc/2009/12/14/6309363)
and was told I'd twisted the knobs too hard ;-)
Here are the sysctl values I'm currently using:
kern.maxproc=10240
kern.maxfiles=20480
kern.maxvnodes=6000
kern.shminfo.shmseg=32
kern.seminfo.semmni=256
kern.seminfo.semmns=2048
kern.shminfo.shmall=512000
kern.shminfo.shmmax=768000000
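
One arithmetic note on those shared-memory knobs: kern.shminfo.shmall is
counted in pages while kern.shminfo.shmmax is in bytes, so shmall has to
cover at least shmmax. A small sketch of that check, assuming a 4096-byte
page and using the values listed above:

    /* shmcheck.c: confirm shmall (pages) covers shmmax (bytes). */
    #include <stdio.h>

    int
    main(void)
    {
            const long long page_size = 4096;   /* assumed page size */
            const long long shmall = 512000;    /* shmall, in pages */
            const long long shmmax = 768000000; /* shmmax, in bytes */
            long long shmall_bytes = shmall * page_size;

            printf("shmall covers %lld bytes (~%lld MB)\n",
                shmall_bytes, shmall_bytes / (1024 * 1024));
            printf("shmmax is %lld bytes (~%lld MB): %s\n",
                shmmax, shmmax / (1024 * 1024),
                shmall_bytes >= shmmax ? "ok" : "shmall too small");
            return 0;
    }

With these settings shmall covers about 2 GB, comfortably above the
~732 MB shmmax, so that pair at least is consistent.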
About that time Owain Ainsworth sent his version of a fix to bus_dma.c, so
I applied that and built a new kernel. I still get panics when I adjust
kern.bufcachepercent above 15 or so.
Here's the latest panic, trace and ps with kern.bufcachepercent set to 20:
ddb{0}> show panic
pmap_enter: no pv entries available
ddb{0}> trace
Debugger(b6f30,dff02e1c,2000,c,0) at Debugger+0x4
panic(d077d7a0,0,0,d0e7a980,1) at panic+0x55
pmap_enter(d08f3520,e002d000,b6f30000,7,13) at pmap_enter+0x2e5
uvm_km_alloc1(d08ad720,2000,0,1) at uvm_km_alloc1+0xd5
fork1(da8c5834,14,1,0,0) at fork1+0x100
sys_fork(da8c5834,dff02f68,dff02f58,da8c5834) at sys_fork+0x38
syscall() at syscall+0x12b
--- syscall (number 2) ---
ddb{0}> ps
PID PPID PGRP UID S FLAGS WAIT COMMAND
18032 3768 18032 503 3 0x2000008 flt_pmfail2 postgres
28815 3768 28815 503 3 0x2000008 inode postgres
16262 3768 16262 503 2 0x2000008 postgres
15301 3768 15301 503 3 0x2000008 inode postgres
16712 3768 16712 503 2 0x2000008 postgres
5959 3768 5959 503 3 0x2000008 flt_pmfail2 postgres
24166 3768 24166 503 3 0x2000008 flt_pmfail2 postgres
23692 3768 23692 503 3 0x2000008 flt_pmfail2 postgres
21841 3768 21841 503 3 0x2000008 flt_pmfail2 postgres
23838 3768 23838 503 3 0x2000008 flt_pmfail2 postgres
25423 3768 25423 503 3 0x2000008 inode postgres
5075 3768 5075 503 3 0x2000008 inode postgres
23008 3768 23008 503 3 0x2000008 inode postgres
10527 3768 10527 503 3 0x2000008 flt_pmfail2 postgres
17391 3768 17391 503 3 0x2000008 flt_pmfail2 postgres
27363 3768 27363 503 3 0x2000008 flt_pmfail2 postgres
4858 3768 4858 503 2 0x2000008 postgres
18716 3768 18716 503 2 0x2000008 postgres
8073 3768 8073 503 2 0x2000008 postgres
30893 3768 30893 503 3 0x2000008 flt_pmfail2 postgres
13741 3768 13741 503 3 0x2000008 flt_pmfail2 postgres
14272 3768 14272 503 2 0x2000008 postgres
1962 3768 1962 503 3 0x2000008 inode postgres
4988 3768 4988 503 3 0x2000008 flt_pmfail2 postgres
8452 3768 8452 503 3 0x2000008 inode postgres
15633 3768 15633 503 3 0x2000008 flt_pmfail2 postgres
17648 3768 17648 503 3 0x2000008 flt_pmfail2 postgres
27751 3768 27751 503 3 0x2000008 inode postgres
12932 3768 12932 503 7 0x2000008 postgres
21367 3768 21367 503 2 0x2000008 postgres
13583 3768 13583 503 3 0x2000008 flt_pmfail2 postgres
11813 3768 11813 503 3 0x2000008 inode postgres
6094 3768 6094 503 3 0x2000008 inode postgres
13686 3768 13686 503 3 0x2000008 flt_pmfail2 postgres
10011 3768 10011 503 3 0x2000008 flt_pmfail2 postgres
30652 3768 30652 503 7 0x2000008 postgres
5358 3768 5358 503 3 0x2000008 flt_pmfail2 postgres
24385 3768 24385 503 3 0x2000008 inode postgres
29773 3768 29773 503 3 0x2000008 flt_pmfail2 postgres
22599 3768 22599 503 3 0x2000008 flt_pmfail2 postgres
14231 3768 14231 503 3 0x2000008 flt_pmfail2 postgres
14083 3768 14083 503 3 0x2000008 flt_pmfail2 postgres
9729 3768 9729 503 2 0x2000008 postgres
11748 3768 11748 503 3 0x2000008 inode postgres
9097 3768 9097 503 3 0x2000008 flt_pmfail2 postgres
16057 3768 16057 503 3 0x2000008 inode postgres
1067 3768 1067 503 3 0x2000008 flt_pmfail2 postgres
27445 3768 27445 503 3 0x2000008 flt_pmfail2 postgres
24037 3768 24037 503 3 0x2000008 flt_pmfail2 postgres
7000 3768 7000 503 2 0x2000008 postgres
29367 3768 29367 503 3 0x2000008 inode postgres
25020 3768 25020 503 3 0x2000008 flt_pmfail2 postgres
20564 3768 20564 503 3 0x2000008 flt_pmfail2 postgres
4090 3768 4090 503 3 0x2000008 flt_pmfail2 postgres
8865 3768 8865 503 3 0x2000008 flt_pmfail2 postgres
29631 3768 29631 503 3 0x2000008 inode postgres
16278 3768 16278 503 7 0x2000008 postgres
20854 3768 20854 503 2 0x2000008 postgres
31943 3768 31943 503 3 0x2000008 flt_pmfail2 postgres
18137 3768 18137 503 2 0x2000008 postgres
10523 4345 10523 0 3 0x2044182 poll systat
4345 1 4345 0 3 0x2004082 pause ksh
2520 1 2520 0 3 0x2004082 ttyin getty
32389 1 32389 0 3 0x2004082 ttyin getty
21553 1 21553 0 3 0x2000080 select cron
22285 1 22285 0 3 0x2000080 select sshd
19843 1 19843 0 3 0x2000180 select inetd
24948 27412 17184 83 3 0x2000180 poll ntpd
27412 17184 17184 83 3 0x2000180 poll ntpd
17184 1 17184 0 3 0x2000080 poll ntpd
7640 25372 25372 74 3 0x2000180 bpf pflogd
25372 1 25372 0 3 0x2000080 netio pflogd
25643 12522 12522 73 3 0x2000180 poll syslogd
12522 1 12522 0 3 0x2000088 netio syslogd
31132 3768 31132 503 3 0x2000088 poll postgres
6731 3768 6731 503 3 0x2000088 select postgres
3796 3768 3796 503 2 0x8 postgres
10178 3768 10178 503 3 0x2000088 select postgres
23616 3768 23616 503 3 0x2000088 select postgres
1109 22145 22399 7794 3 0x2004082 piperd qmail-clean
4633 22145 22399 7795 3 0x2004082 select qmail-rspawn
26222 22145 22399 0 3 0x2004082 select qmail-lspawn
* 3768 27937 22399 503 7 0x200400a postgres
3493 26189 22399 73 3 0x2004082 piperd multilog
19747 10749 22399 7792 3 0x2004082 piperd multilog
31915 18366 22399 1002 3 0x2004082 piperd multilog
611 19270 22399 73 3 0x2004182 netio socklog
6710 1 6710 0 3 0x2004082 ttyin getty
30305 1 30305 0 3 0x2004082 ttyin getty
1774 1 1774 0 3 0x2004082 ttyin getty
25968 31940 22399 1001 3 0x2004182 poll dnscache
30323 3543 22399 73 3 0x2004082 piperd multilog
22145 15017 22399 7796 3 0x2004082 select qmail-send
26189 30365 22399 0 3 0x2004082 poll supervise
27937 30365 22399 0 3 0x2004082 poll supervise
3543 30365 22399 0 3 0x2004082 poll supervise
19270 30365 22399 0 3 0x2004082 poll supervise
10749 30365 22399 0 3 0x2004082 poll supervise
15017 30365 22399 0 3 0x2004082 poll supervise
18366 30365 22399 0 3 0x2004082 poll supervise
31940 30365 22399 0 3 0x2004082 poll supervise
30671 22399 22399 0 3 0x2004082 piperd readproctitle
30365 22399 22399 0 3 0x2004082 nanosleep svscan
22399 1 22399 0 3 0x2004082 pause sh
19 0 0 0 3 0x2100200 bored crypto
18 0 0 0 3 0x2100200 aiodoned aiodoned
17 0 0 0 3 0x2100200 syncer update
16 0 0 0 3 0x2100200 cleaner cleaner
15 0 0 0 3 0x100200 reaper reaper
14 0 0 0 2 0x2100200 pagedaemon
13 0 0 0 3 0x2100200 pftm pfpurge
12 0 0 0 3 0x2100200 usbevt usb2
11 0 0 0 3 0x2100200 usbevt usb1
10 0 0 0 3 0x2100200 usbtsk usbtask
9 0 0 0 3 0x2100200 usbevt usb0
8 0 0 0 3 0x2100200 acpi_idle acpi0
7 0 0 0 3 0x40100200 idle3
6 0 0 0 3 0x40100200 idle2
5 0 0 0 3 0x40100200 idle1
4 0 0 0 3 0x2100200 bored syswq
3 0 0 0 3 0x40100200 idle0
2 0 0 0 3 0x2100200 kmalloc kmthread
1 0 1 0 3 0x2004080 wait init
0 -1 0 0 3 0x2080200 scheduler swapper
Anybody got an idea what I can test from here?
Thanks,
Jeff