Hi,
I have a weird problem: after 1-2 weeks of uptime, a random file gets
"stuck" on an xfs partition inside a vserver (vs). The xfs sits on top of
lvm2 & dm-crypt, userspace is 32-bit Debian, and the kernel is AMD64. I've
been using vs since 2.6.14.4-2.1.x, and I switched to 2.6.15.2-2.1.0.5.1
when this problem appeared, hoping it would fix it, but unfortunately it
didn't. Today I got another "stuck" file: any process that tries to stat
or open it goes into state 'D'. The other files in the same directory
work fine.
It's a heavily loaded file server and I never had this kind of problem
outside the vs, so I don't think it's a generic xfs problem. I've also
noticed that the lockup happens during mass rm'ing of files, though of
course not on every mass rm. As I said, it's a really big file server
with millions of files, so please think twice before you say "xfs? vs? it
works for me at home!" :)
The details:
dmesg is empty on the host, no panic or anything similar.
This kernel process is stuck on the host:
root 15062 0.4 0.0 0 0 ? D Mar06 6:15 [pdflush]
vs:/# uname -a
Linux vs 2.6.15.2-vs2.1.0.5.1 #4 SMP Tue Feb 14 18:15:09 CET 2006 x86_64 GNU/Linux
vs:/# uptime
18:26:21 up 9 days, 8:36, 2 users, load average: 30.5, 30.5, 30.2
- it's not "real" load, I've kicked out the users, no network traffic
vs:/# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw,data=ordered 0 0
/dev/mapper/blv /mirror/pub xfs rw,nosuid,nodev,noexec 0 0
/dev/mapper/bvlv /mirror/pub/fsn xfs rw,nosuid,nodev,noexec 0 0
none /proc proc rw,nodiratime,nodev 0 0
none /tmp tmpfs rw,nodev 0 0
none /dev/pts devpts rw 0 0
/dev2/root2 /bin ext3 ro,nodev,data=ordered 0 0
/dev2/root2 /sbin ext3 ro,nodev,data=ordered 0 0
/dev2/root2 /lib ext3 ro,nodev,data=ordered 0 0
/dev2/root2 /lib/modules ext3 ro,nodev,data=ordered 0 0
/dev2/root2 /usr ext3 ro,nodev,data=ordered 0 0
/dev2/root2 /usr/local ext3 ro,nodev,data=ordered 0 0
/dev2/root2 /etc/terminfo ext3 ro,nodev,data=ordered 0 0
/dev2/root2 /etc/alternatives ext3 ro,nodev,data=ordered 0 0
vs:/# ps axu   (too long to paste here, trimmed)
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 1516 464 ? S Feb26 0:01 init [3]
root 11733 0.0 0.0 1848 540 ? SN Feb26 0:01 /usr/sbin/inetutils-inetd -R 120
root 11757 0.0 0.0 3368 516 ? SNs Feb26 0:00 /usr/sbin/sshd
root 11799 0.0 0.0 1764 588 ? SNs Feb26 0:00 /usr/sbin/cron
#497 11763 0.0 0.0 4128 960 ? DN 09:20 0:31 ftpd
#492 19892 0.0 0.0 4140 960 ? DN 09:31 0:06 ftpd
#497 26474 0.0 0.0 4008 928 ? DN 09:39 0:00 ftpd
#492 28038 0.0 0.0 4128 956 ? DN 09:40 0:00 ftpd
#497 2061 0.0 0.0 4008 928 ? DN 09:48 0:00 ftpd
#497 2405 0.0 0.0 4008 928 ? DN 09:49 0:00 ftpd
#497 3524 0.0 0.0 4128 956 ? DN 09:50 0:00 ftpd
root 16748 0.0 0.0 1912 508 ? DN 11:30 0:00 /opt/bin/myglinsert 3 bla bla
root 25866 0.0 0.0 1916 508 ? DN 13:30 0:00 /opt/bin/myglinsert 3 bla bla
...
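
For anyone trying to reproduce this: a minimal sketch for pulling only the
state-'D' tasks out of a long ps listing. It assumes procps `ps` (for the
`wchan` column); `filter_d_state` is a made-up helper name, not something
used in the session above.

```shell
# Keep only lines whose second field (STAT) starts with "D",
# i.e. tasks in uninterruptible sleep. Input format: "pid stat wchan comm".
filter_d_state() {
    awk '$2 ~ /^D/ { print }'
}

# Usage on a live system (assumes procps ps; "name=" suppresses the header):
#   ps -eo pid=,stat=,wchan=,comm= | filter_d_state
```

The `wchan` column shows the kernel function each task is sleeping in, which
can help when the SysRq+t output does not fit in the dmesg buffer.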
vs:/# ls -l /proc/19892/fd/
lrwx-- 1 root root 64 Mar 7 18:32 0 -> socket:[33836916]
lrwx-- 1 root root 64 Mar 7 18:32 1 -> socket:[33836916]
l-wx-- 1 root root 64 Mar 7 18:32 2 -> /dev/null
lrwx-- 1 root root 64 Mar 7 18:32 6 -> /mirror/pub/lacee/site
lrwx-- 1 root root 64 Mar 7 18:32 7 -> socket:[33850456]
vs:/# ls -l /proc/2405/fd/
lrwx-- 1 root root 64 Mar 7 18:34 0 -> socket:[33884876]
lrwx-- 1 root root 64 Mar 7 18:34 1 -> socket:[33884876]
l-wx-- 1 root root 64 Mar 7 18:34 2 -> /dev/null
lr-x-- 1 root root 64 Mar 7 18:34 6 -> /mirror/pub/lacee/site
...the rest are the same...
vs:/# stat /mirror/pub/lacee/site
File: `/mirror/pub/lacee/site'
Size: 4096  Blocks: 16  IO Block: 4096  directory
Device: fd14h/64788d  Inode: 1613995234  Links: 3
Access: (0777/drwxrwxrwx) Uid: ( 497/ UNKNOWN) Gid: ( 9500/ UNKNOWN)
Access: 2006-03-07 18:30:07.769923805 +0100
Modify: 2006-03-07 09:36:30.683606795 +0100
Change: 2006-03-07 09:36:30.683606795 +0100
vs:/# cd /mirror/pub/lacee/site
vs:site# ls | wc -l
84
vs:site# for i in *; do echo "$i"; stat "$i"; done
...
bc-me2bo.r49
[oops, 'stat' locked on this file!]
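
A plain `stat` loop like the one above hangs the shell on the first bad
file. Below is a sketch of a probe that runs each `stat` in the background
and stops waiting after a few seconds. `probe_stat` and the 5-second default
are assumptions for illustration only. A child stuck in state 'D' cannot be
killed, so it is simply left behind.

```shell
# Run stat on one path in the background and report if it does not
# finish within LIMIT seconds (default 5). Returns 1 if it looks stuck.
probe_stat() {
    path=$1
    limit=${2:-5}
    stat -- "$path" >/dev/null 2>&1 &
    pid=$!
    waited=0
    # kill -0 only checks whether the child still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            echo "STUCK: $path"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage: probe every entry in the current directory without hanging the shell.
#   for i in *; do probe_stat "$i"; done
```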
SysRq+t output: it's very long, so I copied only the stuck processes, and
not all of them are here, because the 'dmesg' buffer is too small and I
don't have a serial console :(
kernel: myglinsert D 81003438c000 0 16748 16740
kernel: 810013347ce8 0086 0292
kernel:        0292 0008 81007b708260
kernel:        0292 880e3596
kernel: Call Trace: {:xfs:xfs_iunlock+102} {__d_lookup+159}
kernel:        {__down_read+129} {:xfs:xfs_getattr+65}
kernel:        {:xfs:vn_revalidate+59} {link_path_walk+415}
kernel:        {:xfs:linvfs_getattr+36}