Re: [Vserver] stucked file on xfs (x86_64)

2006-03-11 Thread Pallai Roland

On Wed, 2006-03-08 at 14:29 +0100, Herbert Poetzl wrote:
> On Tue, Mar 07, 2006 at 07:30:26PM +0100, Pallai Roland wrote:
> > 
> >  I have a weird problem: sometimes a random file gets "stuck" after 1-2
> > weeks of uptime on an xfs partition within a vs. the xfs sits on lvm2 &
> > 
> > SysRq+t - very long, so I copied only the stuck processes, but not all
> > of them are here, because the 'dmesg' buffer is too small and I don't
> > have a serial console :(
 today I got another stuck file, and this time I have a full SysRq+t
dump of the system; just ask if that would help.

 in my last mail I said that only 'pdflush' is in state D on the host,
but I was wrong: 'xfssyncd' is also stuck in D. both of them are in the
new SysRq+t dump.

 I'm not a kernel hacker, but I've noticed that every time exactly one
of the stuck processes is in dm_request(), while pdflush is stuck in the
dm_* code too. so I think maybe it's not a dangling inode lock in the
xfs code, but some kind of deadlock in the dm_* code..? don't laugh,
as I said, INAKH! :)

pdflush   D  0  9262 11  9326  9367 (L-TLB)
81005b743aa8 0046 0003 8100761f0ed0
     0096 0003
   81005b743a18 8012de13
Call Trace:{__wake_up+67} {dm_table_unplug_all+51}
   {dm_unplug_all+29} {sync_page+0}
   {io_schedule+52} {sync_page+72}
   {__wait_on_bit_lock+65} {__lock_page+164}
   {wake_bit_function+0} {wake_bit_function+0}
   {pagevec_lookup_tag+26} {mpage_writepages+351}
   {:xfs:linvfs_writepage+0} {__sync_single_inode+112}
   {__writeback_single_inode+417} {dm_table_any_congest
   {dm_any_congested+72} {dm_table_any_congested+71}
   {sync_sb_inodes+482} {keventd_create_kthread+0}
   {writeback_inodes+133} {wb_kupdate+206}
   {pdflush+0} {__pdflush+292}
   {pdflush+58} {wb_kupdate+0}
   {kthread+146} {child_rip+8}
   {keventd_create_kthread+0} {kthread+0}
   {child_rip+0}

glftpd    D 81006f666000 0 18518  1 18526 24134 (NOTLB)
81006ddfba28 0086 0292 80355fda
   81007ff82e00 0001 8100422b2140 80227256
   0001 c20d9040
Call Trace:{dm_request+122} {generic_make_request+262}
   {__down+152} {default_wake_function+0}
   {__down_failed+53} {:xfs:.text.lock.xfs_buf+25}
   {:xfs:_pagebuf_find+372} {:xfs:xfs_buf_get_flags+82}
   {:xfs:xfs_buf_read_flags+26} {:xfs:xfs_trans_read_bu
   {:xfs:xfs_alloc_read_agf+108} {:xfs:_xfs_trans_commi
   {:xfs:xfs_alloc_fix_freelist+291}
   {:xfs:xfs_trans_log_inode+39} {__down_read+18}
   {:xfs:xfs_free_extent+152} {:xfs:xfs_efd_init+68}
   {:xfs:xfs_trans_get_efd+43} {:xfs:xfs_bmap_finish+23
   {:xfs:xfs_itruncate_finish+420} {:xfs:xfs_log_reserv
   {:xfs:xfs_inactive+558} {:xfs:linvfs_clear_inode+161
   {clear_inode+224} {generic_delete_inode+205}
   {iput+123} {sys_unlink+259}
   {ia32_sysret+0}


> looks like some xfs inode lock is not released properly
> the reasons for this can be various, updating to the
> latest kernel and vserver patches might help here ...
> 
> anyway, will have a more detailed look at it later.
 thanks in advance. I'm trying different kernels in the meantime; I've
now rebooted into 2.6.15.6-vs2.1.1-rc10


--
 d

___
Vserver mailing list
Vserver@list.linux-vserver.org
http://list.linux-vserver.org/mailman/listinfo/vserver


[Vserver] stucked file on xfs (x86_64)

2006-03-07 Thread Pallai Roland

Hi,

 I have a weird problem: sometimes a random file gets "stuck" after 1-2
weeks of uptime on an xfs partition within a vs. the xfs sits on lvm2 &
dm-crypt, userspace is 32bit Debian, kernel is AMD64. I've been using vs
since 2.6.14.4-2.1.x, but I switched to 2.6.15.2-2.1.0.5.1 when this
problem appeared, hoping it would solve it - unfortunately it didn't..
today I got another "stuck" file: the process goes to state 'D' if I
try to stat or open it. the other files in this directory work fine.
 It's a highly loaded file server and I never had this kind of problem
outside the vs, so I think it isn't a generic xfs problem. I've also
noticed that this lockup happens during mass rm'ing of files, but of
course not on every mass rm.. as I said, it's a really big file server
with millions of files, so please think twice before you say "xfs? vs?
it works for me at home!" :)


the details..
dmesg is empty on the host, no panic or anything similar.

this kernel process is stuck on the host:
 root     15062  0.4  0.0     0    0 ?        D    Mar06   6:15 [pdflush]


vs:/# uname -a 
Linux vs 2.6.15.2-vs2.1.0.5.1 #4 SMP Tue Feb 14 18:15:09 CET 2006 x86_64 GNU/Linux
vs:/# uptime
 18:26:21 up 9 days,  8:36,  2 users,  load average: 30.5, 30.5, 30.2

 - it's not "real" load, I've kicked out the users, no network traffic

vs:/# cat /proc/mounts 
rootfs / rootfs rw 0 0
/dev/root / ext3 rw,data=ordered 0 0
/dev/mapper/blv /mirror/pub xfs rw,nosuid,nodev,noexec 0 0
/dev/mapper/bvlv /mirror/pub/fsn xfs rw,nosuid,nodev,noexec 0 0
none /proc proc rw,nodiratime,nodev 0 0
none /tmp tmpfs rw,nodev 0 0
none /dev/pts devpts rw 0 0
/dev2/root2 /bin ext3 ro,nodev,data=ordered 0 0
/dev2/root2 /sbin ext3 ro,nodev,data=ordered 0 0
/dev2/root2 /lib ext3 ro,nodev,data=ordered 0 0
/dev2/root2 /lib/modules ext3 ro,nodev,data=ordered 0 0
/dev2/root2 /usr ext3 ro,nodev,data=ordered 0 0
/dev2/root2 /usr/local ext3 ro,nodev,data=ordered 0 0
/dev2/root2 /etc/terminfo ext3 ro,nodev,data=ordered 0 0
/dev2/root2 /etc/alternatives ext3 ro,nodev,data=ordered 0 0


vs:/# ps axu    (too long to paste here, cut)

USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  1516  464 ?        S    Feb26   0:01 init [3]
root     11733  0.0  0.0  1848  540 ?        SN   Feb26   0:01 /usr/sbin/inetutils-inetd -R 120
root     11757  0.0  0.0  3368  516 ?        SNs  Feb26   0:00 /usr/sbin/sshd
root     11799  0.0  0.0  1764  588 ?        SNs  Feb26   0:00 /usr/sbin/cron
#497     11763  0.0  0.0  4128  960 ?        DN   09:20   0:31 ftpd
#492     19892  0.0  0.0  4140  960 ?        DN   09:31   0:06 ftpd
#497     26474  0.0  0.0  4008  928 ?        DN   09:39   0:00 ftpd
#492     28038  0.0  0.0  4128  956 ?        DN   09:40   0:00 ftpd
#497      2061  0.0  0.0  4008  928 ?        DN   09:48   0:00 ftpd
#497      2405  0.0  0.0  4008  928 ?        DN   09:49   0:00 ftpd
#497      3524  0.0  0.0  4128  956 ?        DN   09:50   0:00 ftpd
root     16748  0.0  0.0  1912  508 ?        DN   11:30   0:00 /opt/bin/myglinsert 3 bla bla
root     25866  0.0  0.0  1916  508 ?        DN   13:30   0:00 /opt/bin/myglinsert 3 bla bla
...
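
The D-state entries in a listing like the one above can also be picked
out without reading the whole ps output. A minimal Python sketch,
assuming a Linux /proc where /proc/<pid>/stat starts with the pid, the
command name in parentheses, and then the state letter. The names
parse_stat_line and find_d_state are made up for illustration and are
not part of any tool mentioned in this thread:

```python
import os

def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state

def find_d_state(proc_root="/proc"):
    """List (pid, comm) for every process in uninterruptible sleep (D)."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while scanning, or unparsable line
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)
```

Calling find_d_state() on the host and inside the vs would give the two
views compared above (pdflush on the host, ftpd and myglinsert in the
guest). Reading /proc/<pid>/stat does not touch the stuck file itself.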

vs:/# ls -l /proc/19892/fd/
lrwx------  1 root root 64 Mar  7 18:32 0 -> socket:[33836916]
lrwx------  1 root root 64 Mar  7 18:32 1 -> socket:[33836916]
l-wx------  1 root root 64 Mar  7 18:32 2 -> /dev/null
lrwx------  1 root root 64 Mar  7 18:32 6 -> /mirror/pub/lacee/site
lrwx------  1 root root 64 Mar  7 18:32 7 -> socket:[33850456]

vs:/# ls -l /proc/2405/fd/
lrwx------  1 root root 64 Mar  7 18:34 0 -> socket:[33884876]
lrwx------  1 root root 64 Mar  7 18:34 1 -> socket:[33884876]
l-wx------  1 root root 64 Mar  7 18:34 2 -> /dev/null
lr-x------  1 root root 64 Mar  7 18:34 6 -> /mirror/pub/lacee/site

..the rest are the same..


vs:/# stat /mirror/pub/lacee/site
  File: `/mirror/pub/lacee/site'
  Size: 4096            Blocks: 16         IO Block: 4096   directory
Device: fd14h/64788d    Inode: 1613995234  Links: 3
Access: (0777/drwxrwxrwx)  Uid: (  497/ UNKNOWN)   Gid: ( 9500/ UNKNOWN)
Access: 2006-03-07 18:30:07.769923805 +0100
Modify: 2006-03-07 09:36:30.683606795 +0100
Change: 2006-03-07 09:36:30.683606795 +0100

vs:/# cd /mirror/pub/lacee/site
vs:site# ls | wc -l
84
vs:site# for i in *; do echo "$i"; stat "$i"; done
...
bc-me2bo.r49
[oops, 'stat' locked on this file!]
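
A loop like the one above hangs the shell itself on the bad file. One
way to probe a directory without losing the terminal is to run each
stat in a child process and give up on it after a timeout. A minimal
Python sketch: probe_dir and the 5 second default are illustrative
choices, and it assumes a stat(1) binary in PATH. A child stuck in
state D cannot be killed, so it is simply left behind:

```python
import os
import subprocess

def probe_dir(directory, timeout=5.0):
    """Return names in `directory` whose stat did not finish within `timeout` seconds."""
    stuck = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Run stat(1) in a child so a hang blocks the child, not this script.
        child = subprocess.Popen(
            ["stat", "--", path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        try:
            child.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # A D-state child ignores signals; record the name and move on
            # without waiting for it again.
            stuck.append(name)
    return stuck
```

Each hung probe adds one more D-state process to the load average, so
this is only meant for narrowing down which names are affected.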


SysRq+t - very long, so I copied only the stuck processes, but not all
of them are here, because the 'dmesg' buffer is too small and I don't
have a serial console :(

kernel: myglinsert    D 81003438c000 0 16748  16740
kernel: 810013347ce8 0086  0292
kernel:0292  0008 81007b708260
kernel:0292 880e3596
kernel: Call Trace:{:xfs:xfs_iunlock+102} {__d_lookup+159}
kernel:{__down_read+129} {:xfs:xfs_getattr+65}
kernel:{:xfs:vn_revalidate+59} {link_path_walk+415}
kernel:{:xfs:linvfs_getattr+36}

[Vserver] sendfile() fix for 2.1.0-rc11 on 2.6.14.4

2005-12-20 Thread Pallai Roland

Hi,

 vserver-2.1.0-rc11 has broken sendfile on 2.6.14.4 due to a pointless
check in do_sendfile(). here's the fix:

--- linux/fs/read_write.c~	2005-12-20 23:43:22.0 +0100
+++ linux/fs/read_write.c	2005-12-20 23:47:22.0 +0100
@@ -776,9 +776,6 @@
 	current->syscr++;
 	current->syscw++;
 
-	if (*ppos > max)
-		retval = -EOVERFLOW;
-
 fput_out:
 	fput_light(out_file, fput_needed_out);
 fput_in:
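
A quick userspace check that sendfile() works at all on a given kernel
might look like the following. This is a sketch, not the test used for
the patch: it only exercises the basic path from a regular file to a
socket, and whether that is enough to hit the check removed above is
not established by this post. sendfile_roundtrip is a made-up name, and
the payload should stay small enough to fit in the socket buffer,
because everything is sent before anything is read:

```python
import os
import socket
import tempfile

def sendfile_roundtrip(payload):
    """Send `payload` from a temp file to a socket with sendfile(2); return what arrives."""
    with tempfile.TemporaryFile() as src:
        src.write(payload)
        src.flush()
        left, right = socket.socketpair()
        try:
            sent = 0
            while sent < len(payload):
                # os.sendfile(out_fd, in_fd, offset, count) returns the number
                # of bytes sent; a failure such as EOVERFLOW raises OSError.
                n = os.sendfile(left.fileno(), src.fileno(), sent, len(payload) - sent)
                if n == 0:
                    break
                sent += n
            left.close()  # signal EOF to the reader
            chunks = []
            while True:
                chunk = right.recv(65536)
                if not chunk:
                    break
                chunks.append(chunk)
            return b"".join(chunks)
        finally:
            left.close()
            right.close()
```

If the returned bytes equal the payload, the basic sendfile path is
intact; an OSError carries the errno reported by the kernel.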



--
 dap

