[Vserver] fatal errors starting and stopping a guest

Chuck Tue, 04 Oct 2005 11:51:02 -0700

this one is really odd. 

i'm using gentoo with
kernel 2.6.13.1-vs2.1.0-rc2
        from vanilla kernel and vserver patch
util-vserver-0.30.208-r3
baselayout-vserver-1.12.0_pre8
        for some reason i could never get the 1.11.13-r1


the machine is a dual-processor p3-500 with an intel mobo.

i have been having stop/start problems with this installation from the 
beginning. the single-processor system has exactly the same setup and has no 
problems.

initially it had problems with mount/unmount. even on a successful stop i 
would randomly get an error mounting entries from fstab, and going in and 
removing the contents of mtab usually cured it. when i enabled mounting the 
shared portage directories in the etc/vserver/fstab, it would always fail on 
stop, unable to unmount the file systems. i suspect it tried to unmount 
distfiles after portage had already been unmounted, making distfiles invalid.
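
The ordering suspicion above can be illustrated with a minimal sketch. Nested
mounts have to be released deepest-first, so distfiles has to go before the
portage directory that contains it. The function name unmount_order and the
example paths are assumptions for illustration; this is not how util-vserver
itself orders its unmounts.

```shell
#!/bin/sh
# Hypothetical helper: read mount points (one per line) on stdin and print
# them in an order that is safe for unmounting, i.e. children before parents.
# A child path always has its parent as a prefix, so a reverse byte-order
# sort lists /usr/portage/distfiles ahead of /usr/portage.
unmount_order() {
    LC_ALL=C sort -r
}

# Example paths are assumptions; substitute the guest's real mount points.
printf '%s\n' /usr/portage /usr/portage/distfiles /proc | unmount_order
```

If the stop sequence walks the mounts in the opposite order, the parent is
either busy or already gone by the time the child is reached, which would
match the failure described.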


it seems that i have no trouble starting/stopping guests within a few minutes 
after a host reboot, but after some time the error occurs. the last time, i 
tried an experiment: i went into each guest and did an orderly manual 
shutdown of each running service, then exited and stopped the guests from the 
host. the first one errored out with the messages below, and from that point 
none of them shut down cleanly. and when i went to reboot the host using 
reboot, it did a power down instead!!!  i just made sure acpi is off in the 
kernel because i could not restart the host at that point by a remote power 
cycle.

another weird thing is that the /etc/init.d/vserver script is not 
auto-starting the guests. it does try, because the ip addresses are still 
listed on each nic, but there is no trace of the guests in the process list. 
when i start them manually, they start with no error other than the expected 
nonfatal RTNETLINK file exists error.


the first error i received was on the first guest to be shut down:

==============================================
apollo rio # vserver prometheus stop
/usr/lib/util-vserver/vserver.functions: line 804: 28440 Segmentation fault \   
  $_NOHUP $_VWAIT --timeout "$VSHELPER_SYNC_TIMEOUT" --terminate \
--status-fd 3 "$2" >>$_is_tmpdir/out 2>$_is_tmpdir/err 3>$_is_tmpdir/fifo
internal error: 'vwait' exited with an unexpected status ''; I will
try to continue but be prepared for unexpected events.
* Prometheus Stopped
=================================================

btw, the starting and stopped messages are my additions in the 
pre-start/post-stop scripts.


then, when i tried to start it again i got:

==================================================
apollo ~ # vserver prometheus start
* Prometheus starting
<1>Unable to handle kernel NULL pointer dereference at virtual address 
00000000
 printing eip:
c0136b7e
*pde = 00000000
Oops: 0002 [#9]
SMP
Modules linked in:
CPU:    1
EIP:    0060:[<c0136b7e>]    Not tainted VLI
EFLAGS: 00010246   (2.6.13.1-vs2.1.0-rc2)
EIP is at __dealloc_vx_info+0xe/0x50
eax: 00000000   ebx: 00000d4d   ecx: 00000000   edx: f5807000
esi: ffffffef   edi: f5807000   ebp: eb0da000   esp: eb0dbf68
ds: 007b   es: 007b   ss: 0068
Process vcontext (pid: 31199, threadinfo=eb0da000 task=f5adb040)
Stack: eb0da000 c0136d52 f5807000 00000d4d 00000000 fffffeff c01375b8 00000d4d
       c1923380 00000000 00000000 00000003 00000000 00000d4d c0136423 00000d4d
       00000000 09010001 0804bcd4 bfa70704 c0102ff9 09010001 00000d4d 00000000
Call Trace:
 [<c0136d52>] __create_vx_info+0x92/0x1c0
 [<c01375b8>] vc_ctx_create+0x98/0x100
 [<c0136423>] sys_vserver+0x163/0x540
 [<c0102ff9>] syscall_call+0x7/0xb
Code: 4d c2 83 f8 01 89 c1 7e 9e e9 bb fd ff ff 0f bc c0 e9 9f fd ff ff 8d b4 
26 00 00 00 00 83 ec 04 8b 54 24 08 8b 02 8b 4a 04 85 c0 <89> 01 74 03 89 48 
04 81 4a 18 00 80 00 00 c7 42 04 00 02 20 00
 /usr/lib/util-vserver/vserver.start: line 147: 31199 Segmentation fault      
[EMAIL PROTECTED] $_CHBIND "[EMAIL PROTECTED]" -- $_EXEC_ULIMIT 
"$VSERVER_DIR"/ulimits $_VCONTEXT --create "[EMAIL PROTECTED]" -- 
${USE_VNAMESPACE:+$_VNAMESPACE --set -- } $_VLIMIT --dir 
"$VSERVER_DIR"/rlimits --missingok -- $_VSCHED --xid self "[EMAIL PROTECTED]" 
-- $_VUNAME --xid self --dir "$VSERVER_DIR"/uts --missingok -- 
"[EMAIL PROTECTED]" $_VUNAME --xid self --set -t 
context="$VSERVER_DIR" -- $_VATTRIBUTE --set "[EMAIL PROTECTED]" -- 
$_SAVE_CTXINFO "$VSERVER_DIR" $_ENV -i -- $_VCONTEXT --migrate-self 
--endsetup --chroot $SILENT_OPT "[EMAIL PROTECTED]" 
"[EMAIL PROTECTED]" -- "[EMAIL PROTECTED]"

An error occured while executing the vserver startup sequence; when
there are no other messages, it is very likely that the init-script
(/sbin/init) failed.

Common causes are:
* /etc/rc.d/rc on Fedora Core 1 and RH9 fails always; the 'apt-rpm' build
  method knows how to deal with this, but on existing installations,
  appending 'true' to this file will help.


Failed to start vserver 'prometheus'

====================================================

then, when i tried to shut the host down, i got this:


====================================================

kernel BUG at kernel/vserver/context.c:144!
invalid operand: 0000 [#10]
SMP
Modules linked in:
CPU:    0
EIP:    0060:[<c0136cb0>]    Not tainted VLI
EFLAGS: 00010246   (2.6.13.1-vs2.1.0-rc2)
EIP is at free_vx_info+0x70/0x80
eax: 00000001   ebx: f4a0e938   ecx: da1e0368   edx: f58e7000
esi: f58e7000   edi: c03d89a4   ebp: da1e030c   esp: eeb63da4
ds: 007b   es: 007b   ss: 0068
Process find (pid: 3033, threadinfo=eeb62000 task=f58fe530)
Stack: c013c584 f58e7000 f4a0e938 00000020 00000004 00000000 c1907960 000000d0
       fffffff4 da1e030c f4af18a4 eeb63e4c c0175181 f4af18a4 da1e030c eeb63f10
       00000000 eeb63f10 eeb63e44 eeb63e4c c017557a f48d2d90 eeb63e4c eeb63f10
Call Trace:
 [<c013c584>] proc_virtual_lookup+0xd4/0x2a0
 [<c0175181>] real_lookup+0xd1/0x100
 [<c017557a>] do_lookup+0x13a/0x150
 [<c0175cf7>] __link_path_walk+0x767/0xe70
 [<c0146ca7>] filemap_nopage+0x207/0x3c0
 [<c0176449>] link_path_walk+0x49/0xe0
 [<c01767a4>] path_lookup+0x94/0x170
 [<c0176a43>] __user_walk+0x33/0x60
 [<c0170a5c>] vfs_lstat+0x1c/0x60
 [<c01711eb>] sys_lstat64+0x1b/0x40
 [<c01155e0>] do_page_fault+0x0/0x5db
 [<c0102ff9>] syscall_call+0x7/0xb
Code: ce b0 3b c0 eb dc f6 42 18 01 74 cf 0f 0b 95 00 ce b0 3b c0 eb c5 0f 0b 
93 00 ce b0 3b c0 eb b7 0f 0b 92 00 ce b0 3b c0 eb a6 90 <0f> 0b 90 00 ce b0 
3b c0 eb 94 8d b6 00 00 00 00 57 56 53 83 ec
  * Hiding /proc entries ...                               

apollo ~ #                                                                      

============================================

and it sat forever at hiding proc entries
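
Both reports above name a function in the vserver context code on their
"EIP is at" line. When comparing several such reports it can help to pull out
just those lines. A minimal sketch, assuming the console output has been
saved to a file or is piped in on stdin; oops_summary is a made-up name, not
an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: read kernel oops/BUG text on stdin and print only the
# faulting function from each "EIP is at" line, with its offset and size.
oops_summary() {
    sed -n 's/^EIP is at //p'
}

# Sample input copied from the two reports in this message.
oops_summary <<'EOF'
Oops: 0002 [#9]
EIP is at __dealloc_vx_info+0xe/0x50
kernel BUG at kernel/vserver/context.c:144!
EIP is at free_vx_info+0x70/0x80
EOF
```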

i finally got pissed at it, logged back in on another terminal and issued 
init 0, which i found out does a halt rather than the shutdown it has done in 
the past... i suppose i should have done an init 6 instead.. that may have 
been the power down i got when i initially told it to reboot. :)


now i get something really odd, and only when i am starting this one guest, 
prometheus.... it shows me the startup process! :)


========================================
apollo ~ # vserver prometheus start
* Prometheus starting
INIT: version 2.86 booting

Gentoo Linux; http://www.gentoo.org/
 Copyright 1999-2005 Gentoo Foundation; Distributed under the GPLv2

 * Setting hostname to prometheus ...                                 [ ok ]
 * Updating environment ...                                             [ ok ]
 * Cleaning /var/lock, /var/run ...                                    [ ok ]
 * Cleaning /tmp directory ...                                          [ ok ]
 * Setting DNS domainname to sbbsnet.net                       [ ok ]
INIT: Entering runlevel: 3
 * Starting clamd ...                                                  [ ok ]
 * Starting freshclam ...                                              [ ok ]
                                                                        [ ok ]
 * Starting syslog-ng ...                                           [ ok ]
 * Starting service scan ...                                        [ ok ]
 * Starting spamd ...                                          [ ok ]
 * Starting local ...                                               [ ok ]
INIT: no more processes left in this runlevel

============================================

at this point it does not return to the host prompt unless i press enter


and when i stop it i now see the shutdown sequence but get no error.

=============================================
apollo ~ # vserver prometheus stop
INIT: Sending processes the TERM signal
 * Stopping local ...                                             [ ok ]
 * Stopping spamd ...                                          [ ok ]
 * Stopping service scan ...                             [ ok ]
 * Stopping services ...                                           [ ok ]
 * Stopping service logging ...                             [ ok ]
 * Stopping syslog-ng ...                                     [ ok ]
 * Stopping clamd ...
* Failed to stop clamd                  [ !! ]
 * Stopping freshclam ...                                       [ ok ]
* Prometheus Stopped
apollo ~ #

==============================================

i do not see these sequences on other guests and they only became visible 
after this last super crash and reboot.

i am hoping all these problems will go away when i set everything up fresh on 
the big machine...


any clues what is happening? 

if it's that kernel 'race bug' concerning smp, do you think the kernel.org 
people will have it fixed in a few weeks? are they even aware of it? i'm 
getting a bit apprehensive because this final machine, being installed in 
about 2 weeks, must be absolutely perfect the first time. no room for errors 
on that one.


-- 

Chuck

"...and the hordes of M$*ft users descended upon me in their anger,
and asked 'Why do you not get the viruses or the BlueScreensOfDeath
or insecure system troubles and slowness or pay through the nose 
for an OS as *we* do?!!', and I answered...'I use Linux'. "
The Book of John, chapter 1, page 1, and end of book


_______________________________________________
Vserver mailing list
Vserver@list.linux-vserver.org
http://list.linux-vserver.org/mailman/listinfo/vserver
