This one is really odd. I'm using Gentoo with:

kernel 2.6.13.1-vs2.1.0-rc2 (vanilla kernel plus the vserver patch)
util-vserver-0.30.208-r3
baselayout-vserver-1.12.0_pre8; for some reason I could never get the 1.11.13-r1
The machine is a dual-processor P3-500 with an Intel motherboard. I have been having stop/start problems with this installation from the start. The single-processor system has exactly the same setup and has no problems.

Initially the problems were with mount/unmount. Even after a successful stop I would randomly get an error mounting from fstab, and going in and removing the contents of mtab usually cured it. When I enabled mounting the shared portage directories in etc/vserver/fstab, the stop would always fail, unable to unmount file systems. I suspect it tried to unmount distfiles after portage had already been unmounted, which made the distfiles mount point invalid.

It seems I have no trouble starting/stopping guests within a few minutes after a host reboot, but after some time the error occurs. The last time, as an experiment, I went into each guest and did an orderly manual shutdown of each running service, then exited and stopped the guests from the host. The first one errored out with the messages below, and after that none of them shut down cleanly. And when I went to reboot the host using reboot, it did a power down instead!!! I have just made sure ACPI is off in the kernel, because at that point I could not restart the host by a remote power cycle.

Another weird thing is that the /etc/init.d/vserver script is not auto-starting the guests. It evidently tries, because the guests' IP addresses are listed on each NIC, but there is no trace of the guests in the process list. When I start them manually, they start with no error other than the expected nonfatal "RTNETLINK ... File exists" message.
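To make the unmount-order suspicion concrete, here is a minimal Python sketch of the ordering I would expect: nested mounts (distfiles) go before their parents (portage). This is only an illustration, not what util-vserver actually does; the function name `unmount_order` and the /vservers/prometheus paths are made up for the example.

```python
# Hypothetical sketch: order mount points so nested mounts are unmounted first.
# Not taken from util-vserver; function name and paths are illustrative only.

def unmount_order(mount_points):
    """Return mount points deepest-first, so a child always precedes its parent."""
    def depth(path):
        # Count non-empty path components, which ignores any trailing slash.
        return len([part for part in path.split("/") if part])

    # Deeper paths first; ties are broken by name so the order is predictable.
    return sorted(mount_points, key=lambda p: (-depth(p), p))


if __name__ == "__main__":
    mounts = [
        "/vservers/prometheus/usr/portage",            # assumed guest path
        "/vservers/prometheus/usr/portage/distfiles",  # nested inside portage
        "/vservers/prometheus/proc",
    ]
    for mount_point in unmount_order(mounts):
        print("umount", mount_point)
```

With an ordering like this, distfiles is always released before portage, so the parent is never unmounted from underneath the nested mount.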
The first error I received was on the first guest to be shut down:

==============================================
apollo rio # vserver prometheus stop
/usr/lib/util-vserver/vserver.functions: line 804: 28440 Segmentation fault \
$_NOHUP $_VWAIT --timeout "$VSHELPER_SYNC_TIMEOUT" --terminate \
--status-fd 3 "$2" >>$_is_tmpdir/out 2>$_is_tmpdir/err 3>$_is_tmpdir/fifo
internal error: 'vwait' exited with an unexpected status ''; I will try to continue but be prepared for unexpected events.
 * Prometheus Stopped
=================================================

Btw, the "starting" and "Stopped" messages are my additions in the pre-start/post-stop scripts. Then, when I tried to start it again, I got:

==================================================
apollo ~ # vserver prometheus start
 * Prometheus starting
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c0136b7e
*pde = 00000000
Oops: 0002 [#9]
SMP
Modules linked in:
CPU: 1
EIP: 0060:[<c0136b7e>] Not tainted VLI
EFLAGS: 00010246 (2.6.13.1-vs2.1.0-rc2)
EIP is at __dealloc_vx_info+0xe/0x50
eax: 00000000 ebx: 00000d4d ecx: 00000000 edx: f5807000
esi: ffffffef edi: f5807000 ebp: eb0da000 esp: eb0dbf68
ds: 007b es: 007b ss: 0068
Process vcontext (pid: 31199, threadinfo=eb0da000 task=f5adb040)
Stack: eb0da000 c0136d52 f5807000 00000d4d 00000000 fffffeff c01375b8 00000d4d
       c1923380 00000000 00000000 00000003 00000000 00000d4d c0136423 00000d4d
       00000000 09010001 0804bcd4 bfa70704 c0102ff9 09010001 00000d4d 00000000
Call Trace:
 [<c0136d52>] __create_vx_info+0x92/0x1c0
 [<c01375b8>] vc_ctx_create+0x98/0x100
 [<c0136423>] sys_vserver+0x163/0x540
 [<c0102ff9>] syscall_call+0x7/0xb
Code: 4d c2 83 f8 01 89 c1 7e 9e e9 bb fd ff ff 0f bc c0 e9 9f fd ff ff 8d b4 26 00 00 00 00 83 ec 04 8b 54 24 08 8b 02 8b 4a 04 85 c0 <89> 01 74 03 89 48 04 81 4a 18 00 80 00 00 c7 42 04 00 02 20 00
/usr/lib/util-vserver/vserver.start: line 147: 31199 Segmentation fault [EMAIL PROTECTED] $_CHBIND "[EMAIL PROTECTED]" --
$_EXEC_ULIMIT "$VSERVER_DIR"/ulimits
$_VCONTEXT --create "[EMAIL PROTECTED]" --
${USE_VNAMESPACE:+$_VNAMESPACE --set -- }
$_VLIMIT --dir "$VSERVER_DIR"/rlimits --missingok --
$_VSCHED --xid self "[EMAIL PROTECTED]" --
$_VUNAME --xid self --dir "$VSERVER_DIR"/uts --missingok -- "[EMAIL PROTECTED]"
$_VUNAME --xid self --set -t context="$VSERVER_DIR" --
$_VATTRIBUTE --set "[EMAIL PROTECTED]" --
$_SAVE_CTXINFO "$VSERVER_DIR"
$_ENV -i --
$_VCONTEXT --migrate-self --endsetup --chroot $SILENT_OPT "[EMAIL PROTECTED]" "[EMAIL PROTECTED]" -- "[EMAIL PROTECTED]"

An error occured while executing the vserver startup sequence; when
there are no other messages, it is very likely that the init-script
(/sbin/init) failed.

Common causes are:
* /etc/rc.d/rc on Fedora Core 1 and RH9 fails always; the 'apt-rpm' build
  method knows how to deal with this, but on existing installations,
  appending 'true' to this file will help.

Failed to start vserver 'prometheus'
====================================================

Then, when I tried to shut the host down, I got this:

====================================================
kernel BUG at kernel/vserver/context.c:144!
invalid operand: 0000 [#10]
SMP
Modules linked in:
CPU: 0
EIP: 0060:[<c0136cb0>] Not tainted VLI
EFLAGS: 00010246 (2.6.13.1-vs2.1.0-rc2)
EIP is at free_vx_info+0x70/0x80
eax: 00000001 ebx: f4a0e938 ecx: da1e0368 edx: f58e7000
esi: f58e7000 edi: c03d89a4 ebp: da1e030c esp: eeb63da4
ds: 007b es: 007b ss: 0068
Process find (pid: 3033, threadinfo=eeb62000 task=f58fe530)
Stack: c013c584 f58e7000 f4a0e938 00000020 00000004 00000000 c1907960 000000d0
       fffffff4 da1e030c f4af18a4 eeb63e4c c0175181 f4af18a4 da1e030c eeb63f10
       00000000 eeb63f10 eeb63e44 eeb63e4c c017557a f48d2d90 eeb63e4c eeb63f10
Call Trace:
 [<c013c584>] proc_virtual_lookup+0xd4/0x2a0
 [<c0175181>] real_lookup+0xd1/0x100
 [<c017557a>] do_lookup+0x13a/0x150
 [<c0175cf7>] __link_path_walk+0x767/0xe70
 [<c0146ca7>] filemap_nopage+0x207/0x3c0
 [<c0176449>] link_path_walk+0x49/0xe0
 [<c01767a4>] path_lookup+0x94/0x170
 [<c0176a43>] __user_walk+0x33/0x60
 [<c0170a5c>] vfs_lstat+0x1c/0x60
 [<c01711eb>] sys_lstat64+0x1b/0x40
 [<c01155e0>] do_page_fault+0x0/0x5db
 [<c0102ff9>] syscall_call+0x7/0xb
Code: ce b0 3b c0 eb dc f6 42 18 01 74 cf 0f 0b 95 00 ce b0 3b c0 eb c5 0f 0b 93 00 ce b0 3b c0 eb b7 0f 0b 92 00 ce b0 3b c0 eb a6 90 <0f> 0b 90 00 ce b0 3b c0 eb 94 8d b6 00 00 00 00 57 56 53 83 ec
 * Hiding /proc entries ...
apollo ~ #
============================================

And it sat forever at "Hiding /proc entries". I finally got pissed at it, logged back in on another terminal and issued init 0, which I found out performs a halt rather than the shutdown it has done in the past... I suppose I should have used init 6 instead. That may have been the shutdown I got when I initially told it to reboot. :)

Now I get something really odd, and only when I am starting this one guest, prometheus...
it shows me the startup process! :)

========================================
apollo ~ # vserver prometheus start
 * Prometheus starting
INIT: version 2.86 booting
Gentoo Linux; http://www.gentoo.org/
 Copyright 1999-2005 Gentoo Foundation; Distributed under the GPLv2
 * Setting hostname to prometheus ... [ ok ]
 * Updating environment ... [ ok ]
 * Cleaning /var/lock, /var/run ... [ ok ]
 * Cleaning /tmp directory ... [ ok ]
 * Setting DNS domainname to sbbsnet.net [ ok ]
INIT: Entering runlevel: 3
 * Starting clamd ... [ ok ]
 * Starting freshclam ... [ ok ]
 [ ok ]
 * Starting syslog-ng ... [ ok ]
 * Starting service scan ... [ ok ]
 * Starting spamd ... [ ok ]
 * Starting local ... [ ok ]
INIT: no more processes left in this runlevel
============================================

At this point it does not return to the host prompt unless I press Enter. When I stop it, I now see the shutdown sequence as well, but got no error:

=============================================
apollo ~ # vserver prometheus stop
INIT: Sending processes the TERM signal
 * Stopping local ... [ ok ]
 * Stopping spamd ... [ ok ]
 * Stopping service scan ... [ ok ]
 * Stopping services ... [ ok ]
 * Stopping service logging ... [ ok ]
 * Stopping syslog-ng ... [ ok ]
 * Stopping clamd ...
 * Failed to stop clamd [ !! ]
 * Stopping freshclam ... [ ok ]
 * Prometheus Stopped
apollo ~ #
==============================================

I do not see these sequences on the other guests, and they only became visible after this last super crash and reboot. I am hoping all these problems will go away when I set everything up fresh on the big machine.

Any clues what is happening? If it's that kernel 'race bug' concerning SMP, do you think the kernel.org people will have it fixed in a few weeks? Are they even aware of it? I'm getting a bit apprehensive, because the final machine is being installed in about 2 weeks and must be absolutely perfect the first time. No room for errors on that one.
-- 
Chuck

"...and the hordes of M$*ft users descended upon me in their anger, and asked 'Why do you not get the viruses or the BlueScreensOfDeath or insecure system troubles and slowness or pay through the nose for an OS as *we* do?!!', and I answered...'I use Linux'."
The Book of John, chapter 1, page 1, and end of book