Do you have the panic message or crash dump?
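If the box panicked, savecore may have left a dump under the directory that
`dumpadm` reports.  A rough sketch for checking (the /var/crash/`hostname`
path is only the usual default, and the mdb step is left as a comment since
it needs the actual dump files):

```shell
# Sketch: check whether savecore left a kernel crash dump in a directory.
# The location assumed here is the usual default, /var/crash/`hostname`;
# run dumpadm to see the actually configured savecore directory.
check_crashdir() {
    dir=$1
    if ls "$dir"/vmcore.* >/dev/null 2>&1; then
        echo "crash dump found in $dir"
        # Next step (not run here): mdb -k "$dir"/unix.N "$dir"/vmcore.N,
        # then ::status and ::msgbuf for the panic string.
        return 0
    fi
    echo "no crash dump in $dir"
    return 1
}

check_crashdir "${CRASHDIR:-/var/crash/$(hostname)}" || true
```

If a vmcore is there, `::status` and `::msgbuf` in mdb will usually show the
panic string even when nothing made it to /var/adm/messages.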

-Steve L.


On Wed, Dec 23, 2009 at 09:26:17AM -0500, Glenn Brunette wrote:
>
> Frank,
>
> Just verified that something is still wrong in b129, but the problem is
> _not_ with a vanilla configuration.  This time, around boot/halt #102,
> the system apparently shut down or panicked.  I was running it overnight
> and came in to a system that had been rebooted.  I did not see any
> problem in the audit log or in /var/adm/messages.  Any pointers?
>
> I am running an Immutable Service Container configuration, based upon
> the installation steps at:
>
> http://kenai.com/projects/isc/pages/OpenSolaris
>
> Specifically:
>
> pfexec pkg install SUNWmercurial
> hg clone https://kenai.com/hg/isc~source  isc
> pfexec isc/bin/iscadm.ksh -N 0
> pfexec bootadm update-archive
> pfexec shutdown -g 0 -i 0 -y
> [after reboot]
> zlogin -C isc1
> [wait for zone isc1 to fully complete boot process]
>
> then run the script that I provided that stops and starts the zone.
>
> There must be something wrong in the interaction of these
> components.  In this configuration, we have things like resource
> controls, auditing, IP Filter/IP NAT, and zones all enabled.
>
> Would it be possible for you to try the steps above on a fresh
> install of 2009.06 or later (b129 is where I am right now)?  Also,
> if you have other debugging methods, please let me know.
>
> I am going to kick this off again to see if I can catch any
> error messages.
>
> g
>
>
> On 12/16/09 3:49 AM, Frank Batschulat (Home) wrote:
>> Glenn, I've not been able to reproduce this on onnv build 126 (it's been
>> running for a day now).
>>
>> If that script reproduced 6894901 straight away, it should do so on 126
>> as well (similar to what you've seen in 127).
>>
>> This poses the question of whether there are some other details in your
>> environment that I don't have, or whether that script really reliably
>> reproduces 6894901.
>>
>> cheers
>> frankB
>>
>> On Tue, 15 Dec 2009 15:23:06 +0100, Frank Batschulat 
>> (Home)<frank.batschu...@sun.com>  wrote:
>>
>>> Glenn, I've been running this test case now for nearly a day on build
>>> 129 and couldn't reproduce it at all.  Good chance this was indeed
>>> fixed by 6894901 in build 128.
>>>
>>> I'll also try to reproduce this now on build 126.
>>>
>>> cheers
>>> frankB
>>>
>>> On Fri, 11 Dec 2009 21:48:52 +0100, Glenn Brunette<glenn.brune...@sun.com>  
>>> wrote:
>>>>
>>>> As part of an Immutable Service Container[1] demonstration that I am
>>>> creating for an event in January, I need to start/stop a zone
>>>> quite a few times (as part of a Self-Cleansing[2] demo).  During the
>>>> course of my testing, I have been able to repeatedly get zoneadm to
>>>> hang.
>>>>
>>>> Since I am working with a highly customized configuration, I started
>>>> over with a default zone on OpenSolaris (b127) and was able to repeat
>>>> this issue.  To reproduce this problem, use the following script after
>>>> creating a zone using the normal/default steps:
>>>>
>>>> isc...@osol-isc:~$ while : ; do
>>>>   >  echo "`date`: ZONE BOOT"
>>>>   >  pfexec zoneadm -z test boot
>>>>   >  sleep 30
>>>>   >  pfexec zoneadm -z test halt
>>>>   >  echo "`date`: ZONE HALT"
>>>>   >  sleep 10
>>>>   >  done
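>>>> (Sketch only: a watchdog wrapper around each zoneadm call could grab
>>>> a pstack snapshot automatically before killing a wedged zoneadm; the
>>>> time budget and the /tmp path below are arbitrary choices, not
>>>> anything the demo requires.)

```shell
# Sketch of a watchdog: run a command in the background; if it is still
# alive after $1 seconds, snapshot its stack with pstack and kill it.
run_with_watchdog() {
    limit=$1; shift
    "$@" &
    pid=$!
    elapsed=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            # pstack is a silent no-op here if unavailable; on Solaris it
            # dumps the hung process's user stack for later analysis.
            pstack "$pid" > "/tmp/hung.$pid.pstack" 2>/dev/null
            kill "$pid" 2>/dev/null
            echo "killed $pid after ${limit}s (stack in /tmp/hung.$pid.pstack)"
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    wait "$pid"
}
```

>>>> e.g. run_with_watchdog 300 pfexec zoneadm -z test boot in place of
>>>> the bare zoneadm calls in the loop above.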
>>>>
>>>> This script works just fine for a while, but eventually zoneadm hangs
>>>> (was at pass #90 in my last test).  When this happens, zoneadm is shown
>>>> to be consuming quite a bit of CPU:
>>>>
>>>>      PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
>>>>
>>>>    16598 root       11M 3140K run      1    0   0:54:49  74% zoneadm/1
>>>>
>>>>
>>>> A stack trace of zoneadm shows:
>>>>
>>>> isc...@osol-isc:~$ pfexec pstack `pgrep zoneadm`
>>>> 16082:     zoneadmd -z test
>>>> -----------------  lwp# 1  --------------------------------
>>>> -----------------  lwp# 2  --------------------------------
>>>>    feef41c6 door     (0, 0, 0, 0, 0, 8)
>>>>    feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, feeee39e) + 67
>>>>    feeee3f3 _thrp_setup (fe5b0a00) + 9b
>>>>    feeee680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
>>>> -----------------  lwp# 3  --------------------------------
>>>>    feef420f __door_return () + 2f
>>>> -----------------  lwp# 4  --------------------------------
>>>>    feef420f door     (0, 0, 0, fe140e00, f5f00, a)
>>>>    feed9f57 door_create_func (0, fef81000, fe140fe8, feeee39e) + 2f
>>>>    feeee3f3 _thrp_setup (fe5b1a00) + 9b
>>>>    feeee680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
>>>> 16598:     zoneadm -z test boot
>>>>    feef3fc8 door     (6, 80476d0, 0, 0, 0, 3)
>>>>    feede653 door_call (6, 80476d0, 400, fe3d43f7) + 7b
>>>>    fe3d44f0 zonecfg_call_zoneadmd (8047e33, 8047730, 8078448, 1) + 124
>>>>    0805792d boot_func (0, 8047d74, 100, 805ff0b) + 1cd
>>>>    08060125 main     (4, 8047d64, 8047d78, 805570f) + 2b9
>>>>    0805576d _start   (4, 8047e28, 8047e30, 8047e33, 8047e38, 0) + 7d
>>>>
>>>>
>>>> A stack trace of zoneadmd shows:
>>>>
>>>> isc...@osol-isc:~$ pfexec pstack `pgrep zoneadmd`
>>>> 16082:     zoneadmd -z test
>>>> -----------------  lwp# 1  --------------------------------
>>>> -----------------  lwp# 2  --------------------------------
>>>>    feef41c6 door     (0, 0, 0, 0, 0, 8)
>>>>    feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, feeee39e) + 67
>>>>    feeee3f3 _thrp_setup (fe5b0a00) + 9b
>>>>    feeee680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
>>>> -----------------  lwp# 3  --------------------------------
>>>>    feef4147 __door_ucred (80a37c8, fef81000, fe23e838, feed9cfe) + 27
>>>>    feed9d0d door_ucred (fe23f870, 1000, 0, 0) + 32
>>>>    08058a88 server   (0, fe23f8f0, 510, 0, 0, 8058a04) + 84
>>>>    feef4240 __door_return () + 60
>>>> -----------------  lwp# 4  --------------------------------
>>>>    feef420f door     (0, 0, 0, fe140e00, f5f00, a)
>>>>    feed9f57 door_create_func (0, fef81000, fe140fe8, feeee39e) + 2f
>>>>    feeee3f3 _thrp_setup (fe5b1a00) + 9b
>>>>    feeee680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
>>>>
>>>>
>>>> A truss of zoneadm (-f -vall -wall -tall) shows this looping:
>>>>
>>>> 16598:  door_call(6, 0x080476D0)                        = 0
>>>> 16598:          data_ptr=8047730 data_size=0
>>>> 16598:          desc_ptr=0x0 desc_num=0
>>>> 16598:          rbuf=0x807F2D8 rsize=4096
>>>> 16598:  close(6)                                        = 0
>>>> 16598:  mkdir("/var/run/zones", 0700)                   Err#17 EEXIST
>>>> 16598:  chmod("/var/run/zones", 0700)                   = 0
>>>> 16598:  open("/var/run/zones/test.zoneadm.lock", O_RDWR|O_CREAT, 0600) = 6
>>>> 16598:  fcntl(6, F_SETLKW, 0x08046DC0)                  = 0
>>>> 16598:          typ=F_WRLCK  whence=SEEK_SET start=0     len=0 sys=4277003009 pid=6
>>>> 16598:  open("/var/run/zones/test.zoneadmd_door", O_RDONLY) = 7
>>>> 16598:  door_info(7, 0x08047230)                        = 0
>>>> 16598:          target=16082 proc=0x8058A04 data=0x0
>>>> 16598:          attributes=DOOR_UNREF|DOOR_REFUSE_DESC|DOOR_NO_CANCEL
>>>> 16598:          uniquifier=26426
>>>> 16598:  close(7)                                        = 0
>>>> 16598:  close(6)                                        = 0
>>>> 16598:  open("/var/run/zones/test.zoneadmd_door", O_RDONLY) = 6
>>>> 16082/3:        door_return(0x00000000, 0, 0x00000000, 0xFE23FE00, 1007360) = 0
>>>> 16082/3:        door_ucred(0x080A37C8)                          = 0
>>>> 16082/3:                euid=0 egid=0
>>>> 16082/3:                ruid=0 rgid=0
>>>> 16082/3:                pid=16598 zoneid=0
>>>> 16082/3:                E: all
>>>> 16082/3:                I: basic
>>>> 16082/3:                P: all
>>>> 16082/3:                L: all
>>>>
>>>>
>>>> PID 16598 is zoneadm and PID 16082 is zoneadmd.
>>>>
>>>>
>>>> Is this a known issue?  Are there any other things that I can do to
>>>> help debug this situation?  Once things get into this state, I have
>>>> only been able to recover by rebooting the zone.
>>>>
>>>>
>>>>
>>>> Please advise.
>>>>
>>>> g
>>>>
>>>>
>>>> [1] http://kenai.com/projects/isc/pages/OpenSolaris
>>>> [2]
>>>> http://kenai.com/attachments/wiki_images/isc/isc-autonomic-cleansing-time-v1.3.png
>>>> _______________________________________________
>>>> zones-discuss mailing list
>>>> zones-discuss@opensolaris.org
>>>>
>>>
>>>
>>>
>>
>>
>>