Re: [zones-discuss] Zone Not Starting Properly?
I actually know that: rpcinfo -p remote_host

The script is trying to mount an NFS share based on a configured list of remote NFS filers. In theory the client only has access to one of the filers, and the script determines which one by using the rpcinfo command.

Derek

On Tue, Dec 13, 2011 at 6:55 PM, Edward Pilatowicz edward.pilatow...@oracle.com wrote:
> On Tue, Dec 13, 2011 at 09:44:23AM -0600, Derek McEachern wrote:
> > Thought I would just send an update on this. Thanks for all the suggestions. To get around our particular issue I just added some retry logic to the /etc/init.d/ script. When it runs, if it finds that the operation has failed it pauses for a second and tries again. It will try up to three times before giving up.
>
> it'd be interesting to know what particular operation is failing within the script...
>
> ed

___
zones-discuss mailing list
zones-discuss@opensolaris.org
Re: [zones-discuss] Zone Not Starting Properly?
Thought I would just send an update on this. Thanks for all the suggestions.

To get around our particular issue I just added some retry logic to the /etc/init.d/ script. When it runs, if it finds that the operation has failed it pauses for a second and tries again. It will try up to three times before giving up.

Running more tests we were able to see that on some occasions it still fails on the first attempt, but so far it has always been successful on the second.

Derek

On Thu, Dec 1, 2011 at 3:37 PM, Ian Collins i...@ianshome.com wrote:
> On 12/2/11 10:30 AM, Derek McEachern wrote:
> > On Thu, Dec 1, 2011 at 2:48 PM, Ian Collins i...@ianshome.com wrote:
> > > On 12/2/11 05:39 AM, Derek McEachern wrote:
> > > > Have a peculiar problem that I haven't seen before. When starting a system that has about 35 - 40 zones on it, occasionally we see that one of the zones doesn't come up properly. You can log into the zone but none of the /etc/rc3.d scripts have been run.
> > >
> > > The same zone, or a random one? What happens if you halt one or more zones before rebooting? Is there a threshold where the problem begins to occur?
> >
> > Random zone. We've been testing to see if there is a threshold of trying to start too many in parallel, but so far we don't see anything. We saw the problem trying to start 3 zones in parallel but it was very intermittent. In about 1 out of every 4 tries at starting all 40 zones we would see 1 failure. We ran some tests starting 10 zones in parallel and so far no errors. Our assumption was that if it were load related, we would see problems when moving from 3 to 10 zones.
>
> I have several systems that start 10 or more zones and I've never seen any problems.
>
> I agree with the comment elsewhere that you should be using SMF rather than rc scripts to start services. It is also possible to create SMF services with the appropriate dependencies to start your zones in the correct order.
>
> --
> Ian.
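The retry logic described above (up to three attempts, one second apart) could look roughly like the sketch below. It is not the actual /etc/init.d/ script from this thread, which was never posted; the function name retry and its argument order are made up for illustration.

```shell
#!/bin/sh
# retry MAX DELAY CMD [ARGS...]   (hypothetical helper, not from the thread)
# Run CMD up to MAX times, sleeping DELAY seconds between failed attempts.
# Returns 0 on the first success, otherwise the exit status of the last try.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@"
        status=$?
        # Stop as soon as the command succeeds.
        [ "$status" -eq 0 ] && return 0
        # Give up after MAX attempts and report the last failure.
        [ "$attempt" -ge "$max" ] && return "$status"
        attempt=`expr "$attempt" + 1`
        sleep "$delay"
    done
}

# Example call, assuming $filer holds one host name from the configured list:
#   retry 3 1 rpcinfo -p "$filer" >/dev/null || echo "giving up on $filer"
```

Returning the last exit status, rather than a fixed value, keeps the original failure visible to whatever calls the init script.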
[zones-discuss] Zone Not Starting Properly?
Have a peculiar problem that I haven't seen before. When starting a system that has about 35 - 40 zones on it, occasionally we see that one of the zones doesn't come up properly. You can log into the zone but none of the /etc/rc3.d scripts have been run. /var/adm/messages is completely empty, and when running who -r to see the run level it doesn't report anything.

    # who -r
    run-level Dec 1 09:17 last=

Anyone else seen anything similar? We are running Solaris 10 update 9.

Regards,
Derek
Re: [zones-discuss] Zone Not Starting Properly?
System has 72GB RAM.
Xeon CPU: 2 socket, 4 core, 16 thread.
The zoneroot is on a UFS filesystem on its own drive, separate from the OS.

Derek

On Thu, Dec 1, 2011 at 11:01 AM, Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. laot...@gmail.com wrote:
> For 30-40 zones, how much RAM does the main host have? What kind of CPU, and how many CPUs? Was everything on ZFS? What is the storage/HDD for the zone root?
> regards
>
> On 12/1/2011 11:39 AM, Derek McEachern wrote:
> > Have a peculiar problem that I haven't seen before. When starting a system that has about 35 - 40 zones on it, occasionally we see that one of the zones doesn't come up properly. You can log into the zone but none of the /etc/rc3.d scripts have been run. /var/adm/messages is completely empty, and when running who -r to see the run level it doesn't report anything.
> >
> >     # who -r
> >     run-level Dec 1 09:17 last=
> >
> > Anyone else seen anything similar? We are running Solaris 10 update 9.
> >
> > Regards,
> > Derek
>
> --
> Hung-Sheng Tsao Ph D.
> Founder Principal
> HopBit GridComputing LLC
> cell: 9734950840
> http://laotsao.wordpress.com/
> http://laotsao.blogspot.com/
Re: [zones-discuss] Zone Not Starting Properly?
Thanks Mike. The more I look at this the more I think it is load related.

svcs -x only shows that the LP print server is not running, which I don't think has any impact on what I'm seeing.

As for who not reporting what I would expect, I tracked that down to someone installing the GNU tools in /usr/local/bin and then setting the default path to reference those before /bin/ :-( Running /bin/who -r shows the zone is at run level 3.

Looking at /var/svc/log/milestone-multi-user-server:default.log I can see that some of the other services have most likely not completed before it tries to run the rc scripts. It appears that the /usr filesystem hasn't yet been mounted read/write, and the appstart script is logging an error that indicates rpc services are not completely running.

    Executing legacy init script /etc/rc3.d/S98apache.
    (30)Read-only file system: httpd: could not open error log file /usr/local/apache2/logs/error_log.
    Unable to open logs
    Legacy init script /etc/rc3.d/S98apache exited with return code 0.
    Executing legacy init script /etc/rc3.d/S99appstart.
    ERROR: Unable to contact any server
    Legacy init script /etc/rc3.d/S99appstart exited with return code 0.
    [ Dec 1 09:17:13 Method start exited with status 0 ]

We have a process in place that only starts 3 zones at one time, so we are not doing all 40 at once, but it could be that with this hardware even trying 3 at a time is too much and we may need to drop to 2.

Derek

On Thu, Dec 1, 2011 at 12:07 PM, Mike Gerdts mike.ger...@oracle.com wrote:
> On Thu 01 Dec 2011 at 10:39AM, Derek McEachern wrote:
> > Have a peculiar problem that I haven't seen before. When starting a system that has about 35 - 40 zones on it, occasionally we see that one of the zones doesn't come up properly. You can log into the zone but none of the /etc/rc3.d scripts have been run. /var/adm/messages is completely empty, and when running who -r to see the run level it doesn't report anything.
>
> Take a look at the output of svcs -x.
>
> Most likely you have a service that svc:/milestone/multi-user-server:default depends on (directly or indirectly) that has timed out and as such is in maintenance. Because the dependency is not satisfied, this milestone doesn't come up, so the rc3 scripts are not run.
>
> My guess is the timeout is because so many zones are starting at once that the disks are being thrashed. The resulting I/O backlog slows down the startup of services, which leads to timeouts, which lead to some services failing to even try to start.
>
> A google search and a 5 second read suggests that this link may be of help to adjust the timeout of services that require a longer timeout:
>
> http://www.runningunix.com/2009/01/changing-timeouts-on-smf-services/
>
> --
> Mike Gerdts
> Solaris Core OS / Zones
> http://blogs.oracle.com/zoneszone/
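For the case Mike describes, where svcs -x does show a service in maintenance after a start-method timeout, the adjustment he links to usually comes down to three commands. The sketch below only prints them so they can be reviewed first. The FMRI svc:/network/example:default and the 300-second value are placeholders, not taken from this thread, and the helper name is made up.

```shell
#!/bin/sh
# print_timeout_cmds FMRI SECONDS   (hypothetical helper)
# Print, without running, the commands that would show and then raise the
# start-method timeout of one SMF service instance.
print_timeout_cmds() {
    fmri=$1
    seconds=$2
    # Show the current value.
    echo "svcprop -p start/timeout_seconds $fmri"
    # Set the new value; the property type is "count".
    echo "svccfg -s $fmri setprop start/timeout_seconds = count: $seconds"
    # Make the running configuration pick up the change.
    echo "svcadm refresh $fmri"
}

# Placeholder service and value, for illustration only.
print_timeout_cmds svc:/network/example:default 300
```

Note that in this thread svcs -x did not report a timed-out service, so a longer timeout may not have helped here.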
Re: [zones-discuss] Zone Not Starting Properly?
Random zone. We've been testing to see if there is a threshold of trying to start too many in parallel, but so far we don't see anything. We saw the problem trying to start 3 zones in parallel but it was very intermittent. In about 1 out of every 4 tries at starting all 40 zones we would see 1 failure. We ran some tests starting 10 zones in parallel and so far no errors. Our assumption was that if it were load related, we would see problems when moving from 3 to 10 zones.

Derek

On Thu, Dec 1, 2011 at 2:48 PM, Ian Collins i...@ianshome.com wrote:
> On 12/2/11 05:39 AM, Derek McEachern wrote:
> > Have a peculiar problem that I haven't seen before. When starting a system that has about 35 - 40 zones on it, occasionally we see that one of the zones doesn't come up properly. You can log into the zone but none of the /etc/rc3.d scripts have been run.
>
> The same zone, or a random one? What happens if you halt one or more zones before rebooting? Is there a threshold where the problem begins to occur?
>
> --
> Ian.
Re: [zones-discuss] Zone Not Starting Properly?
I agree, our script could certainly be improved by adding logic to check for these failures and handle them, which we will probably end up doing.

Derek

On Thu, Dec 1, 2011 at 2:47 PM, Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. laot...@gmail.com wrote:
> It seems that you could
> 1) improve your rc script to check the other dependencies for apache, or
> 2) use SMF for apache, which checks the other dependencies.
> my 2c
>
> On 12/1/2011 1:33 PM, Derek McEachern wrote:
> > Thanks Mike. The more I look at this the more I think it is load related.
> >
> > svcs -x only shows that the LP print server is not running, which I don't think has any impact on what I'm seeing.
> >
> > As for who not reporting what I would expect, I tracked that down to someone installing the GNU tools in /usr/local/bin and then setting the default path to reference those before /bin/ :-( Running /bin/who -r shows the zone is at run level 3.
> >
> > Looking at /var/svc/log/milestone-multi-user-server:default.log I can see that some of the other services have most likely not completed before it tries to run the rc scripts. It appears that the /usr filesystem hasn't yet been mounted read/write, and the appstart script is logging an error that indicates rpc services are not completely running.
> >
> >     Executing legacy init script /etc/rc3.d/S98apache.
> >     (30)Read-only file system: httpd: could not open error log file /usr/local/apache2/logs/error_log.
> >     Unable to open logs
> >     Legacy init script /etc/rc3.d/S98apache exited with return code 0.
> >     Executing legacy init script /etc/rc3.d/S99appstart.
> >     ERROR: Unable to contact any server
> >     Legacy init script /etc/rc3.d/S99appstart exited with return code 0.
> >     [ Dec 1 09:17:13 Method start exited with status 0 ]
> >
> > We have a process in place that only starts 3 zones at one time, so we are not doing all 40 at once, but it could be that with this hardware even trying 3 at a time is too much and we may need to drop to 2.
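Option 1 above, having the rc script check its own preconditions, could start with something like the sketch below. It waits until a directory can really be written to, which covers the "Read-only file system" error in the log. The function names, try count, and delay are assumptions for illustration and do not come from the scripts discussed in the thread.

```shell
#!/bin/sh
# dir_writable DIR   (hypothetical helper)
# Succeeds only if a scratch file can actually be created in DIR. This fails
# while the underlying filesystem is still mounted read-only.
dir_writable() {
    probe="$1/.rw_probe.$$"
    ( : > "$probe" ) 2>/dev/null || return 1
    rm -f "$probe"
    return 0
}

# wait_writable DIR TRIES DELAY   (hypothetical helper)
# Poll DIR up to TRIES times, DELAY seconds apart, until it is writable.
wait_writable() {
    dir=$1
    tries=$2
    delay=$3
    while [ "$tries" -gt 0 ]; do
        dir_writable "$dir" && return 0
        tries=`expr "$tries" - 1`
        # Only sleep if another attempt will follow.
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Example use near the top of an rc script such as S98apache:
#   wait_writable /usr/local/apache2/logs 10 1 || exit 1
```

Exiting non-zero when the wait fails also avoids the situation in the log above, where both legacy scripts reported return code 0 even though they had not started anything.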
Re: [zones-discuss] Zone Not Starting Properly?
We haven't made the jump to ZFS yet :-) We do lose some useful features but haven't spent the time to port our stuff over to use ZFS.

On Thu, Dec 1, 2011 at 2:47 PM, Ian Collins i...@ianshome.com wrote:
> On 12/2/11 06:07 AM, Derek McEachern wrote:
> > System has 72GB RAM. Xeon CPU: 2 socket, 4 core, 16 thread. The zoneroot is on a UFS filesystem on its own drive, separate from the OS.
>
> That (UFS) is a strange choice for a recent Solaris 10 version. You lose the useful zones/ZFS features such as cloning.
>
> --
> Ian.
Re: [zones-discuss] Is it possible to determine from the zone as the global zone is called
One quick method that is mentioned frequently here, and one we use very successfully, is to create a read-only lofs mount of /etc/nodename. We add the following to all our zonecfgs:

    add fs
    set dir=/etc/GLOBAL
    set special=/etc/nodename
    set type=lofs
    add options [ro,nodevices]
    end

So when you're in an ngz you can cat /etc/GLOBAL to get the global host name.

On Thu, Aug 5, 2010 at 7:00 AM, Richard L. Hamilton rlha...@smart.net wrote:
> > Hi, i'm new here and i have a question: Is it possible to determine from the zone what the global zone is called? Is there a command in the zone, like zoneadm list, which shows me the name of the global zone? I need it for a script in the zone.
>
> AFAIK, there is no standard way to do that. Some people create zones with a file containing the hostname of the global zone. Others might put that in oem-banner, or use sneep to put it in nvramrc, along with hardware serial numbers and such.
>
> http://wikis.sun.com/display/sneep/Home
>
> But none of those are a built-in solution. I like the idea of putting it in nvram better than putting it in a file, since if the zone is moved to another server, it should then show the new location without having to update a file.
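Since the original question was about using the name from a script inside the zone, here is a small sketch of how such a script might read it. It assumes the lofs mount at /etc/GLOBAL from the zonecfg snippet above; the function name is invented for this example.

```shell
#!/bin/sh
# global_zone_name [FILE]   (hypothetical helper)
# Print the global zone's host name as exposed through the read-only lofs
# mount. FILE defaults to /etc/GLOBAL, the mount point used in the zonecfg
# snippet. Fails if the file is missing, unreadable, or empty.
global_zone_name() {
    file=${1:-/etc/GLOBAL}
    [ -r "$file" ] || return 1
    # /etc/nodename holds the host name on its first line.
    name=`sed 1q "$file"`
    [ -n "$name" ] || return 1
    echo "$name"
}

# Example use inside a non-global zone:
#   gz=`global_zone_name` || gz=unknown
```

Because the file is a loopback mount of the global zone's /etc/nodename, it should follow the zone to a new host without any update, provided the same fs resource stays in the zone configuration.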
Re: [zones-discuss] how dynamic is your zones network configuration?
Never. We haven't ever had the need to change the interface for a zone. On 6/4/10, Edward Pilatowicz edward.pilatow...@oracle.com wrote: hey all, i had a quick questions for all the zones users out there. after you've configured and installed a zone with ip-type=shared (the default), how often do you change the network interfaces assigned to that zone via zonecfg(1m)? frequently? infrequently? never? only when moving from testing to production? etc... thanks ed ___ zones-discuss mailing list zones-discuss@opensolaris.org -- Sent from my mobile device ___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] vxfs in non-global zone
Thanks for the responses. We don't plan on running the zone root on vxfs; it will be on ufs. The VRTSvxfs package installs with the parameters

    SUNW_PKG_ALLZONES='true'
    SUNW_PKG_HOLLOW='true'
    SUNW_PKG_THISZONE='false'

so the package content is not delivered to the zone, just the package information, so that it appears to be installed. I think we are going to experiment with mounting the vxfs into the zone from the global zone.

On Tue, Mar 16, 2010 at 10:26 AM, Henrik Johansson henr...@henkis.net wrote:
> I am away from home and on a mobile device so I'll make it short. This works fine if you don't ever put the zoneroot on vxfs. If you do, you will not be able to use all upgrade options, and the Veritas-supplied scripts for live upgrade only work with a vanilla install (no separate LUN for zones etc).
>
> Someone mentioned that not all packages were present in the local zone. I think that most of the VRTS packages (or too many at least) have PKG_ALLZONES set to true, so they must be installed in all zones (to have a supported system).
>
> I might be off target, it's late and I'm not supposed to do this on my vacation ;)
>
> Henrik
> http://sparcv9.blogspot.com
[zones-discuss] vxfs in non-global zone
We have been experimenting with mounting SAN storage with vxfs filesystems in an ngz, and there appear to be a couple of ways to accomplish this.

The FAQ links to a Symantec doc ( http://sfdoccentral.symantec.com/sf/5.0MP3/solaris/html/vxvm_admin/apbs07.htm ) that does it by adding a device to the zonecfg. This was problematic as our ngz didn't have the necessary Veritas packages and wasn't able to mount a vxfs filesystem. I haven't yet been able to determine if this was because of how the software was installed in the gz or if they are specifically excluded from the ngz.

It could also be done with an lofs by pointing to an already mounted vxfs in the gz.

Yet another Symantec document ( http://seer.entsupport.symantec.com/docs/vascont/59.html ) shows the file system from the gz being directly mounted into the ngz, i.e. from the global zone, boot the ngz and then run:

    mount -F vxfs /dev/vx/dsk/dg/volume /zonepath/root/mount

It took me a while to get my head around this but it works. The only obvious problem I can see is that if the ngz reboots it loses its storage. There doesn't appear to be a way to automatically get it remounted.

Is anyone else doing something similar? If so, what has been your experience/recommendation?

Derek
Re: [zones-discuss] Parameters in /etc/system in the zone
Vladi,

As far as I know there isn't an /etc/system file for a zone. You only have one kernel, which is running in the global zone, so there isn't a need for one in the ngz's. If you're looking for a list of things to set or tune, look at the resource configuration. Here is a link to a Sun doc which has details:

http://docs.sun.com/app/docs/doc/817-1592?l=en

Derek

On Wed, Oct 21, 2009 at 10:38 AM, Yanakiev, Vladimir vladimir_yanak...@fanniemae.com wrote:
> Hi, All!
> I have a zone question. To my knowledge /etc/system in a zone has very little meaning, as these are kernel parameters. But still, certain things can be set using this file. Is there a list of parameters for /etc/system that work in a zone? I am asking in general, not for a specific issue...
> Thanks!
> Vladi
Re: [zones-discuss] Configure a zone through sysidcfg
Does the information in this thread help?

http://www.opensolaris.org/jive/thread.jspa?threadID=108000&tstart=0

On Fri, Aug 14, 2009 at 4:55 PM, v no-re...@opensolaris.org wrote:
> I created an exclusive IP zone. Now I want to configure it using sysidcfg and avoid the prompts at the initial login. I created the below sysidcfg file:
>
>     timezone=US/Eastern
>     system_locale=C
>     terminal=xterms
>     network_interface=vnic1 {dhcp protocol_ipv6=yes}
>     root_password=abc123
>     security_policy=none
>     name_service=DNS
>     nfs4_domain=dynamic
>
> I wanted to copy this file to the zone's etc directory, but there is no such directory at this time (I already installed and booted the zone). I go to /export/zones/zone1/root but the directory is empty. There is nothing in there. There is no .../zone1/etc either. So, I created an etc directory under the root directory, put my sysidcfg file there, and logged into the zone. I still got the initial configuration prompts. Apparently, it didn't look at the sysidcfg file. What am I doing wrong?
>
> Thanks...
Re: [zones-discuss] Zone in down state.
See the following threads:

http://opensolaris.org/jive/thread.jspa?threadID=101438&tstart=30
http://www.opensolaris.org/jive/thread.jspa?threadID=107664&tstart=0

Derek

On Mon, Jul 13, 2009 at 12:23 AM, Ketan no-re...@opensolaris.org wrote:
> One of my zones is stuck in the down state. I'm not able to boot it or halt it, or even detach it. Is there any way to recover without rebooting the whole system (global zone)?
Re: [zones-discuss] Zone Stuck in down state - nfs share
You can look at the following thread, where I had a similar problem with a zone stuck in a shutting down state:

http://opensolaris.org/jive/thread.jspa?threadID=101438&tstart=30

The other thing to look for is processes that might be accessing the ngz from the gz, using fuser. You can also use pwdx /proc/* | grep slabzone1 to find such processes. If you find any you can see what they are doing and kill them, then try shutting down the zone again. Otherwise I haven't found a way to kill the zone short of rebooting the box.

As an aside, I was under the impression that it was not advisable to access ngz filesystems from the gz. A quick search only seems to point to the gz possibly doing something nefarious to the ngz, but I can't find any technical reason it shouldn't be done.

On Fri, Jul 10, 2009 at 10:58 AM, ajmai...@mchsi.com wrote:
> I needed to share out a non-global zone's folder via nfs, so I did it from the global zone like so:
>
>     # share /slabzone1/zonepath/root/home
>
> Later I rebooted the zone:
>
>     # zoneadm -z slabzone1 reboot
>
> The reboot command hung and the zone became stuck in the down status. I assume this is because of the nfs share. I tried unsharing it, but that didn't help:
>
>     # unshare /slabzone1/zonepath/root/home
>     nfs unshare: /slabzone1/zonepath/root/home: not shared
>
> My attempts to get the zone to transition to the installed state have all failed. I assume this is a known issue. Is there any way to recover without a reboot?
>
> Thanks,
> Alex
[zones-discuss] zonestats.pl - Proposed change to handle stuck zones.
I have been using the zonestats.pl script for a while and came across an odd issue. I have a host that has a zone stuck in the shutting_down state that I haven't been able to clean up. When zonestats runs it sees this zone and tries to zlogin into it, which has the effect of hanging the script.

I made a modification to the script to check the zone status and skip the zone if it's not in the running state. In the section of code that gathers the zone names:

Current:

    #
    # Gather list of zones, their status and pool type and association.
    if ($DEBUG) { print "/usr/sbin/zoneadm list -v\n"; }
    open (NAMES, "/usr/sbin/zoneadm list -v|");
    $znum=0;
    while (<NAMES>) {
        if (/^\s+(\S+)\s+(\S+)/) {
            if ($1 eq "ID") { next; }
            $znames[$znum++]=$2;
            $zoneid{$2}=$1;
            if ($opt_N) {
                $zlen = length ($znames[$znum-1]);
                $Nmaxznamelen = $zlen > $Nmaxznamelen ? $zlen : $Nmaxznamelen;
            }
        }
    }
    close NAMES;

Proposed:

    #
    # Gather list of zones, their status and pool type and association.
    if ($DEBUG) { print "/usr/sbin/zoneadm list -v\n"; }
    open (NAMES, "/usr/sbin/zoneadm list -v|");
    $znum=0;
    while (<NAMES>) {
        # Changed: also capture the third column (STATUS) as $3.
        if (/^\s+(\S+)\s+(\S+)\s+(\S+)/) {
            if ($1 eq "ID") { next; }
            # Added: skip any zone that is not in the running state.
            if ($3 ne "running") { next; }
            $znames[$znum++]=$2;
            $zoneid{$2}=$1;
            if ($opt_N) {
                $zlen = length ($znames[$znum-1]);
                $Nmaxznamelen = $zlen > $Nmaxznamelen ? $zlen : $Nmaxznamelen;
            }
        }
    }
    close NAMES;

Regards,
Derek
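The same filter can be tried from the command line before touching the Perl script. The sketch below is a shell equivalent of the proposed check, not part of zonestats.pl; the function name is made up, and it assumes the column order ID, NAME, STATUS shown by zoneadm list -v.

```shell
#!/bin/sh
# running_zones   (hypothetical helper)
# Read "zoneadm list -v" style output on stdin and print the names of zones
# whose STATUS column is exactly "running". The header line is skipped
# automatically because its third field is the word STATUS.
running_zones() {
    awk '$3 == "running" { print $2 }'
}

# Example use from the global zone:
#   /usr/sbin/zoneadm list -v | running_zones
```

A zone in the shutting_down or down state is left out, so a loop that runs zlogin over this list would not touch a stuck zone.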
Re: [zones-discuss] Zone Stuck in a shutting_down state in os2008.11
I guess my wording was a little confusing. What I meant by nfs mounts in the ngz from the gz is that since you can't log into the zone, you need to check for hung nfs mounts from the global zone. The nfs mounts that we have had hang were from filers.

Derek

On Sun, Jun 21, 2009 at 3:05 PM, Craig Cory cr...@exitcertified.com wrote:
> For this reason and others, it is recommended to NOT mount non-global zone clients to their own global zone servers. Use lofs for these local mounts.
>
> Derek McEachern wrote:
> > I have had the same problem with two zones. Using the following two steps I was able to get one of the zones to shut down, but not the other.
> >
> > First, check for hung nfs mounts for the ngz from the gz: mount | grep www2. If you see any, umount them from the gz.
> >
> > Next, check for any processes that might be accessing files in the ngz. From the gz you can do something like:
> >
> >     # pwdx /proc/* | grep zone1
> >     17459: /export/zone/zone1/root
> >     17731: /export/zone/zone1/root/tmp
> >     18022: /export/zone/zone1/root/tmp
> >     # ps -ef | egrep "18022|17731"
> >     root 18022 17731 0 13:15:33 pts/2 0:00 sleep 50
> >     root 18064 17745 0 13:17:40 pts/1 0:00 egrep 18022|17731
> >     root 17731 17727 0 13:14:18 pts/2 0:00 sh
> >
> > If you find any processes you can try and kill them. After each step, try to halt the zone and see if it comes down. If neither of these works, the only solution I've heard of is rebooting the host.
> >
> > On Sun, Jun 21, 2009 at 6:25 AM, solarg sol...@laposte.net wrote:
> > > hello all,
> > > after trying to reboot a zone, it still hangs:
> > >
> > >     he...@antigone:~# zoneadm -z www2 reboot;zlogin -C www2
> > >
> > > On another terminal, I try to kill the process:
> > >
> > >     he...@antigone:~# ps -ef|grep www2
> > >     root 16432 1 0 Mar 18 ? 0:03 zoneadmd -z www2
> > >     root 18864 18860 0 12:18:14 pts/5 0:00 grep www2
> > >     root 18809 11676 0 12:09:53 pts/3 0:00 zoneadm -z www2 reboot
> > >     he...@antigone:~# kill -9 16432
> > >
> > > and:
> > >
> > >     he...@antigone:~# zoneadm -z www2 reboot;zlogin -C www2
> > >     door_call failed: Interrupted system call
> > >     zone 'www2': WARNING: zone is in state 'down', but zoneadmd does not appear to be available; restarted zoneadmd to recover.
> > >     [Connected to zone 'www2' console]
> > >     ~^D
> > >     he...@global:~# zoneadm list -cv
> > >     ID NAME STATUS PATH BRAND
> > >     73 www2 down /zones/www2 ipkg shared
> > >
> > > I also have:
> > >
> > >     he...@antigone:~# mdb -k
> > >     ::walk zone | ::print zone_t zone_name zone_ref
> > >     ...
> > >     zone_name = 0xff044b049c80 www2
> > >     zone_ref = 0x2
> > >
> > > A previous thread said: "If the refcount is greater than 0x1, it could be: 6272846 User orders zone death; NFS client thumbs nose"
> > >
> > >     he...@antigone:~# ps -ef|grep www2
> > >     root 19091 18860 0 13:19:32 pts/5 0:00 zoneadm -z www2 halt
> > >     root 19093 1 0 13:19:32 ? 0:00 zoneadmd -z www2
> > >     root 19113 11676 0 13:24:30 pts/3 0:00 grep www2
> > >     he...@antigone:~# truss -p 19093
> > >     /4: door_return(0x, 0, 0x, 0xFE4F0E00, 1007360) (sleeping...)
> > >     /3: zone_destroy(73) (sleeping...)
> > >     /1: pollsys(0x08046AD0, 4, 0x, 0x) (sleeping...)
> > >     /2: door_unref() (sleeping...)
> > >     he...@antigone:~# truss -p 19091
> > >     door_call(6, 0x08047590) (sleeping...)
> > >
> > > Any idea?
> > > thanks for help,
> > > gerard
>
> --
> Craig Cory
> Senior Instructor :: ExitCertified
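The pwdx step quoted above can be wrapped in a small filter that prints only the PIDs. This is a sketch with an invented function name, not a tool from the thread. It matches on the full path rather than a bare grep, so a zone called zone1 does not also match zone10. It only reports processes and does not kill anything.

```shell
#!/bin/sh
# pids_under ROOT   (hypothetical helper)
# Read "pwdx" style lines ("PID: /current/dir") on stdin and print the PIDs
# whose working directory is ROOT itself or lies somewhere below it.
pids_under() {
    root=$1
    while read pid dir; do
        case $dir in
            "$root"|"$root"/*)
                # Strip the trailing colon that pwdx puts after the PID.
                echo "$pid" | sed 's/:$//'
                ;;
        esac
    done
}

# Example use from the global zone, with the zone root from the quoted output:
#   pwdx /proc/* 2>/dev/null | pids_under /export/zone/zone1/root
```

Each PID it prints can then be inspected with ps, as in the quoted output, before deciding whether to kill it.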
[zones-discuss] df -Z behaviour in global Zone
In doing some testing we came across some unexpected behaviour (at least unexpected to me) of the df -Z command when run in the global zone. If a non-global zone has an nfs fs mounted, df -Z dumps all kinds of statvfs errors because, as best I can tell, it can't actually see the fs. It's in the gz's /etc/mnttab but it can't be accessed from the gz.

    # df -Z
    df: cannot statvfs /zone/zone/root/servers: Not owner

This seems like a bug to me. Shouldn't df ignore ngz nfs mounts?

Derek
Re: [zones-discuss] What is the correct approach to add Java, GlassFish to non-global zones?
It took me a while to find it, but in some of my previous searches I came across something which might help:

http://opensolaris.org/jive/thread.jspa?messageID=378354&tstart=0

There are some links in one of the messages that talk about these packages.

Derek

On Thu, Jun 4, 2009 at 11:49 PM, Kevin Pan no-re...@opensolaris.org wrote:
> After creating a non-global zone, what is the correct approach to add Java, GlassFish, PostgreSQL, SunWebServer to the non-global zone? (Use the pkg install or pkgadd command, but how? And where to find the package name?) Please provide some instructions or point to the relevant docs.
>
> Thanks!!!
Re: [zones-discuss] df -Z behaviour in global Zone
It was actually another problem we were trying to solve, and we happened to notice the df behaviour while debugging.

We had a script in the gz that wanted to mount the nfs share. It did a simple check to see if it was already mounted by parsing mount output, which showed the ngz mount. The script in the gz tried to access the mount that was in the ngz, which failed, at which point the script decided to umount the share since it couldn't get to it. The umount was successful, which is expected, but the poor people in the ngz just saw their mount disappear. Whoops.

df doesn't really ignore nfs mounts, since it reports the data correctly if you don't use the -Z option and if you run it in the ngz. The -Z option works correctly for other fs types in the ngz, just not nfs. I would have expected df -Z to just not report on nfs fs's in the ngz instead of throwing the cryptic "Not owner" error.

On Fri, Jun 5, 2009 at 4:52 PM, Peter Tribble peter.trib...@gmail.com wrote:
> That's why df does ignore nfs mounts by default and you have the -Z option to force it to go look. I've not yet come across a case where df -Z has told me anything useful. What problem are you trying to solve by running df -Z?
>
> --
> -Peter Tribble
> http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Re: [zones-discuss] Zone Stuck in a shutting_down state
Steve,

Thanks for this information. I ran through the commands and this is what I see.

  ::kmem_cache ! grep rnode
  a6438008 rnode_cache 0200 00 65670506
  a643c008 rnode4_cache 0200 00 9680

When I run the following:

  a6438008::walk kmem | ::print rnode_t r_vnode | ::vnode2path

I can see two listed fs's:

  /zone/zonetest-new/root/opt/xxx//logs.tar
  /zone/zonetest-new/root/var/xxx/

But when I run the ::fsinfo command there are no nfs mounted filesystems under the zonetest-new filesystem tree. Both of the fs's listed were nfs mounted when the zone was up and running. It really looks like someone was doing something with the logs.tar file at the time the zone was coming down, which probably started all my problems.

I really appreciate all this info.

Thanks,
Derek

On Thu, May 7, 2009 at 12:34 AM, Steve Lawrence stephen.lawre...@sun.com wrote:
> Related comments from bug below (X'ed out some paths):
>
> The zone in question clearly has too many references
>
>   030004a09680::print zone_t zone_ref
>   zone_ref = 0t11
>
> Ten too many, to be precise. So what's holding onto the zone? Well the rnode cache has 5 entries
>
>   ::kmem_cache ! grep rnode
>   030003a1e988 rnode_cache 00 640 572988
>   030003a20988 rnode4_cache 00 9840
>
>   030003a1e988::walk kmem | ::print rnode_t r_vnode | ::vnode2path
>   /opt/zones/z1/root/
>   /opt/zones/z1/root/
>   /opt/zones/z1/root/
>   /opt/zones/z1/root/
>   /opt/zones/z1/root/
>
> even though no nfs filesystems are mounted
>
>   ::fsinfo
>   VFSP         FS     MOUNT
>   0187f420     ufs    /
>   0187f508     devfs  /devices
>   03315780     ctfs   /system/contract
>   033156c0     proc   /proc
>   03315600     mntfs  /etc/mnttab
>   03315480     tmpfs  /etc/svc/volatile
>   033153c0     objfs  /system/object
>   0300039987c0 namefs /etc/svc/volatile/repository_door
>   0300039984c0 fd     /dev/fd
>   030003a99e00 ufs    /var
>   030003998400 tmpfs  /tmp
>   030003a99680 tmpfs  /var/run
>   030003a98f00 namefs /var/run/name_service_door
>   030003a98b40 namefs /var/run/sysevent_channels/syseventd_channel...
>   030003a989c0 namefs /etc/sysevent/sysevent_door
>   030003a98780 namefs /etc/sysevent/devfsadm_event_channel/1
>   030003a98540 namefs /dev/.zone_reg_door
>   030003a983c0 namefs /dev/.devfsadm_synch_door
>   030003a99380 namefs /etc/sysevent/piclevent_door
>   0300044b1d80 namefs /var/run/picld_door
>   030003a99200 ufs    /opt
>   0300044b0700 namefs /var/run/zones/z1.zoneadmd_door
>
> And as apparent from the path, all of those rnodes refer to zone z1 through their mntinfo structure
>
>   030003a1e988::walk kmem | ::print rnode_t r_vnode->v_vfsp->vfs_data | ::print mntinfo_t mi_zone | ::zone
>   ADDR         ID NAME PATH
>   030004a09680  1 z1   /opt/zones/z1/root/
>   030004a09680  1 z1   /opt/zones/z1/root/
>   030004a09680  1 z1   /opt/zones/z1/root/
>   030004a09680  1 z1   /opt/zones/z1/root/
>   030004a09680  1 z1   /opt/zones/z1/root/
>
> So if each of those rnodes has two holds on the zone, then that accounts for all of the extra holds exactly.

___ zones-discuss mailing list zones-discuss@opensolaris.org
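The hold accounting in the bug comments can be written out as a small worked check. This is only a sketch of the arithmetic quoted above: the baseline of one reference and the two holds per rnode are taken from the bug comment (the latter is stated there as an "if"), and the function name is made up.

```python
def unexplained_holds(zone_ref, rnode_count, holds_per_rnode=2, baseline_refs=1):
    """Return how many extra zone references are NOT explained by leaked rnodes.

    zone_ref:        reference count read from zone_t (as a plain integer).
    rnode_count:     rnodes still pointing at the zone.
    holds_per_rnode: assumption from the bug comment (two holds per rnode).
    baseline_refs:   the bug comment treats 1 as the expected count.
    """
    extra = zone_ref - baseline_refs          # "ten too many" in the example
    return extra - rnode_count * holds_per_rnode


# Figures from the quoted bug comment: zone_ref = 0t11 (decimal 11), 5 rnodes.
print(unexplained_holds(zone_ref=11, rnode_count=5))
```

A result of zero means the leaked rnodes account for every extra hold, which is the conclusion the bug comment reaches for zone z1.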
Re: [zones-discuss] Zone Stuck in a shutting_down state
The zone is in a shutting_down state. The mdb command for this zone returns a zone_ref of 0x1a, which is greater than 1.

  zone_name = 0xfe86c83d61c0 zonetest-new
  zone_ref = 0x1a

This is new to me. What is the refcount counting? What should this value be for the zone to shut down?

There is a zoneadmd process running for the zone. Trussing it and issuing the halt command, I can see it look up the zone, get some attributes and then send it the shutdown, which is where it hangs.

  5364: psargs: zoneadmd -z zonetest-new
  5364/2: door_return(0xFE6CD870, 4096, 0x, 0) (sleeping...)
  5364/4: door_return(0x, 0, 0x, 0) (sleeping...)
  5364/3: door_unref() (sleeping...)
  5364/1: pollsys(0x08046C50, 4, 0x, 0x) (sleeping...)
  5364/2: 32.4128 door_return(0xFE6CD870, 4096, 0x, 0) = 0
  5364/2: 32.4642 door_ucred(0x08077150) = 0
  5364/2: 32.4769 zone_lookup(zonetest-new) = 64
  5364/2: 32.4770 zone_getattr(64, ZONE_ATTR_STATUS, 0xFE6CD85C, 4) = 4
  5364/2: 32.4843 zone_lookup(zonetest-new) = 64
  5364/2: zone_shutdown(64) (sleeping...)

I already tried killing the zoneadmd process and issuing the halt, and all it does is start the zoneadmd process back up and hang. I can't force a crashdump on the system since I can't take the box down.

Bug 6272846 makes reference to nfs version 3 (which is the version we are using) and the client apparently leaking rnodes. Is there any way to verify this other than a forced crashdump? I might take a live core of the system and open a case to see if that yields anything.

Derek

On Wed, May 6, 2009 at 4:08 PM, Steve Lawrence stephen.lawre...@sun.com wrote:
> zsched is always unkillable. It will only exit when instructed to by zoneadmd. Is the remaining zone shutting down, or down? (zoneadm list -v).
>
> What is the ref_count on the zone?
>
>   # mdb -k
>   ::walk zone | ::print zone_t zone_name zone_ref
>
> If the refcount is greater than 0x1, it could be:
>
>   6272846 User orders zone death; NFS client thumbs nose
>
> No workaround for this one. A crashdump would help investigate a zone_ref greater than 1.
>
> Is there a zoneadmd process for the given zone?
>
>   # pgrep -lf zoneadmd
>
> If so, please provide truss -p pid of this process. You may also attempt killing this zoneadmd process (which lives in the global zone), and then re-attempting zoneadm -z zonename halt.
>
> Thanks,
> -Steve L.

___ zones-discuss mailing list zones-discuss@opensolaris.org
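As a rough illustration of the check Steve describes (a zone_ref greater than 0x1 is suspicious), the sketch below scans text shaped like the ::print output pasted above and lists zones whose count exceeds 1. It assumes the usual mdb radix prefixes (0x hex, 0t decimal, 0o octal, 0i binary) and treats unprefixed values as hex; the function names and the sample "global" entry are made up, and real mdb output may be laid out differently.

```python
import re

# mdb radix prefixes; unprefixed input is treated as hex (assumption for this sketch).
_RADIX = {"0x": 16, "0t": 10, "0o": 8, "0i": 2}


def parse_mdb_int(text):
    """Convert an mdb-style integer such as 0x1a or 0t11 to a Python int."""
    value = text.strip().lower()
    base = _RADIX.get(value[:2])
    if base is None:
        return int(value, 16)
    return int(value[2:], base)


def zones_with_extra_refs(print_output):
    """Return (zone_name, zone_ref) pairs where zone_ref is greater than 1."""
    # Matches: zone_name = <addr> <name> ... zone_ref = <count>
    # (the name may or may not be wrapped in double quotes).
    pattern = re.compile(
        r'zone_name\s*=\s*\S+\s+"?([^"\s]+)"?\s+zone_ref\s*=\s*(\S+)'
    )
    suspects = []
    for name, ref_text in pattern.findall(print_output):
        ref = parse_mdb_int(ref_text)
        if ref > 1:
            suspects.append((name, ref))
    return suspects


# First entry is from the message above; the second is a made-up healthy zone.
SAMPLE = (
    "zone_name = 0xfe86c83d61c0 zonetest-new\n"
    "zone_ref = 0x1a\n"
    'zone_name = 0xfe86c83d0000 "otherzone"\n'
    "zone_ref = 0x1\n"
)

print(zones_with_extra_refs(SAMPLE))
```

Note that 0x1a is 26 in decimal, so by the reasoning in bug 6272846 something in the kernel is holding roughly 25 extra references on zonetest-new.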
Re: [zones-discuss] Zone Stuck in a shutting_down state
I don't believe that I can see the comments since they are not public. Is that something you can pass along?

On Wed, May 6, 2009 at 5:27 PM, Steve Lawrence stephen.lawre...@sun.com wrote:
>> I already tried killing the zoneadmd process and issuing the halt, and all it does is start the zoneadmd process back up and hang. I can't force a crashdump on the system since I can't take the box down.
>>
>> Bug 6272846 makes reference to nfs version 3 (which is the version we are using) and the client apparently leaking rnodes. Is there any way to verify this other than a forced crashdump? I might take a live core of the system and open a case to see if that yields anything.
>
> A zone_ref greater than 1 means that something in the kernel is holding the zone. You should be able to use mdb -k on the live system, and issue dcmds similar to the comments of 6272846. No need to force a crashdump or take a live crashdump.
>
> -Steve L.
>
>> Derek
>>
>> On Wed, May 6, 2009 at 4:08 PM, Steve Lawrence stephen.lawre...@sun.com wrote:
>>> zsched is always unkillable. It will only exit when instructed to by zoneadmd. Is the remaining zone shutting down, or down? (zoneadm list -v).
>>>
>>> What is the ref_count on the zone?
>>>
>>>   # mdb -k
>>>   ::walk zone | ::print zone_t zone_name zone_ref
>>>
>>> If the refcount is greater than 0x1, it could be:
>>>
>>>   6272846 User orders zone death; NFS client thumbs nose
>>>
>>> No workaround for this one. A crashdump would help investigate a zone_ref greater than 1.
>>>
>>> Is there a zoneadmd process for the given zone?
>>>
>>>   # pgrep -lf zoneadmd
>>>
>>> If so, please provide truss -p pid of this process. You may also attempt killing this zoneadmd process (which lives in the global zone), and then re-attempting zoneadm -z zonename halt.
>>>
>>> Thanks,
>>> -Steve L.

___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] Zone Stuck in a shutting_down state
Just a follow-up on the progress and resolution of my stuck zones problem.

I had two zones stuck in the shutting_down state. Based on initial feedback I looked in /etc/mnttab in the gz for nfs mounts that were mounted in the ngz. There were a couple, and I umount'ed them. Then I was able to find two processes that indicated they were accessing the ngz filesystem: zsched and svc.configd. I tried trussing svc.configd but was unable to, due to an unanticipated system error. I killed svc.configd, ran the zoneadm halt, and the zone successfully shut down.

On the second zone that's stuck I umount'ed the nfs file systems and checked for processes accessing the ngz filesystem, and the only one reported is zsched. Trying to halt the zone doesn't do anything, and from the looks of it zsched appears to be unkillable. It looks like this zone is here to stay until I can reboot the box.

Derek

On Tue, Apr 28, 2009 at 10:09 PM, Derek McEachern derekmceach...@gmail.com wrote:
> There were a bunch of nfs mounts listed in the /etc/mnttab of the global zone. I was able to umount them but the zone is still hung up. I tried killing the zoneadmd process and ran zoneadm halt again; it started the zoneadmd back up but it didn't do anything.
>
> Thanks to everyone for their suggestions. Looks like I'm going to have to wait until I can take the box down for a reboot.
>
> Regards,
> Derek
>
> On Tue, Apr 28, 2009 at 9:02 PM, Alexander J. Maidak ajmai...@mchsi.com wrote:
>> If it's a hung nfs mount you should be able to see it still mounted in the /etc/mnttab file in the global zone: grep nfs /etc/mnttab. It will be mounted under the zonepath. You should then be able to do a umount -f /path-to-nfsmnt from the global zone, and if you're really lucky the zone will finish shutting down.
>>
>> -Alex
>>
>> On Tue, 2009-04-28 at 16:19 -0500, Derek McEachern wrote:
>>> It's possible that it could be nfs mount related since the zone did have nfs mounted fs's, but they should have been umounted prior to shutting down the zone. In any event I can no longer get into the zone to check using zlogin and zlogin -C.
>>>
>>> I tried Bryan's suggestion on looking for processes that might have open filehandles to files under the zone's filesystem tree, but I don't see that there are any.
>>>
>>> On Tue, Apr 28, 2009 at 3:40 PM, Bryan Allen b...@mirrorshades.net wrote:
>>>> +--
>>>> | On 2009-04-28 15:37:22, Derek McEachern wrote:
>>>> |
>>>> | We were trying to bring down a zone on a S10 U4 system and it ended up stuck
>>>> | in the shutting_down state.
>>>> |
>>>> | ID NAME         STATUS        PATH               BRAND  IP
>>>> | 74 zonetest-new shutting_down /zone/zonetest-new native shared
>>>> |
>>>> | The only process I see running is the zoneadmd process
>>>> |
>>>> | dlet15:/home/derekm/ ps -efZ | grep zonetest-new
>>>> | global root 12680 1 0 Apr 24 ? 0:02 zoneadmd -z zonetest-new
>>>>
>>>> Do any processes (notably shells in the global zones) have an open filehandle somewhere under the zone's filesystem tree? This can (at least on Sol10) cause zones to not shut down, since it can't close the FH (I assume, anyway).
>>>>
>>>> --
>>>> bda
>>>> cyberpunk is dead. long live cyberpunk.

___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] Zone Stuck in a shutting_down state
It's possible that it could be nfs mount related since the zone did have nfs mounted fs's, but they should have been umounted prior to shutting down the zone. In any event I can no longer get into the zone to check using zlogin and zlogin -C.

I tried Bryan's suggestion on looking for processes that might have open filehandles to files under the zone's filesystem tree, but I don't see that there are any.

On Tue, Apr 28, 2009 at 3:40 PM, Bryan Allen b...@mirrorshades.net wrote:
> +--
> | On 2009-04-28 15:37:22, Derek McEachern wrote:
> |
> | We were trying to bring down a zone on a S10 U4 system and it ended up stuck
> | in the shutting_down state.
> |
> | ID NAME         STATUS        PATH               BRAND  IP
> | 74 zonetest-new shutting_down /zone/zonetest-new native shared
> |
> | The only process I see running is the zoneadmd process
> |
> | dlet15:/home/derekm/ ps -efZ | grep zonetest-new
> | global root 12680 1 0 Apr 24 ? 0:02 zoneadmd -z zonetest-new
>
> Do any processes (notably shells in the global zones) have an open filehandle somewhere under the zone's filesystem tree? This can (at least on Sol10) cause zones to not shut down, since it can't close the FH (I assume, anyway).
>
> --
> bda
> cyberpunk is dead. long live cyberpunk.

___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] Zone Stuck in a shutting_down state
There were a bunch of nfs mounts listed in the /etc/mnttab of the global zone. I was able to umount them but the zone is still hung up. I tried killing the zoneadmd process and ran zoneadm halt again; it started the zoneadmd back up but it didn't do anything.

Thanks to everyone for their suggestions. Looks like I'm going to have to wait until I can take the box down for a reboot.

Regards,
Derek

On Tue, Apr 28, 2009 at 9:02 PM, Alexander J. Maidak ajmai...@mchsi.com wrote:
> If it's a hung nfs mount you should be able to see it still mounted in the /etc/mnttab file in the global zone: grep nfs /etc/mnttab. It will be mounted under the zonepath. You should then be able to do a umount -f /path-to-nfsmnt from the global zone, and if you're really lucky the zone will finish shutting down.
>
> -Alex
>
> On Tue, 2009-04-28 at 16:19 -0500, Derek McEachern wrote:
>> It's possible that it could be nfs mount related since the zone did have nfs mounted fs's, but they should have been umounted prior to shutting down the zone. In any event I can no longer get into the zone to check using zlogin and zlogin -C.
>>
>> I tried Bryan's suggestion on looking for processes that might have open filehandles to files under the zone's filesystem tree, but I don't see that there are any.
>>
>> On Tue, Apr 28, 2009 at 3:40 PM, Bryan Allen b...@mirrorshades.net wrote:
>>> +--
>>> | On 2009-04-28 15:37:22, Derek McEachern wrote:
>>> |
>>> | We were trying to bring down a zone on a S10 U4 system and it ended up stuck
>>> | in the shutting_down state.
>>> |
>>> | ID NAME         STATUS        PATH               BRAND  IP
>>> | 74 zonetest-new shutting_down /zone/zonetest-new native shared
>>> |
>>> | The only process I see running is the zoneadmd process
>>> |
>>> | dlet15:/home/derekm/ ps -efZ | grep zonetest-new
>>> | global root 12680 1 0 Apr 24 ? 0:02 zoneadmd -z zonetest-new
>>>
>>> Do any processes (notably shells in the global zones) have an open filehandle somewhere under the zone's filesystem tree? This can (at least on Sol10) cause zones to not shut down, since it can't close the FH (I assume, anyway).
>>>
>>> --
>>> bda
>>> cyberpunk is dead. long live cyberpunk.

___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] zonestat.pl without Resource Pools
Jeff,

Sorry this has taken so long to get to, but yes, if I enable the pools and pools/dynamic services it runs as expected. Has any work started on a 'real' zonestat yet?

On Tue, Feb 17, 2009 at 9:44 PM, Jeff Victor jeff.j.vic...@gmail.com wrote:
> On Tue, Feb 17, 2009 at 4:09 PM, Derek McEachern derekmceach...@gmail.com wrote:
>> We are in the process of deploying applications into zones and I've been looking at how to monitor what each zone is up to regarding resource usage. I downloaded the zonestat.pl script to play around with, and out of the box it didn't actually give me any zone specific information. After poking around the code it turns out it won't break out any zone level details unless resource pooling is enabled. We are deploying our zones without resource restrictions.
>
> This is a known problem with v1.3. I am working on v1.3.1 which will fix that problem. As a temporary workaround: does it work correctly if you enable pools and don't configure any?
>
>   GZ# svcadm enable pools
>   GZ# svcadm enable pools/dynamic
>
>> I hacked the script to get around this problem for now, but is this a feature we can get added to the baseline? Jeff, how are changes handled to this script, since you appear to be the owner?
>
> To make a contribution to the OpenSolaris community, first you would register as a contributor. The other option is to request a specific change in behavior, and I will try to get to it promptly.
>
> However, please understand (as the project web pages state) that this is a prototype to help us learn what a 'real' zonestat should do. The 'real' zonestat would be written in C or D for improved functionality and considerably better performance. This Perl script consumes a great deal of CPU cycles.
>
> --JeffV

___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] zonepath on NFS
As far as I could tell, nfs is not supported. I believe it will not allow the zone path to be on a fs type of procfs, mntfs, autofs, nfs, or cachefs.

On Thu, Feb 19, 2009 at 9:59 AM, Brian Kolaci brian.kol...@sun.com wrote:
> Hi,
>
> I wanted to check the availability of putting the zonepath on NFS. Is this now supported? Are there issues with Live Upgrade? Any constraints or gotchas?
>
> Thanks,
> Brian

___ zones-discuss mailing list zones-discuss@opensolaris.org