Re: [zones-discuss] Webrev for CR 6782448
On Wed, 23 Dec 2009 01:34:59 +0100, Jordan Vaughan wrote:
>>> http://cr.opensolaris.org/~flippedb/onnv-zone2
[...]
> zonecfg_lookup_nwif() needs the three loop checks.
>
> I regenerated the webrev. You'll notice that the assertion was replaced
> by a check that returns Z_INSUFFICIENT_SPEC.

Hey Jordan, thanks for the exhaustive reply. Understood. I was ignoring the fact that without these checks the XML parsing loop would raise a false alarm in cases like this one:

net:
        address: 10.5.234.15/24
        physical: bge0
        defrouter not specified

zonecfg:mojo> select net address=10.5.234.15/24
select net: No such resource with that id

lgtm!

cheers
frankB
___
zones-discuss mailing list
zones-discuss@opensolaris.org
Re: [zones-discuss] Zones on shared storage - a warning
On Tue, Dec 22, 2009 at 8:02 PM, Mike Gerdts wrote:
> I've been playing around with zones on NFS a bit and have run into
> what looks to be a pretty bad snag - ZFS keeps seeing read and/or
> checksum errors. This exists with S10u8 and OpenSolaris dev build
> snv_129. This is likely a blocker for anyone thinking of
> implementing parts of Ed's Zones on Shared Storage:
>
> http://hub.opensolaris.org/bin/view/Community+Group+zones/zoss
>
> The OpenSolaris example appears below. The order of events is:
>
> 1) Create a file on NFS, turn it into a zpool
> 2) Configure a zone with the pool as zonepath
> 3) Install the zone, verify that the pool is healthy
> 4) Boot the zone, observe that the pool is sick
[snip]

An off-list conversation and a bit of digging into other tests I have done show that this is likely limited to NFSv3. I cannot say that this problem has been seen with NFSv4.

--
Mike Gerdts
http://mgerdts.blogspot.com/
[zones-discuss] Zones on shared storage - a warning
I've been playing around with zones on NFS a bit and have run into what looks to be a pretty bad snag - ZFS keeps seeing read and/or checksum errors. This exists with S10u8 and OpenSolaris dev build snv_129. This is likely a blocker for anyone thinking of implementing parts of Ed's Zones on Shared Storage:

http://hub.opensolaris.org/bin/view/Community+Group+zones/zoss

The OpenSolaris example appears below. The order of events is:

1) Create a file on NFS, turn it into a zpool
2) Configure a zone with the pool as zonepath
3) Install the zone, verify that the pool is healthy
4) Boot the zone, observe that the pool is sick

r...@soltrain19# mount filer:/path /mnt
r...@soltrain19# cd /mnt
r...@soltrain19# mkdir osolzone
r...@soltrain19# mkfile -n 8g root
r...@soltrain19# zpool create -m /zones/osol osol /mnt/osolzone/root
r...@soltrain19# zonecfg -z osol
osol: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:osol> create
zonecfg:osol> info
zonename: osol
zonepath:
brand: ipkg
autoboot: false
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: shared
hostid:
zonecfg:osol> set zonepath=/zones/osol
zonecfg:osol> set autoboot=false
zonecfg:osol> verify
zonecfg:osol> commit
zonecfg:osol> exit

r...@soltrain19# chmod 700 /zones/osol

r...@soltrain19# zoneadm -z osol install
   Publisher: Using opensolaris.org (http://pkg.opensolaris.org/dev/
              http://pkg-na-2.opensolaris.org/dev/).
   Publisher: Using contrib (http://pkg.opensolaris.org/contrib/).
       Image: Preparing at /zones/osol/root.
       Cache: Using /var/pkg/download.
Sanity Check: Looking for 'entire' incorporation.
  Installing: Core System (output follows)
DOWNLOAD                    PKGS          FILES    XFER (MB)
Completed                  46/46    12334/12334    93.1/93.1

PHASE                    ACTIONS
Install Phase        18277/18277
No updates necessary for this image.
  Installing: Additional Packages (output follows)
DOWNLOAD                    PKGS          FILES    XFER (MB)
Completed                  36/36      3339/3339    21.3/21.3

PHASE                    ACTIONS
Install Phase          4466/4466

        Note: Man pages can be obtained by installing SUNWman
 Postinstall: Copying SMF seed repository ... done.
 Postinstall: Applying workarounds.
        Done: Installation completed in 2139.186 seconds.

  Next Steps: Boot the zone, then log into the zone console
              (zlogin -C) to complete the configuration process.

6.3 Boot the OpenSolaris zone

r...@soltrain19# zpool status osol
  pool: osol
 state: ONLINE
 scrub: none requested
config:

        NAME                  STATE     READ WRITE CKSUM
        osol                  ONLINE       0     0     0
          /mnt/osolzone/root  ONLINE       0     0     0

errors: No known data errors

r...@soltrain19# zoneadm -z osol boot

r...@soltrain19# zpool status osol
  pool: osol
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME                  STATE     READ WRITE CKSUM
        osol                  DEGRADED     0     0     0
          /mnt/osolzone/root  DEGRADED     0     0   117  too many errors

errors: No known data errors

r...@soltrain19# zlogin osol uptime
  5:31pm  up 1 min(s),  0 users,  load average: 0.69, 0.38, 0.52

--
Mike Gerdts
http://mgerdts.blogspot.com/
Re: [zones-discuss] Webrev for CR 6782448
Hi Frank,

Thanks for reviewing my fix. I'll respond to your questions below.

On 12/22/09 05:55 AM, Frank Batschulat (Home) wrote:
> On Sat, 19 Dec 2009 04:28:52 +0100, Jordan Vaughan wrote:
>> I expanded my webrev to include my fix for
>>
>> 6910339 zonecfg coredumps with badly formed 'select net defrouter'
>>
>> I need someone to review my changes. The webrev is still accessible via
>>
>> http://cr.opensolaris.org/~flippedb/onnv-zone2
>
> Hey Jordan, looks good to me modulo this in zonecfg_lookup_nwif()
>
>  size_t addrspec;        /* nonzero if tabptr has IP addr */
>  size_t physspec;        /* nonzero if tabptr has interface */
> +size_t defrouterspec;   /* nonzero if tabptr has def. router */
>
>  if (tabptr == NULL)
>          return (Z_INVAL);
>
> + * zone_nwif_address, zone_nwif_physical, and zone_nwif_defrouter are
> + * arrays, so no NULL checks are necessary.
>   */
>  addrspec = strlen(tabptr->zone_nwif_address);
>  physspec = strlen(tabptr->zone_nwif_physical);
> -assert(addrspec > 0 || physspec > 0);
> +defrouterspec = strlen(tabptr->zone_nwif_defrouter);
> +assert(addrspec != 0 || physspec != 0 || defrouterspec != 0);
>
> so we do consider any of them being 0 a fault given the assert(), fine,
> but yet we do check for this again inside the loop:
>
> +if (physspec != 0 && (fetchprop(cur, DTD_ATTR_PHYSICAL,
> +    physical, sizeof (physical)) != Z_OK ||
> +    strcmp(tabptr->zone_nwif_physical, physical) != 0))
> +        continue;
> +if (addrspec != 0 && (fetchprop(cur, DTD_ATTR_ADDRESS, address,
> +    sizeof (address)) != Z_OK ||
> +    !zonecfg_same_net_address(tabptr->zone_nwif_address,
> +    address)))
> +        continue;
> +if (defrouterspec != 0 && (fetchprop(cur, DTD_ATTR_DEFROUTER,
> +    address, sizeof (address)) != Z_OK ||
> +    !zonecfg_same_net_address(tabptr->zone_nwif_defrouter,
> +    address)))
> +        continue;
>
> a good argument could probably be made to turn this assert into a real
> check and return Z_INVAL for any of those 3 being 0 and get rid of
> the checks inside the xml parsing loop ?

The assertion doesn't fail if any one of the three variables is zero; it fails only if all of them are zero.
However, your suggestion that we transform the assertion into a real check that returns Z_INVAL or Z_INSUFFICIENT_SPEC is good. I was able to easily produce a core dump on my system even without my fix:

---8<---
root arrakis [16:12:49]# zonecfg -z mojo
zonecfg:mojo> select net address=""
Assertion failed: addrspec > 0 || physspec > 0, file ../common/libzonecfg.c, line 2170
zsh: IOT instruction (core dumped)  cz mojo
---8<---

I verified that changing the assertion into a real check that returns Z_INSUFFICIENT_SPEC eliminates the problem:

---8<---
root tcm3000-01 [16:13:03 1]# cz mojo
zonecfg:mojo> select net address=""
select net: Insufficient specification
---8<---

However, the three checks in the loop (physspec != 0, etc.) are necessary even after converting the assertion into a non-asserting test. Suppose that a zone were to have the following net configuration:

---8<---
zonecfg:mojo> info net
net:
        address: 10.5.234.15/24
        physical: bge0
        defrouter not specified
---8<---

If I were to eliminate the three checks in the loop, then if I were to issue a "select net address=10.5.234.15/24", then zonecfg(1M) would claim that the zone doesn't have a network resource with an address of 10.5.234.15/24! This follows from the way the three if statements would work without the three aforementioned checks: physspec would be zero (because the query doesn't specify a physical interface) but the network resource's physical property would be nonempty, which would make the strcmp(3C) invocation in the first if statement return a nonzero value and cause the function to skip the network resource that it would have otherwise selected!
Here is some output from zonecfg(1M) while it's using a libzonecfg that lacks the three loop checks:

---8<---
root tcm3000-01 [16:25:12 1]# cz mojo
zonecfg:mojo> info
zonename: mojo
zonepath: /export/mojo
brand: solaris10
autoboot: true
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: shared
hostid:
net:
        address: 10.5.234.15/24
        physical: bge0
        defrouter not specified
zonecfg:mojo> select net address=10.5.234.15/24
select net: No such resource with that id
zonecfg:mojo>
---8<---

zonecfg_lookup_nwif() needs the three loop checks.

I regenerated the webrev. You'll notice that the assertion was replaced by a check that returns Z_INSUFFICIENT_SPEC.

Thanks again for reviewing my fix,
Jordan

> cheers
> frankB
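
[Editor's illustration, not part of the original message.] The lookup behaviour argued for above can be sketched in a few lines of standalone C. This is only a minimal model under stated assumptions, not the libzonecfg code: struct nwif, lookup_nwif(), mismatch() and the SK_* codes are hypothetical stand-ins, plain strcmp() replaces zonecfg_same_net_address(), and an in-memory array replaces the XML parsing loop. Copying the match back into the query is likewise an assumption about how the real function behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; the real Z_* values come from libzonecfg.h. */
enum { SK_OK, SK_NO_RESOURCE_ID, SK_INSUFFICIENT_SPEC, SK_INVAL };

/* Simplified stand-in for struct zone_nwiftab: arrays, so never NULL. */
struct nwif {
    char address[64];
    char physical[32];
    char defrouter[64];
};

/*
 * A query field only constrains the match when it is nonempty.
 * This is the "xxxspec != 0 &&" guard from the loop in the webrev.
 */
static int
mismatch(const char *want, const char *have)
{
    return (want[0] != '\0' && strcmp(want, have) != 0);
}

/*
 * Find the first entry in tab[0..n) that agrees with every property
 * the query specifies. On success the entry is copied into *query.
 */
int
lookup_nwif(const struct nwif *tab, size_t n, struct nwif *query)
{
    size_t i;

    if (query == NULL)
        return (SK_INVAL);
    assert(tab != NULL || n == 0);  /* caller bug, not user input */

    /* A real check instead of assert(): user input can reach this. */
    if (query->address[0] == '\0' && query->physical[0] == '\0' &&
        query->defrouter[0] == '\0')
        return (SK_INSUFFICIENT_SPEC);

    for (i = 0; i < n; i++) {
        if (mismatch(query->physical, tab[i].physical) ||
            mismatch(query->address, tab[i].address) ||
            mismatch(query->defrouter, tab[i].defrouter))
            continue;
        *query = tab[i];    /* fill in the unspecified properties */
        return (SK_OK);
    }
    return (SK_NO_RESOURCE_ID);
}
```

With a single entry (address 10.5.234.15/24, physical bge0, no defrouter), a query that sets only the address finds the entry, while an all-empty query is rejected up front. Dropping the nonempty test from mismatch() would compare the empty physical field against "bge0" and skip the entry, which is the false "No such resource with that id" shown in the transcript.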
[zones-discuss] Any way to limit I/O?
Is there any way to limit the amount of I/O that a zone can do? I'm thinking particularly of disk IOPS, but a general way of limiting I/O would be fine too.

Thanks,
Andrew.
--
This message posted from opensolaris.org
Re: [zones-discuss] Webrev for CR 6909222
Hi Frank,

Thanks for reviewing my fix.

Native-branded zones will disappear when Solaris Express dies (which should happen in a few builds); therefore, it isn't worthwhile to fix this problem for native-branded zones. No special script code is needed for ipkg-branded zones because IPS package variants will handle the problem.

Yes, I added the same code to s10_boot.ksh in case administrators inadvertently resurrect create_ramdisk. However, your comment raises an issue: my fix won't prevent the mkisofs(8) error message in all cases. If a zone administrator somehow reinstalls create_ramdisk (say, through an update) without rebooting the zone, and an administrator in the global zone then updates boot archives via bootadm(1M), the global zone administrator will see the mkisofs(8) error. This isn't a problem because the error is harmless and the aforementioned scenario will rarely occur.

Thanks again for the review,
Jordan

On 12/22/09 07:06 AM, Frank Batschulat (Home) wrote:
> On Tue, 22 Dec 2009 00:46:00 +0100, Jordan Vaughan wrote:
>> I need someone to review my fix for
>>
>> 6909222 reboot of system upgraded from 128 to build 129 generated error
>> from an s10 zone due to boot-archive
>>
>> My webrev is accessible via
>>
>> http://cr.opensolaris.org/~flippedb/onnv-s10c
>
> Jordan, looks good to me.
>
> what about /usr/lib/brand/ipkg/p2v and perhaps
> /usr/lib/brand/ipkg/pkgcreatezone for the ipkg brand ?
>
> and usr/src/lib/brand/native/zone/p2v.ksh and
> usr/src/lib/brand/native/zone/image_install.ksh for the native brand ?
>
> I'd assume that in the future running an s10u9 update for an s10u8
> branded zone, could that potentially put back the
> '/boot/solaris/bin/create_ramdisk' script but that'd be taken care of
> by the s10_boot.ksh then.
>
> cheers
> frankB
Re: [zones-discuss] zoneadm hangs after repeated boot/halt use
Frank,

I am back from vacation and will be doing some additional testing. I have upgraded to b129 to see if the problem persists. I have first created a basic (generic) zone to see how it behaves. If that is OK, I will apply the Immutable Service Container construction kit to see if there is any change. The ISC Toolkit enables things like resource controls, auditing, etc., which may influence the results, I suppose; that is why I am starting with a vanilla system.

I'll keep you posted. Thanks for your hard work looking into this!

g

On 12/16/09 3:49 AM, Frank Batschulat (Home) wrote:
> Glenn, I've not been able to reproduce this on onnv build 126 (it's been
> running for a day now). If that script reproduced 6894901 straight
> away, it should do so on 126 as well (similar to what you've seen
> in 127).
>
> This poses the question of whether there are some other details in
> your environment that I don't have, or whether that script really
> reliably reproduces 6894901.
>
> cheers
> frankB
>
> On Tue, 15 Dec 2009 15:23:06 +0100, Frank Batschulat (Home) wrote:
>> Glenn, I've been running this test case for nearly a day now on
>> build 129 and couldn't reproduce it at all. There's a good chance
>> this was indeed fixed by 6894901 in build 128.
>>
>> I'll also try to reproduce this on build 126.
>>
>> cheers
>> frankB
>>
>> On Fri, 11 Dec 2009 21:48:52 +0100, Glenn Brunette wrote:
>>> As part of an Immutable Service Container[1] demonstration that I
>>> am creating for an event in January, I need to start/stop a zone
>>> quite a few times (as part of a Self-Cleansing[2] demo). During the
>>> course of my testing, I have been able to repeatedly get zoneadm to
>>> hang. Since I am working with a highly customized configuration, I
>>> started over with a default zone on OpenSolaris (b127) and was able
>>> to repeat this issue.
>>> To reproduce this problem, use the following script after creating
>>> a zone using the normal/default steps:
>>>
>>> isc...@osol-isc:~$ while : ; do
>>> > echo "`date`: ZONE BOOT"
>>> > pfexec zoneadm -z test boot
>>> > sleep 30
>>> > pfexec zoneadm -z test halt
>>> > echo "`date`: ZONE HALT"
>>> > sleep 10
>>> > done
>>>
>>> This script works just fine for a while, but eventually zoneadm
>>> hangs (it was at pass #90 in my last test). When this happens,
>>> zoneadm is shown to be consuming quite a bit of CPU:
>>>
>>>   PID USERNAME SIZE RSS   STATE PRI NICE TIME    CPU PROCESS/NLWP
>>> 16598 root     11M  3140K run   10       0:54:49 74% zoneadm/1
>>>
>>> A stack trace of zoneadm shows:
>>>
>>> isc...@osol-isc:~$ pfexec pstack `pgrep zoneadm`
>>> 16082: zoneadmd -z test
>>> ----- lwp# 1 -----
>>> ----- lwp# 2 -----
>>>  feef41c6 door (0, 0, 0, 0, 0, 8)
>>>  feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
>>>  f3f3 _thrp_setup (fe5b0a00) + 9b
>>>  f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
>>> ----- lwp# 3 -----
>>>  feef420f __door_return () + 2f
>>> ----- lwp# 4 -----
>>>  feef420f door (0, 0, 0, fe140e00, f5f00, a)
>>>  feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
>>>  f3f3 _thrp_setup (fe5b1a00) + 9b
>>>  f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
>>> 16598: zoneadm -z test boot
>>>  feef3fc8 door (6, 80476d0, 0, 0, 0, 3)
>>>  feede653 door_call (6, 80476d0, 400, fe3d43f7) + 7b
>>>  fe3d44f0 zonecfg_call_zoneadmd (8047e33, 8047730, 8078448, 1) + 124
>>>  0805792d boot_func (0, 8047d74, 100, 805ff0b) + 1cd
>>>  08060125 main (4, 8047d64, 8047d78, 805570f) + 2b9
>>>  0805576d _start (4, 8047e28, 8047e30, 8047e33, 8047e38, 0) + 7d
>>>
>>> A stack trace of zoneadmd shows:
>>>
>>> isc...@osol-isc:~$ pfexec pstack `pgrep zoneadmd`
>>> 16082: zoneadmd -z test
>>> ----- lwp# 1 -----
>>> ----- lwp# 2 -----
>>>  feef41c6 door (0, 0, 0, 0, 0, 8)
>>>  feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
>>>  f3f3 _thrp_setup (fe5b0a00) + 9b
>>>  f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
>>> ----- lwp# 3 -----
>>>  feef4147 __door_ucred (80a37c8, fef81000, fe23e838, feed9cfe) + 27
>>>  feed9d0d door_ucred (fe23f870, 1000, 0, 0) + 32
>>>  08058a88 server (0, fe23f8f0, 510, 0, 0, 8058a04) + 84
>>>  feef4240 __door_return () + 60
>>> ----- lwp# 4 -----
>>>  feef420f door (0, 0, 0, fe140e00, f5f00, a)
>>>  feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
>>>  f3f3 _thrp_setup (fe5b1a00) + 9b
>>>  f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
>>>
>>> A truss of zoneadm (-f -vall -wall -tall) shows this looping:
>>>
>>> 16598: door_call(6, 0x080476D0) = 0
>>> 16598:     data_ptr=8047730 data_size=0
>>> 16598:     desc_ptr=0x0 desc_num=0
>>> 16598:     rbuf=0x807F2D8 rsize=4096
>>> 16598: close(6) = 0
>>> 16598: mkdir("/var/run/zones", 0700) Err#17 EEXIST
>>> 16598: chmod("/var/run/zones", 0700) = 0
Re: [zones-discuss] Application leaking on local zone
I reproduced the problem on global zone. Thanks!!

Jeff Victor wrote:
> It would be useful to know if the memory leak is in locked memory or
> not. What is the output of the following command, in both cases (app
> in GZ, app in a zone):
>
> GZ# pmap -x
>
> --JeffV
>
> On Thu, Dec 17, 2009 at 5:09 AM, AdinaKalin wrote:
>> Hello,
>>
>> I'm struggling with the following problem and I have no idea how to
>> solve it. I'm testing an application which runs fine in the global
>> zone, but leaks memory when installed in a local zone. The local
>> zone is a whole-root zone with a very simple, basic configuration:
>>
>> bash-3.00# zonecfg -z mdmMDMzone
>> zonecfg:mdmMDMzone> info
>> zonename: mdmMDMzone
>> zonepath: /mdmMDMzone
>> brand: native
>> autoboot: true
>> bootargs:
>> pool:
>> limitpriv: default,dtrace_proc,dtrace_user,proc_priocntl,proc_lock_memory
>> scheduling-class: FSS
>> ip-type: shared
>> net:
>>         address: 192.168.109.14
>>         physical: e1000g0
>>         defrouter not specified
>>
>> One of the application processes, when started in the global zone,
>> has an RSS of about 5 GB (prstat -s rss) and keeps this size to the
>> end of the test. If I stop the application in the global zone and
>> start it in the local zone, the same process starts at the normal
>> size (5 GB in prstat -s rss) but keeps growing during the test (I saw
>> it at 25 GB on a server with 32 GB of RAM) until it fails.
>>
>> I don't understand the reason for this behavior or, if the
>> application has a memory leak, why I don't see it in the global zone.
>>
>> Any help is more than welcome!!!
Re: [zones-discuss] Webrev for CR 6909222
On Tue, 22 Dec 2009 00:46:00 +0100, Jordan Vaughan wrote:
> I need someone to review my fix for
>
> 6909222 reboot of system upgraded from 128 to build 129 generated error
> from an s10 zone due to boot-archive
>
> My webrev is accessible via
>
> http://cr.opensolaris.org/~flippedb/onnv-s10c

Jordan, looks good to me.

what about /usr/lib/brand/ipkg/p2v and perhaps /usr/lib/brand/ipkg/pkgcreatezone for the ipkg brand ?

and usr/src/lib/brand/native/zone/p2v.ksh and usr/src/lib/brand/native/zone/image_install.ksh for the native brand ?

I'd assume that in the future running an s10u9 update for an s10u8 branded zone, could that potentially put back the '/boot/solaris/bin/create_ramdisk' script but that'd be taken care of by the s10_boot.ksh then.

cheers
frankB
Re: [zones-discuss] Webrev for CR 6782448
On Tue, 22 Dec 2009 14:55:34 +0100, Frank Batschulat (Home) wrote:
> a good argument could probably be made to turn this assert into a real
> check and return Z_INVAL for any of those 3 being 0 and get rid of
> the checks inside the xml parsing loop ?

probably rather Z_INSUFFICIENT_SPEC than Z_INVAL though.
Re: [zones-discuss] Webrev for CR 6782448
On Sat, 19 Dec 2009 04:28:52 +0100, Jordan Vaughan wrote:
> I expanded my webrev to include my fix for
>
> 6910339 zonecfg coredumps with badly formed 'select net defrouter'
>
> I need someone to review my changes. The webrev is still accessible via
>
> http://cr.opensolaris.org/~flippedb/onnv-zone2

Hey Jordan, looks good to me modulo this in zonecfg_lookup_nwif()

 size_t addrspec;        /* nonzero if tabptr has IP addr */
 size_t physspec;        /* nonzero if tabptr has interface */
+size_t defrouterspec;   /* nonzero if tabptr has def. router */

 if (tabptr == NULL)
         return (Z_INVAL);

+ * zone_nwif_address, zone_nwif_physical, and zone_nwif_defrouter are
+ * arrays, so no NULL checks are necessary.
  */
 addrspec = strlen(tabptr->zone_nwif_address);
 physspec = strlen(tabptr->zone_nwif_physical);
-assert(addrspec > 0 || physspec > 0);
+defrouterspec = strlen(tabptr->zone_nwif_defrouter);
+assert(addrspec != 0 || physspec != 0 || defrouterspec != 0);

so we do consider any of them being 0 a fault given the assert(), fine, but yet we do check for this again inside the loop:

+if (physspec != 0 && (fetchprop(cur, DTD_ATTR_PHYSICAL,
+    physical, sizeof (physical)) != Z_OK ||
+    strcmp(tabptr->zone_nwif_physical, physical) != 0))
+        continue;
+if (addrspec != 0 && (fetchprop(cur, DTD_ATTR_ADDRESS, address,
+    sizeof (address)) != Z_OK ||
+    !zonecfg_same_net_address(tabptr->zone_nwif_address,
+    address)))
+        continue;
+if (defrouterspec != 0 && (fetchprop(cur, DTD_ATTR_DEFROUTER,
+    address, sizeof (address)) != Z_OK ||
+    !zonecfg_same_net_address(tabptr->zone_nwif_defrouter,
+    address)))
+        continue;

a good argument could probably be made to turn this assert into a real check and return Z_INVAL for any of those 3 being 0 and get rid of the checks inside the xml parsing loop ?

cheers
frankB