Re: [zones-discuss] Webrev for CR 6782448

2009-12-22 Thread Frank Batschulat (Home)
On Wed, 23 Dec 2009 01:34:59 +0100, Jordan Vaughan  
wrote:

>>> http://cr.opensolaris.org/~flippedb/onnv-zone2
[...]
> zonecfg_lookup_nwif() needs the three loop checks.
>
> I regenerated the webrev.  You'll notice that the assertion was replaced
> by a check that returns Z_INSUFFICIENT_SPEC.

Hey Jordan, thanks for the exhaustive reply. Understood. I was ignoring
the fact that without these checks the XML parsing loop would generate a
false alarm for conditions such as:

net:
address: 10.5.234.15/24
physical: bge0
defrouter not specified
zonecfg:mojo> select net address=10.5.234.15/24
select net: No such resource with that id

lgtm!

cheers
frankB

___
zones-discuss mailing list
zones-discuss@opensolaris.org


Re: [zones-discuss] Zones on shared storage - a warning

2009-12-22 Thread Mike Gerdts
On Tue, Dec 22, 2009 at 8:02 PM, Mike Gerdts  wrote:
> I've been playing around with zones on NFS a bit and have run into
> what looks to be a pretty bad snag - ZFS keeps seeing read and/or
> checksum errors.  This exists with S10u8 and OpenSolaris dev build
> snv_129.  This is likely a blocker for anyone thinking of
> implementing parts of Ed's Zones on Shared Storage:
>
> http://hub.opensolaris.org/bin/view/Community+Group+zones/zoss
>
> The OpenSolaris example appears below.  The order of events is:
>
> 1) Create a file on NFS, turn it into a zpool
> 2) Configure a zone with the pool as zonepath
> 3) Install the zone, verify that the pool is healthy
> 4) Boot the zone, observe that the pool is sick
[snip]

An off-list conversation and a bit of digging into other tests I have
done show that this is likely limited to NFSv3.  I cannot say that I
have seen this problem with NFSv4.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


[zones-discuss] Zones on shared storage - a warning

2009-12-22 Thread Mike Gerdts
I've been playing around with zones on NFS a bit and have run into
what looks to be a pretty bad snag - ZFS keeps seeing read and/or
checksum errors.  This exists with S10u8 and OpenSolaris dev build
snv_129.  This is likely a blocker for anyone thinking of
implementing parts of Ed's Zones on Shared Storage:

http://hub.opensolaris.org/bin/view/Community+Group+zones/zoss

The OpenSolaris example appears below.  The order of events is:

1) Create a file on NFS, turn it into a zpool
2) Configure a zone with the pool as zonepath
3) Install the zone, verify that the pool is healthy
4) Boot the zone, observe that the pool is sick

r...@soltrain19# mount filer:/path /mnt
r...@soltrain19# cd /mnt
r...@soltrain19# mkdir osolzone
r...@soltrain19# mkfile -n 8g root
r...@soltrain19# zpool create -m /zones/osol osol /mnt/osolzone/root
r...@soltrain19# zonecfg -z osol
osol: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:osol> create
zonecfg:osol> info
zonename: osol
zonepath:
brand: ipkg
autoboot: false
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: shared
hostid:
zonecfg:osol> set zonepath=/zones/osol
zonecfg:osol> set autoboot=false
zonecfg:osol> verify
zonecfg:osol> commit
zonecfg:osol> exit

r...@soltrain19# chmod 700 /zones/osol

r...@soltrain19# zoneadm -z osol install
   Publisher: Using opensolaris.org (http://pkg.opensolaris.org/dev/
http://pkg-na-2.opensolaris.org/dev/).
   Publisher: Using contrib (http://pkg.opensolaris.org/contrib/).
   Image: Preparing at /zones/osol/root.
   Cache: Using /var/pkg/download.
Sanity Check: Looking for 'entire' incorporation.
  Installing: Core System (output follows)
DOWNLOAD                                  PKGS       FILES    XFER (MB)
Completed                                46/46 12334/12334    93.1/93.1

PHASE                                        ACTIONS
Install Phase                            18277/18277
No updates necessary for this image.
  Installing: Additional Packages (output follows)
DOWNLOAD                                  PKGS       FILES    XFER (MB)
Completed                                36/36   3339/3339    21.3/21.3

PHASE                                        ACTIONS
Install Phase                              4466/4466

Note: Man pages can be obtained by installing SUNWman
 Postinstall: Copying SMF seed repository ... done.
 Postinstall: Applying workarounds.
Done: Installation completed in 2139.186 seconds.

  Next Steps: Boot the zone, then log into the zone console (zlogin -C)
  to complete the configuration process.
6.3 Boot the OpenSolaris zone
r...@soltrain19# zpool status osol
  pool: osol
 state: ONLINE
 scrub: none requested
config:

NAME                  STATE     READ WRITE CKSUM
osol                  ONLINE       0     0     0
  /mnt/osolzone/root  ONLINE       0     0     0

errors: No known data errors

r...@soltrain19# zoneadm -z osol boot

r...@soltrain19# zpool status osol
  pool: osol
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

NAME                  STATE     READ WRITE CKSUM
osol                  DEGRADED     0     0     0
  /mnt/osolzone/root  DEGRADED     0     0   117  too many errors

errors: No known data errors

r...@soltrain19# zlogin osol uptime
  5:31pm  up 1 min(s),  0 users,  load average: 0.69, 0.38, 0.52


-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zones-discuss] Webrev for CR 6782448

2009-12-22 Thread Jordan Vaughan

Hi Frank,

Thanks for reviewing my fix.  I'll respond to your questions below.

On 12/22/09 05:55 AM, Frank Batschulat (Home) wrote:
> On Sat, 19 Dec 2009 04:28:52 +0100, Jordan Vaughan wrote:
>
>> I expanded my webrev to include my fix for
>>
>> 6910339 zonecfg coredumps with badly formed 'select net defrouter'
>>
>> I need someone to review my changes.  The webrev is still accessible via
>>
>> http://cr.opensolaris.org/~flippedb/onnv-zone2
>
> Hey Jordan looks good to me modulo this in zonecfg_lookup_nwif()
>
>  size_t addrspec;        /* nonzero if tabptr has IP addr */
>  size_t physspec;        /* nonzero if tabptr has interface */
> +size_t defrouterspec;   /* nonzero if tabptr has def. router */
>
>  if (tabptr == NULL)
>      return (Z_INVAL);
>
> + * zone_nwif_address, zone_nwif_physical, and zone_nwif_defrouter are
> + * arrays, so no NULL checks are necessary.
>   */
>  addrspec = strlen(tabptr->zone_nwif_address);
>  physspec = strlen(tabptr->zone_nwif_physical);
> -assert(addrspec > 0 || physspec > 0);
> +defrouterspec = strlen(tabptr->zone_nwif_defrouter);
> +assert(addrspec != 0 || physspec != 0 || defrouterspec != 0);
>
> so we do consider any of them being 0 a fault given the assert(), fine, but yet
> we do check for this again inside the loop:
>
> +if (physspec != 0 && (fetchprop(cur, DTD_ATTR_PHYSICAL,
> +    physical, sizeof (physical)) != Z_OK ||
> +    strcmp(tabptr->zone_nwif_physical, physical) != 0))
> +        continue;
> +if (addrspec != 0 && (fetchprop(cur, DTD_ATTR_ADDRESS, address,
> +    sizeof (address)) != Z_OK ||
> +    !zonecfg_same_net_address(tabptr->zone_nwif_address,
> +    address)))
> +        continue;
> +if (defrouterspec != 0 && (fetchprop(cur, DTD_ATTR_DEFROUTER,
> +    address, sizeof (address)) != Z_OK ||
> +    !zonecfg_same_net_address(tabptr->zone_nwif_defrouter,
> +    address)))
> +        continue;
>
> a good argument could probably be made to turn this assert into a real
> check and return Z_INVAL for any of those 3 being 0 and get rid of
> the checks inside the xml parsing loop ?


The assertion doesn't fail if any of the three variables is zero; it 
fails if all of them are zero.  However, your suggestion that we 
transform the assertion into a real check that returns Z_INVAL or 
Z_INSUFFICIENT_SPEC is good.  I was able to easily produce a core dump 
on my system even without my fix:


---8<---
root arrakis [16:12:49]# zonecfg -z mojo
zonecfg:mojo> select net address=""
Assertion failed: addrspec > 0 || physspec > 0, file 
../common/libzonecfg.c, line 2170

zsh: IOT instruction (core dumped)  cz mojo
---8<---

I verified that changing the assertion into a real check that returns 
Z_INSUFFICIENT_SPEC eliminates the problem:


---8<---
root tcm3000-01 [16:13:03 1]# cz mojo
zonecfg:mojo> select net address=""
select net: Insufficient specification
---8<---
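
For readers following along, the change from an assertion to a real check can be sketched roughly as below. This is only an illustration, not the code in the webrev: struct nwif_query, check_query(), the 64-byte buffers, and the numeric values given to the Z_* constants are stand-ins invented for this example. Only the three field names and the Z_INVAL / Z_INSUFFICIENT_SPEC names come from the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical values; the real constants are defined by libzonecfg. */
#define Z_OK                0
#define Z_INVAL             1
#define Z_INSUFFICIENT_SPEC 2

/* Simplified query record: an empty string means "not specified". */
struct nwif_query {
    char zone_nwif_address[64];
    char zone_nwif_physical[64];
    char zone_nwif_defrouter[64];
};

/*
 * Reject a query that names none of the three properties by returning
 * an error code, instead of aborting the process with assert().
 */
int
check_query(const struct nwif_query *q)
{
    size_t addrspec, physspec, defrouterspec;

    if (q == NULL)
        return (Z_INVAL);

    /* The fields are arrays, so strlen() needs no NULL checks. */
    addrspec = strlen(q->zone_nwif_address);
    physspec = strlen(q->zone_nwif_physical);
    defrouterspec = strlen(q->zone_nwif_defrouter);

    /* Fails only when all three are empty, e.g. select net address="" */
    if (addrspec == 0 && physspec == 0 && defrouterspec == 0)
        return (Z_INSUFFICIENT_SPEC);

    return (Z_OK);
}
```

In this sketch a query with only an address set passes the check, while a query with all three fields empty gets Z_INSUFFICIENT_SPEC back rather than a core dump.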

However, the three checks in the loop (physspec != 0, etc.) are 
necessary even after converting the assertion into a non-asserting test. 
 Suppose that a zone were to have the following net configuration:


---8<---
zonecfg:mojo> info net
net:
address: 10.5.234.15/24
physical: bge0
defrouter not specified
---8<---

If I were to eliminate the three checks in the loop, then if I were to 
issue a "select net address=10.5.234.15/24", then zonecfg(1M) would 
claim that the zone doesn't have a network resource with an address of 
10.5.234.15/24!  This follows from the way the three if statements would 
work without the three aforementioned checks: physspec would be zero 
(because the query doesn't specify a physical interface) but the network 
resource's physical property would be nonempty, which would make the 
strcmp(3C) invocation in the first if statement return a nonzero value 
and cause the function to skip the network resource that it would have 
otherwise selected!
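
The effect described above can be reproduced with a small stand-alone model of the matching loop. This is a sketch under stated assumptions, not libzonecfg code: struct nwif, lookup_nwif(), and the guarded flag are hypothetical names made up for this example, the resources live in a plain array rather than an XML tree, and addresses are compared with strcmp(3C) where the real code uses zonecfg_same_net_address().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a net resource or a query; "" means unset. */
struct nwif {
    char address[64];
    char physical[64];
    char defrouter[64];
};

/*
 * Return the index of the first resource that matches every property
 * the query specifies, or -1 if none does.  With guarded == 0 the
 * comparisons run unconditionally, which models the loop without the
 * three "spec != 0" checks.
 */
int
lookup_nwif(const struct nwif *res, size_t nres, const struct nwif *query,
    int guarded)
{
    size_t addrspec = strlen(query->address);
    size_t physspec = strlen(query->physical);
    size_t defrouterspec = strlen(query->defrouter);
    size_t i;

    for (i = 0; i < nres; i++) {
        /* Compare a property only if the query specified it. */
        if ((!guarded || physspec != 0) &&
            strcmp(query->physical, res[i].physical) != 0)
            continue;
        if ((!guarded || addrspec != 0) &&
            strcmp(query->address, res[i].address) != 0)
            continue;
        if ((!guarded || defrouterspec != 0) &&
            strcmp(query->defrouter, res[i].defrouter) != 0)
            continue;
        return ((int)i);
    }
    return (-1);
}
```

With a single resource { "10.5.234.15/24", "bge0", "" } and a query that sets only the address, the guarded call finds the resource at index 0. The unguarded call compares the query's empty physical string against "bge0", skips the resource, and returns -1.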


Here is some output from zonecfg(1M) while it's using a libzonecfg that 
lacks the three loop checks:


---8<---
root tcm3000-01 [16:25:12 1]# cz mojo
zonecfg:mojo> info
zonename: mojo
zonepath: /export/mojo
brand: solaris10
autoboot: true
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: shared
hostid:
net:
address: 10.5.234.15/24
physical: bge0
defrouter not specified
zonecfg:mojo> select net address=10.5.234.15/24
select net: No such resource with that id
zonecfg:mojo>
---8<---

zonecfg_lookup_nwif() needs the three loop checks.

I regenerated the webrev.  You'll notice that the assertion was replaced 
by a check that returns Z_INSUFFICIENT_SPEC.


Thanks again for reviewing my fix,
Jordan



> cheers
> frankB





[zones-discuss] Any way to limit I/O?

2009-12-22 Thread andrew
Is there any way to limit the amount of I/O that a zone can do? I'm thinking 
particularly of disk IOPS, but a general way of limiting I/O would be fine too.

Thanks

Andrew.
-- 
This message posted from opensolaris.org


Re: [zones-discuss] Webrev for CR 6909222

2009-12-22 Thread Jordan Vaughan

Hi Frank,

Thanks for reviewing my fix.  Native-branded zones will disappear when 
Solaris Express dies (which should happen in a few builds); therefore, 
it isn't worthwhile to fix this problem for native-branded zones.  No 
special script code is needed for ipkg-branded zones because IPS package 
variants will handle the problem.


Yes, I added the same code to s10_boot.ksh in case administrators 
inadvertently resurrect create_ramdisk.  However, your comment raises an 
issue: My fix won't prevent the mkisofs(8) error message in all cases. 
If a zone administrator somehow reinstalls create_ramdisk (say, through 
an update) without rebooting the zone, then if an administrator in the 
global zone updates boot archives via bootadm(1M), then the global zone 
administrator will see the mkisofs(8) error.  This isn't a problem 
because the error is harmless and the aforementioned scenario will 
rarely occur.


Thanks again for the review,
Jordan


On 12/22/09 07:06 AM, Frank Batschulat (Home) wrote:
> On Tue, 22 Dec 2009 00:46:00 +0100, Jordan Vaughan wrote:
>
>> I need someone to review my fix for
>>
>> 6909222 reboot of system upgraded from 128 to build 129 generated error
>> from an s10 zone due to boot-archive
>>
>> My webrev is accessible via
>>
>> http://cr.opensolaris.org/~flippedb/onnv-s10c
>
> Jordan, looks good to me.
>
> what about /usr/lib/brand/ipkg/p2v
> and perhaps /usr/lib/brand/ipkg/pkgcreatezone for the ipkg brand ?
>
> and usr/src/lib/brand/native/zone/p2v.ksh
> and usr/src/lib/brand/native/zone/image_install.ksh for the native brand ?
>
> I'd assume that in the future, running an s10u9 update in an s10u8 branded
> zone could potentially put back the '/boot/solaris/bin/create_ramdisk'
> script, but that would then be taken care of by s10_boot.ksh.
>
> cheers
> frankB






Re: [zones-discuss] zoneadm hangs after repeated boot/halt use

2009-12-22 Thread Glenn Brunette


Frank,

I am back from vacation and will be doing some additional testing.  I
have upgraded to b129 to see if the problem persists.  I have first
created a basic (generic) zone to see how it behaves.  If ok, I will
apply the Immutable Service Container construction kit to see if there
is any change.  The ISC Toolkit enables things like resource controls,
auditing, etc. which may influence the results I suppose - which is
why I am starting with a vanilla system.  Keep you posted.  Thanks for
your hard work looking into this!

g


On 12/16/09 3:49 AM, Frank Batschulat (Home) wrote:

Glenn, I've not been able to reproduce this on onnv build 126 (it has been
running for a day now)

if that script would reproduce 6894901 straight away it should be doing so
on 126 as well (similar to what you've seen in 127)

this poses the question of whether there are some other details in your
environment that I don't have, or whether that script really reliably
reproduces 6894901

cheers
frankB

On Tue, 15 Dec 2009 15:23:06 +0100, Frank Batschulat 
(Home)  wrote:


Glenn, I've been running this test case now for nearly a day on build 129 and
couldn't reproduce it at all. There is a good chance this was indeed fixed by
6894901 in build 128.

I'll also try to reproduce this now on build 126.

cheers
frankB

On Fri, 11 Dec 2009 21:48:52 +0100, Glenn Brunette  
wrote:


As part of an Immutable Service Container[1] demonstration that I am
creating for an event in January, I need to start/stop a zone
quite a few times (as part of a Self-Cleansing[2] demo).  During the
course of my testing, I have been able to repeatedly get zoneadm to
hang.

Since I am working with a highly customized configuration, I started
over with a default zone on OpenSolaris (b127) and was able to repeat
this issue.  To reproduce this problem, use the following script after
creating a zone using the normal/default steps:

isc...@osol-isc:~$ while : ; do
  >  echo "`date`: ZONE BOOT"
  >  pfexec zoneadm -z test boot
  >  sleep 30
  >  pfexec zoneadm -z test halt
  >  echo "`date`: ZONE HALT"
  >  sleep 10
  >  done

This script works just fine for a while, but eventually zoneadm hangs
(was at pass #90 in my last test).  When this happens, zoneadm is shown
to be consuming quite a bit of CPU:

 PID USERNAME  SIZE   RSS STATE  PRI NICE  TIME  CPU PROCESS/NLWP

   16598 root   11M 3140K run  10   0:54:49  74% zoneadm/1


A stack trace of zoneadm shows:

isc...@osol-isc:~$ pfexec pstack `pgrep zoneadm`
16082:  zoneadmd -z test
-  lwp# 1  
-  lwp# 2  
   feef41c6 door (0, 0, 0, 0, 0, 8)
   feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
   f3f3 _thrp_setup (fe5b0a00) + 9b
   f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
-  lwp# 3  
   feef420f __door_return () + 2f
-  lwp# 4  
   feef420f door (0, 0, 0, fe140e00, f5f00, a)
   feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
   f3f3 _thrp_setup (fe5b1a00) + 9b
   f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
16598:  zoneadm -z test boot
   feef3fc8 door (6, 80476d0, 0, 0, 0, 3)
   feede653 door_call (6, 80476d0, 400, fe3d43f7) + 7b
   fe3d44f0 zonecfg_call_zoneadmd (8047e33, 8047730, 8078448, 1) + 124
   0805792d boot_func (0, 8047d74, 100, 805ff0b) + 1cd
   08060125 main (4, 8047d64, 8047d78, 805570f) + 2b9
   0805576d _start   (4, 8047e28, 8047e30, 8047e33, 8047e38, 0) + 7d


A stack trace of zoneadmd shows:

isc...@osol-isc:~$ pfexec pstack `pgrep zoneadmd`
16082:  zoneadmd -z test
-  lwp# 1  
-  lwp# 2  
   feef41c6 door (0, 0, 0, 0, 0, 8)
   feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
   f3f3 _thrp_setup (fe5b0a00) + 9b
   f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
-  lwp# 3  
   feef4147 __door_ucred (80a37c8, fef81000, fe23e838, feed9cfe) + 27
   feed9d0d door_ucred (fe23f870, 1000, 0, 0) + 32
   08058a88 server   (0, fe23f8f0, 510, 0, 0, 8058a04) + 84
   feef4240 __door_return () + 60
-  lwp# 4  
   feef420f door (0, 0, 0, fe140e00, f5f00, a)
   feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
   f3f3 _thrp_setup (fe5b1a00) + 9b
   f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)


A truss of zoneadm (-f -vall -wall -tall) shows this looping:

16598:  door_call(6, 0x080476D0)= 0
16598:  data_ptr=8047730 data_size=0
16598:  desc_ptr=0x0 desc_num=0
16598:  rbuf=0x807F2D8 rsize=4096
16598:  close(6)= 0
16598:  mkdir("/var/run/zones", 0700)   Err#17 EEXIST
16598:  chmod("/var/run/zones", 0700)   = 0

Re: [zones-discuss] Application leaking on local zone

2009-12-22 Thread AdinaKalin

I reproduced the problem in the global zone. Thanks!!

Jeff Victor wrote:

It would be useful to know if the memory leak is in locked memory or
not. What is the output of the following command, in both cases (app in
GZ, app in a zone):

GZ# pmap -x 

--JeffV

On Thu, Dec 17, 2009 at 5:09 AM, AdinaKalin
 wrote:
  

Hello,

I'm struggling with the following problem and I have no idea how to
solve it.
I'm testing an application which runs fine in the global zone, but
leaks memory when installed in a local zone.

The local zone is a whole-root zone with a very simple, basic configuration:
bash-3.00# zonecfg -z mdmMDMzone
zonecfg:mdmMDMzone> info
zonename: mdmMDMzone
zonepath: /mdmMDMzone
brand: native
autoboot: true
bootargs:
pool:
limitpriv: default,dtrace_proc,dtrace_user,proc_priocntl,proc_lock_memory
scheduling-class: FSS
ip-type: shared
net:
address: 192.168.109.14
physical: e1000g0
defrouter not specified

One of the application processes, when started in the global zone, has an
RSS of about 5 GB (prstat -s rss) and keeps this size to the end of
the test. If I stop the application in the global zone and start it in the
local zone, the same process starts at the normal size (5 GB in prstat
-s rss) but grows during the test (I saw it reach 25 GB on a server
with 32 GB of RAM) until it fails. I don't understand this behavior:
if the application has a memory leak, why don't I see it in the
global zone?

Any help is more than welcome!!!




Re: [zones-discuss] Webrev for CR 6909222

2009-12-22 Thread Frank Batschulat (Home)
On Tue, 22 Dec 2009 00:46:00 +0100, Jordan Vaughan  
wrote:

> I need someone to review my fix for
>
> 6909222 reboot of system upgraded from 128 to build 129 generated error
> from an s10 zone due to boot-archive
>
> My webrev is accessible via
>
> http://cr.opensolaris.org/~flippedb/onnv-s10c

Jordan, looks good to me.

what about /usr/lib/brand/ipkg/p2v 
and perhaps /usr/lib/brand/ipkg/pkgcreatezone for the ipkg brand ?

and usr/src/lib/brand/native/zone/p2v.ksh 
and usr/src/lib/brand/native/zone/image_install.ksh for the native brand ?

I'd assume that in the future, running an s10u9 update in an s10u8 branded
zone could potentially put back the '/boot/solaris/bin/create_ramdisk'
script, but that would then be taken care of by s10_boot.ksh.

cheers
frankB




Re: [zones-discuss] Webrev for CR 6782448

2009-12-22 Thread Frank Batschulat (Home)
On Tue, 22 Dec 2009 14:55:34 +0100, Frank Batschulat (Home) 
 wrote:

> a good argument could probably be made to turn this assert into a real
> check and return Z_INVAL for any of those 3 being 0 and get rid of
> the checks inside the xml parsing loop ?

probably rather Z_INSUFFICIENT_SPEC than Z_INVAL though.


Re: [zones-discuss] Webrev for CR 6782448

2009-12-22 Thread Frank Batschulat (Home)
On Sat, 19 Dec 2009 04:28:52 +0100, Jordan Vaughan  
wrote:

> I expanded my webrev to include my fix for
>
> 6910339 zonecfg coredumps with badly formed 'select net defrouter'
>
> I need someone to review my changes.  The webrev is still accessible via
>
> http://cr.opensolaris.org/~flippedb/onnv-zone2

Hey Jordan looks good to me modulo this in zonecfg_lookup_nwif()

 size_t addrspec;/* nonzero if tabptr has IP addr */
 size_t physspec;/* nonzero if tabptr has interface */
+size_t defrouterspec;   /* nonzero if tabptr has def. router */
 
 if (tabptr == NULL)
 return (Z_INVAL);
 
+ * zone_nwif_address, zone_nwif_physical, and zone_nwif_defrouter are
+ * arrays, so no NULL checks are necessary.
  */
 addrspec = strlen(tabptr->zone_nwif_address);
 physspec = strlen(tabptr->zone_nwif_physical);
-assert(addrspec > 0 || physspec > 0);
+defrouterspec = strlen(tabptr->zone_nwif_defrouter);
+assert(addrspec != 0 || physspec != 0 || defrouterspec != 0);
 

so we do consider any of them being 0 a fault given the assert(), fine, but yet
we do check for this again inside the loop:

+if (physspec != 0 && (fetchprop(cur, DTD_ATTR_PHYSICAL,
+physical, sizeof (physical)) != Z_OK ||
+strcmp(tabptr->zone_nwif_physical, physical) != 0))
+continue;
+if (addrspec != 0 && (fetchprop(cur, DTD_ATTR_ADDRESS, address,
+sizeof (address)) != Z_OK ||
+!zonecfg_same_net_address(tabptr->zone_nwif_address,
+address)))
+continue;
+if (defrouterspec != 0 && (fetchprop(cur, DTD_ATTR_DEFROUTER,
+address, sizeof (address)) != Z_OK ||
+!zonecfg_same_net_address(tabptr->zone_nwif_defrouter,
+address)))
+continue;

a good argument could probably be made to turn this assert into a real
check and return Z_INVAL for any of those 3 being 0 and get rid of
the checks inside the xml parsing loop ?

cheers
frankB
