Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)

2006-12-07 Thread Jeremy Teo

The whole raid does not fail -- we are talking about corruption
here.  If you lose some inodes your whole partition is not gone.

My ZFS pool would not salvage -- poof, whole thing was gone (granted
it was a test one and not a raidz or mirror yet).  But still, for
what happened, I cannot believe that 20G of data got messed up
because a 1GB cache was not correctly flushed.


Chad, I think what you're asking for is for a zpool to allow you to
salvage whatever remaining data passes its checksums.

--
Regards,
Jeremy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Limitations of ZFS

2006-12-07 Thread dudekula mastan
Hi Folks,
   
  The man pages for zfs and zpool clearly say that it is not recommended to
use only a portion of a device when creating a ZFS file system.

  What exactly are the problems if we use only some portion of the disk space
for a ZFS FS?

or

  Why can't I use one partition of a device for a ZFS file system and another
partition for some other purpose?

  Will it cause any problems if I use one partition of a device for ZFS and
another partition for some other purpose?

  Why does everyone strongly recommend using the whole disk (not a part of a
disk) when creating zpools / ZFS file systems?

  Your help is appreciated.
   
   
  Thanks & Regards
  Masthan

 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-07 Thread Ben Rockwood
I've got a Thumper doing nothing but serving NFS.  It's using B43 with 
zil_disabled.  The system is being consumed in waves, but by what I don't know. 
 Notice vmstat:

 3 0 0 25693580 2586268 0 0  0  0  0  0  0  0  0  0  0  926   91  703  0 25 75
 21 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0 13 14 1720   21 1105  0 92  8
 20 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0 17 18 2538   70  834  0 100 0
 25 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0  0  0  745   18  179  0 100 0
 37 0 0 25693552 2586240 0 0 0  0  0  0  0  0  0  7  7 1152   52  313  0 100 0
 16 0 0 25693592 2586280 0 0 0  0  0  0  0  0  0 15 13 1543   52  767  0 100 0
 17 0 0 25693592 2586280 0 0 0  0  0  0  0  0  0  2  2  890   72  192  0 100 0
 27 0 0 25693572 2586260 0 0 0  0  0  0  0  0  0 15 15 3271   19 3103  0 98  2
 0 0 0 25693456 2586144 0 11 0  0  0  0  0  0  0 281 249 34335 242 37289 0 46 54
 0 0 0 25693448 2586136 0 2  0  0  0  0  0  0  0  0  0 2470  103 2900  0 27 73
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0 1062  105  822  0 26 74
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0 1076   91  857  0 25 75
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0  917  126  674  0 25 75

These spikes of sys load come in waves like this.  While there are close to a 
hundred systems mounting NFS shares on the Thumper, the amount of traffic is 
really low.  Nothing to justify this.  We're talking less than 10MB/s.

NFS is pathetically slow.  We're using NFSv3 TCP shared via ZFS sharenfs on a 
3Gbps aggregation (3*1Gbps).

I've been slamming my head against this problem for days and can't make 
headway.  I'll post some of my notes below.  Any thoughts or ideas are welcome!

benr.

===

Step 1 was to disable any ZFS features that might consume large amounts of CPU:

# zfs set compression=off joyous
# zfs set atime=off joyous
# zfs set checksum=off joyous

These changes had no effect.

Next was to consider that perhaps NFS was doing name lookups when it shouldn't. 
Indeed dns was specified in /etc/nsswitch.conf, which won't work given that no 
DNS servers are accessible from the storage or private networks, but again, no 
improvement. In this process I removed dns from nsswitch.conf, deleted 
/etc/resolv.conf, and disabled the dns/client service in SMF.
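
Roughly, the equivalent commands/edits were of this form (exact nsswitch
entries will depend on your config):

# svcadm disable svc:/network/dns/client:default
# rm /etc/resolv.conf

...and in /etc/nsswitch.conf, dns dropped from the lookup sources, e.g.:

hosts:      files
ipnodes:    files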

Turning back to CPU usage, we can see the activity is all SYStem time and comes 
in waves:

[private:/tmp] root# sar 1 100

SunOS private.thumper1 5.11 snv_43 i86pc12/07/2006

10:38:05%usr%sys%wio   %idle
10:38:06   0  27   0  73
10:38:07   0  27   0  73
10:38:09   0  27   0  73
10:38:10   1  26   0  73
10:38:11   0  26   0  74
10:38:12   0  26   0  74
10:38:13   0  24   0  76
10:38:14   0   6   0  94
10:38:15   0   7   0  93
10:38:22   0  99   0   1  --
10:38:23   0  94   0   6  --
10:38:24   0  28   0  72
10:38:25   0  27   0  73
10:38:26   0  27   0  73
10:38:27   0  27   0  73
10:38:28   0  27   0  73
10:38:29   1  30   0  69
10:38:30   0  27   0  73

And so we consider whether or not there is a pattern to the frequency. The 
following is sar output from any lines in which sys is above 90%:

10:40:04%usr%sys%wio   %idleDelta
10:40:11   0  97   0   3
10:40:45   0  98   0   2   34 seconds
10:41:02   0  94   0   6   17 seconds
10:41:26   0 100   0   0   24 seconds
10:42:00   0 100   0   0   34 seconds
10:42:25   (end of sample) 25 seconds

Looking at the congestion in the run queue:

[private:/tmp] root# sar -q 5 100

10:45:43 runq-sz %runocc swpq-sz %swpocc
10:45:5127.0  85 0.0   0
10:45:57 1.0  20 0.0   0
10:46:02 2.0  60 0.0   0
10:46:1319.8  99 0.0   0
10:46:2317.7  99 0.0   0
10:46:3424.4  99 0.0   0
10:46:4122.1  97 0.0   0
10:46:4813.0  96 0.0   0
10:46:5525.3 102 0.0   0

Looking at the per-CPU breakdown:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   00   324  224000  1540 00 100   0   0
  10   00   1140  2260   10   130860   1   0  99
  20   00   162  138  1490540 00   1   0  99
  30   00556   460430 00   1   0  99
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   00   310  210   340   17  1717 50 100   0   0
  10   00   1521  2000   17   265591  65   0  34
  20   00   271  197  1751   13   202 00  66   0  34
  30   00   120

Re: [zfs-discuss] Limitations of ZFS

2006-12-07 Thread Tomas Ögren
On 07 December, 2006 - dudekula mastan sent me these 2,9K bytes:

 Hi Folks,

   The man pages for zfs and zpool clearly say that it is not
   recommended to use only a portion of a device when creating a ZFS
   file system.

   What exactly are the problems if we use only some portion of the
   disk space for a ZFS FS?

 or 

   Why can't I use one partition of a device for a ZFS file system and
   another partition for some other purpose?

You can.

   Will it cause any problems if I use one partition of a device for
   ZFS and another partition for some other purpose?

No.

   Why does everyone strongly recommend using the whole disk (not a
   part of a disk) when creating zpools / ZFS file systems?

One thing is performance; ZFS can enable/disable write cache in the disk
at will if it has full control over the entire disk..
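
A minimal illustration (device names made up): give ZFS the whole disk and
it gets an EFI-labeled disk whose write cache it may manage; give it a
slice and it leaves the cache alone:

# zpool create tank c1t2d0      (whole disk)
# zpool create tank c1t2d0s3    (just a slice -- works, but no cache control)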

/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-07 Thread Jim Mauro


Hey Ben - I need more time to look at this and connect some dots,
but real quick

Some nfsstat data that we could use to potentially correlate to the local
server activity would be interesting. zfs_create() seems to be the
heavy hitter, but a periodic kernel profile (especially if we can catch
a 97% SYS period) would help:

#lockstat -i997 -Ik -s 10 sleep 60

Alternatively:

#dtrace -n 'profile-997hz / arg0 != 0 / { @s[stack()]=count(); }'

It would also be interesting to see what the zfs_create()'s are doing.
Perhaps a quick:

#dtrace -n 'zfs_create:entry { printf("ZFS Create: %s\n",
stringof(args[0]->v_path)); }'


It would also be interesting to see the network stats. Grab Brendan's
nicstat and collect some samples.

Your reference to low traffic is in bandwidth, which, as you indicate, is
really, really low. But the data, at least up to this point, suggests the
workload is not data/bandwidth intensive, but more attribute intensive.
Note again zfs_create() is the heavy ZFS function, along with zfs_getattr.
Perhaps it's the attribute-intensive nature of the load that is at the
root of this.

I can spend more time on this tomorrow (traveling today).

Thanks,
/jim


Ben Rockwood wrote:

I've got a Thumper doing nothing but serving NFS.  It's using B43 with 
zil_disabled.  The system is being consumed in waves, but by what I don't know. 
 Notice vmstat:

 3 0 0 25693580 2586268 0 0  0  0  0  0  0  0  0  0  0  926   91  703  0 25 75
 21 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0 13 14 1720   21 1105  0 92  8
 20 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0 17 18 2538   70  834  0 100 0
 25 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0  0  0  745   18  179  0 100 0
 37 0 0 25693552 2586240 0 0 0  0  0  0  0  0  0  7  7 1152   52  313  0 100 0
 16 0 0 25693592 2586280 0 0 0  0  0  0  0  0  0 15 13 1543   52  767  0 100 0
 17 0 0 25693592 2586280 0 0 0  0  0  0  0  0  0  2  2  890   72  192  0 100 0
 27 0 0 25693572 2586260 0 0 0  0  0  0  0  0  0 15 15 3271   19 3103  0 98  2
 0 0 0 25693456 2586144 0 11 0  0  0  0  0  0  0 281 249 34335 242 37289 0 46 54
 0 0 0 25693448 2586136 0 2  0  0  0  0  0  0  0  0  0 2470  103 2900  0 27 73
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0 1062  105  822  0 26 74
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0 1076   91  857  0 25 75
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0  917  126  674  0 25 75

These spikes of sys load come in waves like this.  While there are close to a 
hundred systems mounting NFS shares on the Thumper, the amount of traffic is 
really low.  Nothing to justify this.  We're talking less than 10MB/s.

NFS is pathetically slow.  We're using NFSv3 TCP shared via ZFS sharenfs on a 
3Gbps aggregation (3*1Gbps).

I've been slamming my head against this problem for days and can't make 
headway.  I'll post some of my notes below.  Any thoughts or ideas are welcome!

benr.

===

Step 1 was to disable any ZFS features that might consume large amounts of CPU:

# zfs set compression=off joyous
# zfs set atime=off joyous
# zfs set checksum=off joyous

These changes had no effect.

Next was to consider that perhaps NFS was doing name lookups when it shouldn't. Indeed 
dns was specified in /etc/nsswitch.conf which won't work given that no DNS 
servers are accessible from the storage or private networks, but again, no improvement. 
In this process I removed dns from nsswitch.conf, deleted /etc/resolv.conf, and disabled 
the dns/client service in SMF.

Turning back to CPU usage, we can see the activity is all SYStem time and comes 
in waves:

[private:/tmp] root# sar 1 100

SunOS private.thumper1 5.11 snv_43 i86pc12/07/2006

10:38:05%usr%sys%wio   %idle
10:38:06   0  27   0  73
10:38:07   0  27   0  73
10:38:09   0  27   0  73
10:38:10   1  26   0  73
10:38:11   0  26   0  74
10:38:12   0  26   0  74
10:38:13   0  24   0  76
10:38:14   0   6   0  94
10:38:15   0   7   0  93
10:38:22   0  99   0   1  --
10:38:23   0  94   0   6  --
10:38:24   0  28   0  72
10:38:25   0  27   0  73
10:38:26   0  27   0  73
10:38:27   0  27   0  73
10:38:28   0  27   0  73
10:38:29   1  30   0  69
10:38:30   0  27   0  73

And so we consider whether or not there is a pattern to the frequency. The 
following is sar output from any lines in which sys is above 90%:

10:40:04%usr%sys%wio   %idleDelta
10:40:11   0  97   0   3
10:40:45   0  98   0   2   34 seconds
10:41:02   0  94   0   6   17 seconds
10:41:26   0 100   0   0   24 seconds
10:42:00   0 100   0   0   34 seconds
10:42:25   (end of sample) 25 seconds

Looking at the 

[zfs-discuss] ZFS bootability target

2006-12-07 Thread Flemming Danielsen

Hi

I am about to plan an upgrade of about 500 systems (sparc) to Solaris 10 and
would like to go for ZFS to manage the root disk. But what timeframe are we
looking at? And what should we take into account to be able to migrate to it
later on?

--
// Flemming Danielsen
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Limitations of ZFS

2006-12-07 Thread Roch - PAE

 Why does everyone strongly recommend using the whole disk (not a part
 of a disk) when creating zpools / ZFS file systems?

  One thing is performance; ZFS can enable/disable write cache in the disk
  at will if it has full control over the entire disk..

ZFS will also flush the WC when necessary, and if
applications are waiting for an I/O to complete, this is
typically a point where data must be flushed out. So I don't
expect much application performance gain here.

There is a subset of SATA drives that do not handle
concurrent I/O requests, and staging I/Os through the cache
can be a way to drive more data throughput on them. But for
many devices the write cache is not a big performance
factor.

ZFS does some intelligent I/O scheduling, and giving it
entire disks allows that code to be more effective. Building
pools with many slices from one disk means more head
movements and lost performance.
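
(If you want to see what state the write cache is in on a given drive,
format's expert mode has a cache menu on many disks/drivers -- a rough
sketch, the menus vary by disk type:)

# format -e        (select the disk, then)
format> cache
cache> write_cache
write_cache> display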

-r

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] System pause peculiarity with mysql on zfs

2006-12-07 Thread Dale Ghent


Hey all, I run a netra X1 as the mysql db server for my small  
personal web site. This X1 has two drives in it with SVM-mirrored UFS  
slices for / and /var, a swap slice, and slice 7 is zfs. There is one  
zfs mirror pool called local on which there are a few file systems,  
one of which is for mysql. slice 7 used to be ufs, and I had no  
performance problems when that was the case. There is 1152MB of RAM  
on this box, half of which is in use. Solaris 10 FCS + all the latest  
patches as of today.


So anyway, after moving mysql to live on zfs (with compression turned  
on for the volume in question), I noticed that web pages on my site  
took a bit of time, sometimes up to 20 seconds to load. I'd jump on  
to my X1, and notice that according to top, kernel was hogging  
80-100% of the 500MHz CPU, and mysqld was the top process in CPU use.  
The load average would shoot from a normal 0.something up to 6 or  
even 8. Command-line response was stop and go.


Then I'd notice my page would finally load, and that corresponded  
with load and kernel CPU usage decreasing back to normal levels.


I am able to reliably replicate this, and I ran lockstat while this  
was going on, the output of which is here:


http://elektronkind.org/osol/lockstat-zfs-0.txt

Part of me is kind of sure that this is 6421427 as there appears to  
be long and copious trips through ata_wait() as that bug illustrates,  
but I just want to be sure of it (and when is that bug seeing a  
solaris 10 patch, btw?)


TIA,
/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS bootability target

2006-12-07 Thread Darren Dunham
 I am about to plan an upgrade of about 500 systems (sparc) to Solaris 10 and
 would like to go for ZFS to manage the rootdisk. But what timeframe are we
 looking at?

I've heard update 5, so several months at least.

 and what should we take into account to be able to migrate to it
 later on?

Are you isolating OS/root data from application/user data?  Do you have
2 dedicated disks to mirror root on?  If so, my guess is that later
upgrading will be easy.  Root ZFS pools will be restricted, so you won't
be sharing them with your non-root data in most cases.  That suggests to
me that I'd be able to break my existing SVM mirror and install a ZFS
root, later mirroring with the other disk.
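
A sketch of what that migration might look like, with made-up metadevice
and disk names (the ZFS-root half is obviously speculative until it ships):

# detach one submirror from the SVM root mirror
metadetach d10 d12

# ...install/convert to a ZFS root on the freed disk, then later
# mirror it back with the other disk:
zpool attach rootpool c0t0d0s0 c0t1d0s0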

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-07 Thread Alan Romeril
Hi Ben,
Your sar output shows one core pegged pretty much constantly!  From the Solaris 
Performance and Tools book, the SLP state value covers "the remainder of important 
events such as disk and network waits" along with other kernel wait 
events; kernel locks or condition variables also accumulate time in this 
state.

ZFS COUNT
zfs_create 4178
ZFS AVG TIME
zfs_create 71215587
ZFS SUM TIME
zfs_create 297538724997

I think it looks like the system must be spinning in zfs_create(), looking in 
usr/src/uts/common/fs/zfs/zfs_vnops.c there are a couple of places it could 
loop:-

   1129 /*
   1130  * Create a new file object and update the directory
   1131  * to reference it.
   1132  */

   1154 error = dmu_tx_assign(tx, zfsvfs->z_assign);
   1155 if (error) {
   1156         zfs_dirent_unlock(dl);
   1157         if (error == ERESTART &&
   1158             zfsvfs->z_assign == TXG_NOWAIT) {
   1159                 dmu_tx_wait(tx);
   1160                 dmu_tx_abort(tx);
   1161                 goto top;
   1162         }

and

   1201 /*
   1202  * Truncate regular files if requested.
   1203  */
   1204 if ((ZTOV(zp)->v_type == VREG) &&
   1205     (zp->z_phys->zp_size != 0) &&
   1206     (vap->va_mask & AT_SIZE) && (vap->va_size == 0)) {
   1207         error = zfs_freesp(zp, 0, 0, mode, TRUE);
   1208         if (error == ERESTART &&
   1209             zfsvfs->z_assign == TXG_NOWAIT) {
   1210                 /* NB: we already did dmu_tx_wait() */
   1211                 zfs_dirent_unlock(dl);
   1212                 VN_RELE(ZTOV(zp));
   1213                 goto top;
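
If it is that ERESTART/goto-top path, a rough one-liner like this might
show how often zfs_create ends up waiting on the DMU:

# dtrace -n 'fbt::dmu_tx_wait:entry { @[stack(3)] = count(); }'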

I think the snoop would be very useful to pore over.

Cheers,
Alan
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-07 Thread Neil Perrin

Ben,

The attached dscript might help determining the zfs_create issue.
It prints:
- a count of all functions called from zfs_create
- average wall clock time of the 30 highest functions
- average cpu time of the 30 highest functions

Note, please ignore warnings of the following type:

dtrace: 1346 dynamic variable drops with non-empty dirty list

Neil.

Ben Rockwood wrote On 12/07/06 06:01,:

I've got a Thumper doing nothing but serving NFS.  It's using B43 with 
zil_disabled.  The system is being consumed in waves, but by what I don't know. 
 Notice vmstat:

 3 0 0 25693580 2586268 0 0  0  0  0  0  0  0  0  0  0  926   91  703  0 25 75
 21 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0 13 14 1720   21 1105  0 92  8
 20 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0 17 18 2538   70  834  0 100 0
 25 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0  0  0  745   18  179  0 100 0
 37 0 0 25693552 2586240 0 0 0  0  0  0  0  0  0  7  7 1152   52  313  0 100 0
 16 0 0 25693592 2586280 0 0 0  0  0  0  0  0  0 15 13 1543   52  767  0 100 0
 17 0 0 25693592 2586280 0 0 0  0  0  0  0  0  0  2  2  890   72  192  0 100 0
 27 0 0 25693572 2586260 0 0 0  0  0  0  0  0  0 15 15 3271   19 3103  0 98  2
 0 0 0 25693456 2586144 0 11 0  0  0  0  0  0  0 281 249 34335 242 37289 0 46 54
 0 0 0 25693448 2586136 0 2  0  0  0  0  0  0  0  0  0 2470  103 2900  0 27 73
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0 1062  105  822  0 26 74
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0 1076   91  857  0 25 75
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0  917  126  674  0 25 75

These spikes of sys load come in waves like this.  While there are close to a 
hundred systems mounting NFS shares on the Thumper, the amount of traffic is 
really low.  Nothing to justify this.  We're talking less than 10MB/s.

NFS is pathetically slow.  We're using NFSv3 TCP shared via ZFS sharenfs on a 
3Gbps aggregation (3*1Gbps).

I've been slamming my head against this problem for days and can't make 
headway.  I'll post some of my notes below.  Any thoughts or ideas are welcome!

benr.

===

Step 1 was to disable any ZFS features that might consume large amounts of CPU:

# zfs set compression=off joyous
# zfs set atime=off joyous
# zfs set checksum=off joyous

These changes had no effect.

Next was to consider that perhaps NFS was doing name lookups when it shouldn't. Indeed 
dns was specified in /etc/nsswitch.conf which won't work given that no DNS 
servers are accessible from the storage or private networks, but again, no improvement. 
In this process I removed dns from nsswitch.conf, deleted /etc/resolv.conf, and disabled 
the dns/client service in SMF.

Turning back to CPU usage, we can see the activity is all SYStem time and comes 
in waves:

[private:/tmp] root# sar 1 100

SunOS private.thumper1 5.11 snv_43 i86pc12/07/2006

10:38:05%usr%sys%wio   %idle
10:38:06   0  27   0  73
10:38:07   0  27   0  73
10:38:09   0  27   0  73
10:38:10   1  26   0  73
10:38:11   0  26   0  74
10:38:12   0  26   0  74
10:38:13   0  24   0  76
10:38:14   0   6   0  94
10:38:15   0   7   0  93
10:38:22   0  99   0   1  --
10:38:23   0  94   0   6  --
10:38:24   0  28   0  72
10:38:25   0  27   0  73
10:38:26   0  27   0  73
10:38:27   0  27   0  73
10:38:28   0  27   0  73
10:38:29   1  30   0  69
10:38:30   0  27   0  73

And so we consider whether or not there is a pattern to the frequency. The 
following is sar output from any lines in which sys is above 90%:

10:40:04%usr%sys%wio   %idleDelta
10:40:11   0  97   0   3
10:40:45   0  98   0   2   34 seconds
10:41:02   0  94   0   6   17 seconds
10:41:26   0 100   0   0   24 seconds
10:42:00   0 100   0   0   34 seconds
10:42:25   (end of sample) 25 seconds

Looking at the congestion in the run queue:

[private:/tmp] root# sar -q 5 100

10:45:43 runq-sz %runocc swpq-sz %swpocc
10:45:5127.0  85 0.0   0
10:45:57 1.0  20 0.0   0
10:46:02 2.0  60 0.0   0
10:46:1319.8  99 0.0   0
10:46:2317.7  99 0.0   0
10:46:3424.4  99 0.0   0
10:46:4122.1  97 0.0   0
10:46:4813.0  96 0.0   0
10:46:5525.3 102 0.0   0

Looking at the per-CPU breakdown:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   00   324  224000  1540 00 100   0   0
  10   00   1140  2260   10   130860   1   0  99
  20   00   162  138  1490540 00  

Re: [zfs-discuss] System pause peculiarity with mysql on zfs

2006-12-07 Thread Jason J. W. Williams

Hi Dale,

Are you using MyISAM or InnoDB? Also, what's your zpool configuration?

Best Regards,
Jason

On 12/7/06, Dale Ghent [EMAIL PROTECTED] wrote:


Hey all, I run a netra X1 as the mysql db server for my small
personal web site. This X1 has two drives in it with SVM-mirrored UFS
slices for / and /var, a swap slice, and slice 7 is zfs. There is one
zfs mirror pool called local on which there are a few file systems,
one of which is for mysql. slice 7 used to be ufs, and I had no
performance problems when that was the case. There is 1152MB of RAM
on this box, half of which is in use. Solaris 10 FCS + all the latest
patches as of today.

So anyway, after moving mysql to live on zfs (with compression turned
on for the volume in question), I noticed that web pages on my site
took a bit of time, sometimes up to 20 seconds to load. I'd jump on
to my X1, and notice that according to top, kernel was hogging
80-100% of the 500Mhz CPU, and mysqld was the top process in CPU use.
The load average would shoot from a normal 0.something up to 6 or
even 8. Command-line response was stop and go.

Then I'd notice my page would finally load, and that corresponded
with load and kernel CPU usage decreasing back to normal levels.

I am able to reliably replicate this, and I ran lockstat while this
was going on, the output of which is here:

 http://elektronkind.org/osol/lockstat-zfs-0.txt

Part of me is kind of sure that this is 6421427 as there appears to
be long and copious trips through ata_wait() as that bug illustrates,
but I just want to be sure of it (and when is that bug seeing a
solaris 10 patch, btw?)

TIA,
/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS failover without multipathing

2006-12-07 Thread Richard Elling

Luke Schwab wrote:


Hi,

I am running Solaris 10 ZFS and I do not have STMS multipathing enabled. I have dual FC connections to storage using two ports on an Emulex HBA. 

In the Solaris ZFS admin guide, it says that a ZFS file system monitors disks by their path and their device ID. If a disk is switched between controllers, ZFS will be able to pick up the disk on a secondary controller. 

I tested this theory by creating a zpool on the first controller and then I pulled the cable on the back of the server. The server took about 3-5 minutes to fail over. But it did fail over!! 

 



By default, the [s]sd driver will retry [3]5 times with a timeout
of 60 seconds.  STMS understands the lower-level FC stuff, and
can make better decisions, faster.
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS compression / ARC interaction

2006-12-07 Thread Andrew Miller
Quick question about the interaction of ZFS filesystem compression and the 
filesystem cache.  We have an Opensolaris (actually Nexenta alpha-6) box 
running RRD collection.   These files seem to be quite compressible.  A test 
filesystem containing about 3,000 of these files shows a compressratio of 
12.5x.  

My question is about how the filesystem cache works with compressed files.  
Does the fscache keep a copy of the compressed data, or the uncompressed 
blocks?   To update one of these RRD files, I believe the whole contents are 
read into memory, modified, and then written back out.   If the filesystem 
cache maintained a copy of the compressed data, a lot more, maybe more than 10x 
more, of these files could be maintained in the cache.  That would mean we 
could have a lot more data files without ever needing to do a physical read.

Looking at the source code overview, it looks like the compression happens 
underneath the ARC layer, so by that I am assuming the uncompressed blocks 
are cached, but I wanted to ask to be sure.

Thanks!
-Andy
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: zpool import takes too long with large numbers of file systems

2006-12-07 Thread Jason J. W. Williams

Hi Luke,

That's terrific!

You know you might be able to tell ZFS which disks to look at. I'm not
sure. It would be interesting, if anyone with a Thumper could comment
on whether or not they see the import time issue. What are your load
times now with MPXIO?
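
(One thing that might be worth a try: zpool import takes a -d option to
limit where it searches for devices. A sketch, assuming you populate a
directory -- /mydevs here is hypothetical -- with links to just the LUNs
that hold the pool:)

# zpool import -d /mydevs testpool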

Best Regards,
Jason



On 12/7/06, Luke Schwab [EMAIL PROTECTED] wrote:

Jason,

Sorry, I don't have IM. I did make some progress on
testing.

The Solaris 10 OS allows you to kind of lun mask
within the fp.conf file by creating a list of luns you
don't want to see. This has improved my time greatly.
I can now create/export within a second and import
only takes about 10 seconds. What a difference compared
to the 5-8 minutes I've been seeing! But this is not good
if my machine needs to see LOTs of luns.

It would be nice if there was a feature in zfs where
you could specify which disks the pool resides on
instead of zfs looking through every disk attached to
the machine.

Luke Schwab


--- Jason J. W. Williams [EMAIL PROTECTED]
wrote:

 Hey Luke,

 Do you have IM?

 My Yahoo IM ID is [EMAIL PROTECTED]
 -J

 On 12/6/06, Luke Schwab [EMAIL PROTECTED] wrote:
  Rats, I think I know where you're going? We use LSIs exclusively.

  LSI performs lun masking at the driver level. You can specifically
  tell the LSI HBA to only bind to specific luns on the array. The
  array doesn't appear to support lun masking by itself.

  I believe you can also mask at the SAN switch but most of our
  connections are direct connect to the array.

  I just thought you'd know a quick and easy way to mask in Solaris 10.
  I tried the fp.conf file with a blackout list to prevent certain luns
  from being viewed but I couldn't get the OS to have a list of luns
  that it can only allow.

  Thanks,
  Luke

  --- Jason J. W. Williams [EMAIL PROTECTED] wrote:

   Hi Luke,

   Who makes your array? IBM, SGI or StorageTek?

   Best Regards,
   Jason

   On 12/6/06, Luke Schwab [EMAIL PROTECTED] wrote:
    Jason,

    Could you give me a tip on how to do lun masking? I used to do it
    via the /kernel/drv/ssd.conf file with an LSI HBA. Now I have
    Emulex HBAs with the Leadville stack.

    I saw on sunsolve a way to mask using a black list in the
    /kernel/drv/fp.conf file but that isn't what I was looking for.

    Do you know any other ways?

    Thanks,
    Luke Schwab

    --- Jason J. W. Williams [EMAIL PROTECTED] wrote:

     Hi Luke,

     I think you'll really like it. We moved from UFS/SVM and it's a
     night and day management difference. Though I understand SVM
     itself is easier to deal with than VxVM, so it may be an order of
     magnitude easier.

     Best Regards,
     Jason

     On 12/6/06, Luke Schwab [EMAIL PROTECTED] wrote:
      The 4884 as well as the V280 server is using 2 ports each. I
      don't have any FSs at this point. I'm trying to keep it simple
      for now.

      We are beta testing to go away from VxVM and such.

      --- Jason J. W. Williams [EMAIL PROTECTED] wrote:

       Hi Luke,

       Is the 4884 using two or four ports? Also, how many FSs are
       involved?

       Best Regards,
       Jason

       On 12/6/06, Luke Schwab [EMAIL PROTECTED] wrote:
        I, too, experienced a long delay while importing a zpool on a
        second machine. I do not have any filesystems in the pool.
        Just the Solaris 10 Operating system, Emulex 10002DC HBA, and
        a 4884 LSI array (dual attached).

        I don't have any file systems created but when STMS(mpxio) is
        enabled I see:

        # time zpool import testpool
        real 6m41.01s
        user 0m.30s
        sys 0m0.14s

        When I disable STMS(mpxio), the times are much better but
        still not that great:

        # time zpool import testpool
        real 1m15.01s
        user 0m.15s
        sys 0m0.35s

        Are these normal symptoms??

        Can anyone explain why I too see delays even though I don't
        have any file systems in the zpool?

        This message posted from opensolaris.org
   

        ___
        zfs-discuss mailing list
        zfs-discuss@opensolaris.org
        http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
  
 
 
 
  
 

   
   

=== message truncated ===








Re: [zfs-discuss] System pause peculiarity with mysql on zfs

2006-12-07 Thread Nicholas Senedzuk

You said you are running Solaris 10 FCS, but ZFS was not released until
Solaris 10 6/06, which is Solaris 10 U2.

On 12/7/06, Jason J. W. Williams [EMAIL PROTECTED] wrote:


Hi Dale,

Are you using MyISAM or InnoDB? Also, what's your zpool configuration?

Best Regards,
Jason

On 12/7/06, Dale Ghent [EMAIL PROTECTED] wrote:

 Hey all, I run a netra X1 as the mysql db server for my small
 personal web site. This X1 has two drives in it with SVM-mirrored UFS
 slices for / and /var, a swap slice, and slice 7 is zfs. There is one
 zfs mirror pool called local on which there are a few file systems,
 one of which is for mysql. slice 7 used to be ufs, and I had no
 performance problems when that was the case. There is 1152MB of RAM
 on this box, half of which is in use. Solaris 10 FCS + all the latest
 patches as of today.

 So anyway, after moving mysql to live on zfs (with compression turned
 on for the volume in question), I noticed that web pages on my site
 took a bit of time, sometimes up to 20 seconds to load. I'd jump on
 to my X1, and notice that according to top, kernel was hogging
 80-100% of the 500Mhz CPU, and mysqld was the top process in CPU use.
 The load average would shoot from a normal 0.something up to 6 or
 even 8. Command-line response was stop and go.

 Then I'd notice my page would finally load, and that corresponded
 with load and kernel CPU usage decreasing back to normal levels.

 I am able to reliably replicate this, and I ran lockstat while this
 was going on, the output of which is here:

  http://elektronkind.org/osol/lockstat-zfs-0.txt

 Part of me is kind of sure that this is 6421427 as there appears to
 be long and copious trips through ata_wait() as that bug illustrates,
 but I just want to be sure of it (and when is that bug seeing a
 solaris 10 patch, btw?)

 TIA,
 /dale
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS compression / ARC interaction

2006-12-07 Thread Mark Maybee

Andrew Miller wrote:
Quick question about the interaction of ZFS filesystem compression and the filesystem cache.  We have an Opensolaris (actually Nexenta alpha-6) box running RRD collection.   These files seem to be quite compressible.  A test filesystem containing about 3,000 of these files shows a compressratio of 12.5x.  


My question is about how the filesystem cache works with compressed files.  
Does the fscache keep a copy of the compressed data, or the uncompressed 
blocks?   To update one of these RRD files, I believe the whole contents are 
read into memory, modified, and then written back out.   If the filesystem 
cache maintained a copy of the compressed data, a lot more, maybe more than 10x 
more, of these files could be maintained in the cache.  That would mean we 
could have a lot more data files without ever needing to do a physical read.

Looking at the source code overview, it looks like the compression happens 
underneath the ARC layer, so by that I am assuming the uncompressed blocks 
are cached, but I wanted to ask to be sure.

Thanks!
-Andy
 

Yup, your assumption is correct.  We currently do compression below the
ARC.  We have contemplated caching data in compressed form, but have not
really explored the idea fully yet.

-Mark
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS compression / ARC interaction

2006-12-07 Thread Wee Yeh Tan

On 12/8/06, Mark Maybee [EMAIL PROTECTED] wrote:

Yup, your assumption is correct.  We currently do compression below the
ARC.  We have contemplated caching data in compressed form, but have not
really explored the idea fully yet.


Hmm... interesting idea.

That will incur CPU to do a decompress when the page is reclaimed, but it
reduces memory pressure.  What implications would this have for
encryption?


--
Just me,
Wire ...
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS compression / ARC interaction

2006-12-07 Thread Andrew Miller
 Looking at the source code overview, it looks like
 the compression happens underneath the ARC layer,
 so by that I am assuming the uncompressed blocks are
  cached, but I wanted to ask to be sure.
  
  Thanks!
  -Andy
   
 Yup, your assumption is correct.  We currently do
 compression below the
 ARC.  We have contemplated caching data in compressed
 form, but have not
 really explored the idea fully yet.
 
 -Mark
 ___

Mark,

Thanks for the quick response!  I imagine the compression will still help 
quite a bit anyway, since ultimately there's a lot less data to write back to 
the disk.   A compressed cache would be an interesting tunable parameter - it 
would be great for these types of files, and also for some of the things we 
keep here in databases.  (A lot of text/blobs, and as such highly compressible.)
 

My colleagues and I are really impressed with the design and performance of ZFS 
- Keep up the good work!

-Andy
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-07 Thread Anton B. Rang
 I'm still confused though, I believe that locking an adaptive mutex will spin 
 for a short
 period then context switch and so they shouldn't be burning CPU - at least 
 not .4s worth!

An adaptive mutex will spin as long as the thread which holds the mutex is on 
CPU.  If the lock is moderately contended, you can wind up with threads 
spinning for quite a while as ownership of the lock passes from thread to 
thread across CPUs.

Mutexes in Solaris tend to be most useful when they're held for very short 
periods of time; they also work pretty well if the owning thread blocks. If 
somebody is computing for quite a while while holding them (e.g. if 
dnode_next_offset is repeatedly called and is slow), they can waste a lot of 
time on other CPUs. In this case an rwlock usually works better.
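
(A quick way to see whether that's happening is lockstat's contention mode
with caller stacks -- a sketch, adjust the durations to taste:)

# lockstat -C -s 8 -D 20 sleep 30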

Anton
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System pause peculiarity with mysql on zfs

2006-12-07 Thread Dale Ghent

On Dec 7, 2006, at 1:46 PM, Jason J. W. Williams wrote:


Hi Dale,

Are you using MyISAM or InnoDB?


InnoDB.


Also, what's your zpool configuration?


A basic mirror:

[EMAIL PROTECTED]zpool status
  pool: local
state: ONLINE
scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
local ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t0d0s7  ONLINE   0 0 0
c0t2d0s7  ONLINE   0 0 0

errors: No known data errors
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: System pause peculiarity with mysql on zfs

2006-12-07 Thread Anton B. Rang
This does look like the ATA driver bug rather than a ZFS issue per se.

(For the curious, the reason ZFS triggers this when UFS doesn't is because ZFS 
sends a synchronize cache command to the disk, which is not handled in DMA mode 
by the controller; and for this particular controller, switching between DMA 
and PIO mode has some quirks which were worked around by adding delays. The fix 
involves a new quirk-work-around.)

Anton
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS compression / ARC interaction

2006-12-07 Thread Mike Gerdts

On 12/7/06, Andrew Miller [EMAIL PROTECTED] wrote:

Quick question about the interaction of ZFS filesystem compression and the 
filesystem cache.  We have an Opensolaris (actually Nexenta alpha-6) box 
running RRD collection.   These files seem to be quite compressible.  A test 
filesystem containing about 3,000 of these files shows a compressratio of 12.5x.


Be careful here.  If you are using files that have no data in them yet
you will get much better compression than later in life.  Judging by
the fact that you got only 12.5x, I suspect that your files are at
least partially populated.  Expect the compression to get worse over
time.

Looking at some RRD files that come from a very active (e.g. numbers
vary frequently) servers with data filling about 2/3 of the configured
time periods, I see the following rates:

1.8 mpstat.rrd
1.8 vmstat.rrd
1.9 exacct_PROJECT_user.oracle.rrd
2.0 net-ce2.rrd
2.1 iostat-c14.rrd
2.1 iostat-c15.rrd
2.1 iostat-c16.rrd
. . .
7.6 net-ce912005.rrd
7.7 net-ce912016.rrd
9.1 exacct_PROJECT_user.gemsadm.rrd
12.2 exacct_PROJECT_exacct_interval.rrd
18.1 exacct_PROJECT_user.patrol.rrd
18.1 exacct_PROJECT_user.precise.rrd
18.1 exacct_PROJECT_user.precise6.rrd
31.8 net-ce8.rrd
39.6 net-eri3.rrd
45.1 net-eri2.rrd

The first column is the compression ratio.  The net-eri{2,3} files are
almost empty.
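
(For what it's worth, per-file ratios like these can be approximated by
comparing the logical file size with the space actually allocated once
compression has done its work -- the file name is just an example:)

# ls -l mpstat.rrd    (logical size in bytes)
# du -k mpstat.rrd    (kilobytes actually allocated on disk)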



My question is about how the filesystem cache works with compressed files.  
Does the fscache keep a copy of the compressed data, or the uncompressed 
blocks?   To update one of these RRD files, I believe the whole contents are 
read into memory, modified, and then written back out.   If the filesystem 
cache maintained a copy of the compressed data, a lot more, maybe more than 10x 
more, of these files could be maintained in the cache.  That would mean we 
could have a lot more data files without ever needing to do a physical read.


Here is an insert of a value:

25450:  open(/opt/perfstat/rrd/somehost/iostat-c4.rrd, O_RDWR) = 3
25450:  fstat64(3, 0xFFBFF5E0)  = 0
25450:  fstat64(3, 0xFFBFF640)  = 0
25450:  fstat64(3, 0xFFBFF4E8)  = 0
25450:  ioctl(3, TCGETA, 0xFFBFF5CC)Err#25 ENOTTY
25450:  read(3,  R R D\0 0 0 0 1\0\0\0\0.., 8192) = 8192
25450:  llseek(3, 0, SEEK_CUR)  = 8192
25450:  lseek(3, 0xFC68, SEEK_CUR)  = 7272
25450:  fcntl(3, F_SETLK, 0xFFBFF7D0)   = 0
25450:  llseek(3, 0, SEEK_CUR)  = 7272
25450:  lseek(3, 2230952, SEEK_SET) = 2230952
25450:  write(3,  @ x S = pA3D7\v ?E6 f f.., 64)  = 64
25450:  lseek(3, 1864, SEEK_SET)= 1864
25450:  write(3,  E xA0 # U N K N\0\0\0\0.., 5408)= 5408
25450:  close(3)= 0

Notice that it does the following:

Open the file
Read the first 8K
Seek to a particular spot
Take a lock
Seek
Write 64 bytes
seek
Write 5408 bytes
close

The rrd file in question is 8.6 MB.  There was 8KB of reads and 5472
bytes of writes.  This is one of the big wins of the current binary
rrd format over the original ASCII version that came with MRTG.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: System pause peculiarity with mysql on zfs

2006-12-07 Thread Jason J. W. Williams

That's gotta be what it is. All our MySQL IOP issues have gone away
once we moved to RAID-1 from RAID-Z.

-J

On 12/7/06, Anton B. Rang [EMAIL PROTECTED] wrote:

This does look like the ATA driver bug rather than a ZFS issue per se.

(For the curious, the reason ZFS triggers this when UFS doesn't is because ZFS 
sends a synchronize cache command to the disk, which is not handled in DMA mode 
by the controller; and for this particular controller, switching between DMA 
and PIO mode has some quirks which were worked around by adding delays. The fix 
involves a new quirk-work-around.)

Anton


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System pause peculiarity with mysql on zfs

2006-12-07 Thread Dale Ghent

On Dec 7, 2006, at 5:22 PM, Nicholas Senedzuk wrote:

You said you are running Solaris 10 FCS but zfs was not released  
until Solaris 10 6/06 which is Solaris 10U2.


Look at a Solaris 10 6/06 CD/DVD. Check out the Solaris_10/UpgradePatches directory.


ah! well whaddya know...

Yes, apply those (you have to do them in the right order to do it in  
one run with 'patchadd -M') and you can bring your older box up to  
date with the update release.


/dale

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: System pause peculiarity with mysql on zfs

2006-12-07 Thread Dale Ghent

On Dec 7, 2006, at 6:14 PM, Anton B. Rang wrote:


This does look like the ATA driver bug rather than a ZFS issue per se.


Yes indeed. Well, that answers that. FWIW, I'm hour 2 of a mysql  
configure script run. Yow!


(For the curious, the reason ZFS triggers this when UFS doesn't is  
because ZFS sends a synchronize cache command to the disk, which is  
not handled in DMA mode by the controller; and for this particular  
controller, switching between DMA and PIO mode has some quirks  
which were worked around by adding delays. The fix involves a new  
quirk-work-around.)


Ah, so I suppose this would affect the V100, too. The same ALi IDE  
controller in that box.


Thanks for the insight. Since the fix for this made it into snv_52, I  
suppose it's too recent for a backport and patch release for s10 :(


/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS compression / ARC interaction

2006-12-07 Thread Andrew Miller
 Be careful here.  If you are using files that have no
 data in them yet
 you will get much better compression than later in
 life.  Judging by
 the fact that you got only 12.5x, I suspect that your
 files are at
 least partially populated.  Expect the compression to
 get worse over
 time.

I do expect it to get somewhat worse over time -- I don't expect such 
compression forever but didn't want to get too detailed in my original 
question. :-)   A lot of the data points I'm collecting (40%+) are quite static 
or change slowly over time - representing, for example, disk space, TCP errors 
(hopefully always zero! :-)) or JVM jstats of development JVM instances that 
get only small occasional bursts in activity. 

(snip)
 Read the first 8K
 Seek to a particular spot
 Take a lock
 Seek
 Write 64 bytes
 seek
 Write 5408 bytes
 close

Interesting, that looks a lot different than what I'm seeing.  Maybe something 
different in the implementation (I'm using perl RRDs and RRD 1.2.11).  Note 
that RRDFILE.rrd is 125504 bytes on disk.  I'll have to look into it a little 
deeper as it certainly would help performance to just read the preamble and 
modify the pieces of the RRD that need to change.  Maybe it's because my RRD's 
are quite small, they contain only one DS and I tuned down the number and 
length of the RRA's while fighting the performance issues that ultimately ended 
in me moving this to an opensolaris based box.

open(RRDFILE.rrd, O_RDONLY) = 14
fstat64(14, 0x080473E0) = 0
fstat64(14, 0x08047310) = 0
ioctl(14, TCGETA, 0x080473AC)   Err#25 ENOTTY
read(14,  R R D\0 0 0 0 3\0\0\0\0.., 125952)  = 125504
llseek(14, 0, SEEK_CUR) = 125504
lseek(14, 21600, SEEK_SET)  = 21600
lseek(14, 1600, SEEK_SET)   = 1600
read(14, \0\0\0\0\0\0F8FF\0\0\0\0.., 125952)  = 123904
llseek(14, 0xFFFE24F8, SEEK_CUR)= 3896
close(14)   = 0
open(RRDFILE.rrd, O_RDWR) = 14
fstat64(14, 0x08047320) = 0
fstat64(14, 0x08047250) = 0
ioctl(14, TCGETA, 0x080472EC)   Err#25 ENOTTY
read(14,  R R D\0 0 0 0 3\0\0\0\0.., 125952)  = 125504
llseek(14, 0, SEEK_CUR) = 125504
lseek(14, 0, SEEK_END)  = 125504
llseek(14, 0, SEEK_CUR) = 125504
llseek(14, 0, SEEK_CUR) = 125504
lseek(14, 1504, SEEK_SET)   = 1504
fcntl(14, F_SETLK, 0x08047430)  = 0
mmap(0x, 125504, PROT_READ|PROT_WRITE, MAP_SHARED, 14, 0) = 0xFE25F000
munmap(0xFE25F000, 125504)  = 0
llseek(14, 0, SEEK_CUR) = 1504
lseek(14, 880, SEEK_SET)= 880
write(14,  :B0 x E\0\0\0\0 1 5\0\0.., 624)= 624
close(14)   = 0

(I did an strace on linux, also, which is using RRD 1.0.49, and it looks about 
the same - appears to read the whole thing.  Maybe it's something in RRDs or 
the way I'm using it)

Thanks for spending some of your time analyzing my problem.  :-)

-Andy
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: System pause peculiarity with mysql on zfs

2006-12-07 Thread Jason J. W. Williams

Hi Dale,

For what it's worth, the SX releases tend to be pretty stable. I'm not
sure if snv_52 has made an SX release yet. We ran for over 6 months on
SX 10/05 (snv_23) with no downtime.

Best Regards,
Jason

On 12/7/06, Dale Ghent [EMAIL PROTECTED] wrote:

On Dec 7, 2006, at 6:14 PM, Anton B. Rang wrote:

 This does look like the ATA driver bug rather than a ZFS issue per se.

Yes indeed. Well, that answers that. FWIW, I'm hour 2 of a mysql
configure script run. Yow!

 (For the curious, the reason ZFS triggers this when UFS doesn't is
 because ZFS sends a synchronize cache command to the disk, which is
 not handled in DMA mode by the controller; and for this particular
 controller, switching between DMA and PIO mode has some quirks
 which were worked around by adding delays. The fix involves a new
 quirk-work-around.)

Ah, so I suppose this would affect the V100, too. The same ALi IDE
controller in that box.

Thanks for the insight. Since the fix for this made it into snv_52, I
suppose it's too recent for a backport and patch release for s10 :(

/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS failover without multipathing

2006-12-07 Thread Luke Schwab
Jason,
 I am no longer looking at not using STMS multipathing, because without STMS you 
lose the binding to the array and I lose all transmissions between the server 
and array. The binding does come back after a few minutes but this is not 
acceptable in our environment.

Load times vary depending on my configuration.

Scenario 1: No STMS: Really fast zpool create and zpool import/export. Less than 
1 second for create/export and 5-15 seconds for an import.

Scenario 2: STMS (mpxio) enabled and no blacklists being used for LUN masking: zpool 
create takes 5-15 seconds, zpool imports take from 5-7 minutes.

Scenario 3: STMS enabled and blacklists enabled via /kernel/drv/fp.conf: It took 
at least 15 minutes to do a zpool create before I finally stopped it. This 
does not appear to be a viable solution.

If you have any ideas about how to improve performance I am all ears. I'm not 
sure why ZFS takes so long to create pools with STMS.

Does anyone have problems using LSI arrays? I already had problems using my LSI 
HBA with ZFS because the LSI HBA does not work with the Leadville stack.

R/ ljs


 Hi Luke,
 
 That's terrific!
 
 You know you might be able to tell ZFS which disks
 to look at. I'm not
 sure. It would be interesting, if anyone with a
 Thumper could comment
 on whether or not they see the import time issue.
 What are your load
 times now with MPXIO?
 
 Best Regards,
 Jason

 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS failover without multipathing

2006-12-07 Thread Jason J. W. Williams

Hi Luke,

I wonder if it is the HBA. We had issues with Solaris and LSI HBAs
back when we were using an Xserve RAID.

Haven't had any of the issues you're describing between our LSI array
and the Qlogic HBAs we're using now.

If you have another type of HBA I'd try it. MPXIO and ZFS haven't ever
caused what you're seeing for us.

-J

On 12/7/06, Luke Schwab [EMAIL PROTECTED] wrote:

Jason,
 I am no longer looking at not using STMS multipathing, because without STMS you 
lose the binding to the array and I lose all transmissions between the server 
and array. The binding does come back after a few minutes but this is not 
acceptable in our environment.

Load times vary depending on my configuration.

Scenario 1: No STMS: Really fast zpool create and zpool import/export. Less than 
1 second for create/export and 5-15 seconds for an import.

Scenario 2: STMS (mpxio) enabled and no blacklists being used for LUN masking: zpool 
create takes 5-15 seconds, zpool imports take from 5-7 minutes.

Scenario 3: STMS enabled and blacklists enabled via /kernel/drv/fp.conf: It took at least 
15 minutes to do a zpool create before I finally stopped it. This does not 
appear to be a viable solution.

If you have any ideas about how to improve performance I am all ears. I'm not 
sure why ZFS takes so long to create pools with STMS.

Does anyone have problems using LSI arrays? I already had problems using my LSI 
HBA with ZFS because the LSI HBA does not work with the Leadville stack.

R/ ljs


 Hi Luke,

 That's terrific!

 You know you might be able to tell ZFS which disks
 to look at. I'm not
 sure. It would be interesting, if anyone with a
 Thumper could comment
 on whether or not they see the import time issue.
 What are your load
 times now with MPXIO?

 Best Regards,
 Jason



This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-07 Thread eric kustarz

Ben Rockwood wrote:

Eric Kustarz wrote:


Ben Rockwood wrote:
 I've got a Thumper doing nothing but serving NFS.  It's using B43 with
 zil_disabled.  The system is being consumed in waves, but by what I
 don't know.  Notice vmstat:

We made several performance fixes in the NFS/ZFS area in recent 
builds, so if possible it would be great to upgrade you from snv_43.  
That said, there might be something else going on that we haven't 
accounted for.

...



Step 1 was to disable any ZFS features that might consume large 
amounts of CPU:


# zfs set compression=off joyous
# zfs set atime=off joyous
# zfs set checksum=off joyous



In our performance testing, we haven't found checksums to be anywhere 
near a large consumer of CPU, so i would recommend leaving that on 
(due to its benefits).


I suspect your apps/clients don't depend on atime, so i think its a 
good idea to turn that off.  We've gotten better NFS performance with 
this off.


More of a heads up as it sounds like compression on/off isn't your 
problem.  If you are not getting good I/O BW with compression turned 
on, its most likely due to:

6460622 zio_nowait() doesn't live up to its name

As Jim, mentioned, using lockstat to figure out where your CPU is 
being spent is the first step.  I've been using 'lockstat -kgIW -D 60 
sleep 60'.  That collects data for the top 60 callers for a 1 minute 
period. If you see 'mutex_enter' high up in the results, then we have 
at least mutex lock contention.



...


Interestingly, using prstat -mL to monitor thread latency, we see 
that a handful of threads are the culprates for consuming mass CPU:


[private:/tmp] root# prstat -mL
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
 22643 daemon   0.0  75 0.0 0.0 0.0 0.0  25 0.0 416   1   0   0 nfsd/1506
 22643 daemon   0.0  75 0.0 0.0 0.0 0.0  25 0.0 415   0   0   0 nfsd/1563
 22643 daemon   0.0  74 0.0 0.0 0.0 0.0  26 0.0 417   0   0   0 nfsd/1554
 22643 daemon   0.0  74 0.0 0.0 0.0 0.0  26 0.0 419   0   0   0 nfsd/1551
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0  26  74 418   0   0   0 nfsd/1553
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1536
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1555
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 418   0   0   0 nfsd/1539
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1562
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 418   0   0   0 nfsd/1545
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1559
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 419   1   0   0 nfsd/1541
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1546
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 417   0   0   0 nfsd/1543
 22643 daemon   0.0 0.2 0.0 0.0 0.0 0.0 100 0.0 418   0   0   0 nfsd/1560

Total: 33 processes, 218 lwps, load averages: 4.64, 6.20, 5.86



The high SYS times being charged to the userland nfsd threads is 
representative of what the kernel threads are doing (which is most 
likely going to be NFS and ZFS).



Running zvop_times.d 
(http://blogs.sun.com/erickustarz/resource/zvop_times.d) I get an 
idea of what ZFS is doing:


[private:/] root# dtrace -s  zvop_times.d
dtrace: script 'zvop_times.d' matched 66 probes
^C
CPU IDFUNCTION:NAME
  1  2 :END ZFS COUNT

  zfs_getsecattr4
  zfs_space70
  zfs_rename  111
  zfs_readdir 284
  zfs_read367
  zfs_mkdir   670
  zfs_setattr1054
  zfs_frlock 1562
  zfs_putpage3916
  zfs_write  4110
  zfs_create 4178
  zfs_remove 7794
  zfs_fid   14960
  zfs_inactive  17637
  zfs_access20809
  zfs_fsync 29668
  zfs_lookup31273
  zfs_getattr  175457



ZFS AVG TIME

  zfs_fsync  2337
  zfs_getattr2735
  zfs_access 2774
  zfs_fid2948
  zfs_inactive