[zfs-discuss] Help with the best layout

2008-02-15 Thread Kim Tingkær
Hi everybody,

thanks for a very good source of information! I hope you guys can help 
out a little. 

I have 3 disks: one 300GB USB and two 150GB IDE. I would like to get the most 
space out of whatever configuration I apply. So I've been thinking (and 
testing without success): is it at all possible to stripe the two smaller disks 
and then mirror that stripe with the larger one? I've tried all kinds of 
maneuvers, except destroying everything and starting from scratch.

The thing is, I've already got 20GB of data on a mirror containing the two smaller 
disks, so I've sort of had to zigzag a little bit moving the data around. But I 
will start from scratch if needed.

Any ideas?

Thanks in advance.
 
 


Re: [zfs-discuss] Help with the best layout

2008-02-15 Thread Kim Tingkær
Using ZFS, of course *g*
 
 


[zfs-discuss] iscsi connection aborted.

2008-02-15 Thread James Nord
Hi,

I'm trying to boot an HP DL360 G5 via iSCSI from a Solaris 10 U4 ZFS-backed iSCSI 
target, but it's failing the login at boot:

POST messages from the dl360:
  Starting iSCSI boot option rom initialization...
  Connecting.connected.
  Logging in...error - failing.

Interestingly (and correctly) the authentication method is set to none.

I've looked at a TCP capture and the box succeeds at the first login, but 
immediately after the second login the Sun box (iSCSI target) closes the 
connection (TCP FIN).

I've compared this to a successful session when running under Windows and there are 
some minor differences in the key/value pairs (receive buffer length/CRC etc.), 
and also the ISID is all zeros in the boot session but has values when running 
from Windows. I'm not sure whether this is a problem or not.

I'm not sure what or where the problem is - all the values appear to be set up 
correctly (I'm new to iSCSI) - after all, the first login command succeeds.

Does anyone have any pointers as to what the problem is, or where to look to 
diagnose this further (does Solaris log this information anywhere)?
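
For what it's worth, the only host-side digging I know to do so far looks something 
like this (the interface name is just an example, and I'm guessing at the service 
name):

  # capture the iSCSI login exchange on the Solaris target for offline inspection
  snoop -d e1000g0 -o /var/tmp/iscsi-login.snoop port 3260

  # check the target configuration and whether the target daemon logged anything
  iscsitadm list target -v
  svcs -l iscsitgt
  tail -50 /var/adm/messages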
 
 


Re: [zfs-discuss] Which DTrace provider to use

2008-02-15 Thread Roch Bourbonnais


Le 14 févr. 08 à 02:22, Marion Hakanson a écrit :

 [EMAIL PROTECTED] said:
 It's not that old.  It's a Supermicro system with a 3ware 9650SE-8LP,
 Open-E iSCSI-R3 DOM module.  The system is plenty fast.  I can pretty
 handily pull 120MB/sec from it, and write at over 100MB/sec.  It falls apart
 more on random I/O.  The server/initiator side is a T2000 with Solaris 10u4.
 It never sees over 25% CPU, ever.  Oh yeah, and two 1GB network links to
 the SAN.
 . . .
 My opinion is, if when the array got really loaded up, everything slowed
 down evenly, users wouldn't mind or notice much.  But when every 20 or so
 reads/writes gets delayed by 10s of seconds, the users start to line up at
 my door.



This is the write throttling problem. I've tested code that radically changes 
the situation for the better; we just need to go through performance 
validation before putback.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205


-r





Re: [zfs-discuss] Performance with Sun StorageTek 2540

2008-02-15 Thread Roch Bourbonnais

Le 15 févr. 08 à 03:34, Bob Friesenhahn a écrit :

 On Thu, 14 Feb 2008, Tim wrote:

 If you're going for best single file write performance, why are you doing
 mirrors of the LUNs?  Perhaps I'm misunderstanding why you went from one
 giant raid-0 to what is essentially a raid-10.

 That decision was made because I also need data reliability.

 As mentioned before, the write rate peaked at 200MB/second using
 RAID-0 across 12 disks exported as one big LUN.

What was the interlace on the LUN ?


 Other firmware-based
 methods I tried typically offered about 170MB/second.  Even a four
 disk firmware-managed RAID-5 with ZFS on top offered about
 165MB/second.  Given that I would like to achieve 300MB/second, a few
 tens of MB don't make much difference.  It may be that I bought the
 wrong product, but perhaps there is a configuration change which will
 help make up some of the difference without sacrificing data
 reliability.


If this is a 165MB/sec application rate, consider that ZFS sends that much to 
each side of the mirror, so your data channel rate was 330MB/sec.


-r


 Bob
 ==
 Bob Friesenhahn
 [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



[zfs-discuss] ZFS write throttling

2008-02-15 Thread Philip Beevers
Hi everyone,

This is my first post to zfs-discuss, so be gentle with me :-)

I've been doing some testing with ZFS - in particular, in checkpointing
the large, proprietary in-memory database which is a key part of the
application I work on. In doing this I've found what seems to be some
fairly unhelpful write throttling behaviour from ZFS.

In summary, the environment is:

* An x4600 with 8 CPUs and 128GBytes of memory
* A 50GByte in-memory database
* A big, fast disk array (a 6140 with a LUN comprised of 4 SATA drives)
* Running Solaris 10 update 4 (problems initially seen on U3 so I got it
patched)

The problems happen when I checkpoint the database, which involves
putting that database on disk as quickly as possible, using the write(2)
system call.

The first time the checkpoint is run, it's quick - about 160MBytes/sec,
even though the disk array is only sustaining 80MBytes/sec. So we're
dirtying stuff in the ARC (and growing the ARC) at a pretty impressive
rate.

After letting the IO subside, running the checkpoint again results in
very different behaviour. It starts running very quickly, again at
160MByte/sec (with the underlying device doing 80MBytes/sec), and after
a while (presumably once the ARC is full) things go badly wrong. In
particular, a write(2) system call hangs for 6-7 minutes, apparently
until all the outstanding IO is done. Any reads from that device also
take a huge amount of time, making the box very unresponsive.

Obviously this isn't good behaviour, but it's particularly unfortunate
given that this checkpoint is stuff that I don't want to retain in any
kind of cache anyway - in fact, preferably I wouldn't pollute the ARC
with it in the first place. But it seems directio(3C) doesn't work with
ZFS (unsurprisingly as I guess this is implemented in segmap), and
madvise(..., MADV_DONTNEED) doesn't drop data from the ARC (again, I
guess, as it's working on segmap/segvn).

Of course, limiting the ARC size to something fairly small makes it
behave much better. But this isn't really the answer.
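
(For anyone wanting to reproduce the ARC cap: the usual knob on Solaris 10 is an 
/etc/system tunable followed by a reboot - the 4GByte value below is only an 
example, not a recommendation:

  * /etc/system: cap the ZFS ARC at 4GBytes
  set zfs:zfs_arc_max = 0x100000000
)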

I also tried using O_DSYNC, which stops the pathological behaviour but
makes things pretty slow - I only get a maximum of about 20MBytes/sec,
which is obviously much less than the hardware can sustain.

It sounds like we could do with different write throttling behaviour to
head this sort of thing off. Of course, the ideal would be to have some
way of telling ZFS not to bother keeping pages in the ARC.

The latter appears to be bug 6429855. But the underlying behaviour
doesn't really seem desirable; are there plans afoot to do any work on
ZFS write throttling to address this kind of thing?


Regards,

-- 

Philip Beevers
Fidessa Infrastructure Development

mailto:[EMAIL PROTECTED]
phone: +44 1483 206571 




Re: [zfs-discuss] ZFS taking up to 80 seconds to flush a single 8KB O_SYNC block.

2008-02-15 Thread Roch Bourbonnais

Le 10 févr. 08 à 12:51, Robert Milkowski a écrit :

 Hello Nathan,

 Thursday, February 7, 2008, 6:54:39 AM, you wrote:

 NK For kicks, I disabled the ZIL: zil_disable/W0t1, and that made not a
 NK pinch of difference. :)

 Have you exported and then imported the pool to get zil_disable into
 effect?



I don't think export/import is required.

-r


 -- 
 Best regards,
 Robert Milkowskimailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] ZFS write throttling

2008-02-15 Thread Roch Bourbonnais

Le 15 févr. 08 à 11:38, Philip Beevers a écrit :

 Hi everyone,

 This is my first post to zfs-discuss, so be gentle with me :-)

 I've been doing some testing with ZFS - in particular, in checkpointing
 the large, proprietary in-memory database which is a key part of the
 application I work on. In doing this I've found what seems to be some
 fairly unhelpful write throttling behaviour from ZFS.

 In summary, the environment is:

 * An x4600 with 8 CPUs and 128GBytes of memory
 * A 50GByte in-memory database
 * A big, fast disk array (a 6140 with a LUN comprised of 4 SATA drives)
 * Running Solaris 10 update 4 (problems initially seen on U3 so I got it
 patched)

 The problems happen when I checkpoint the database, which involves
 putting that database on disk as quickly as possible, using the write(2)
 system call.

 The first time the checkpoint is run, it's quick - about 160MBytes/sec,
 even though the disk array is only sustaining 80MBytes/sec. So we're
 dirtying stuff in the ARC (and growing the ARC) at a pretty impressive
 rate.

 After letting the IO subside, running the checkpoint again results in
 very different behaviour. It starts running very quickly, again at
 160MByte/sec (with the underlying device doing 80MBytes/sec), and after
 a while (presumably once the ARC is full) things go badly wrong. In
 particular, a write(2) system call hangs for 6-7 minutes, apparently
 until all the outstanding IO is done. Any reads from that device also
 take a huge amount of time, making the box very unresponsive.

 Obviously this isn't good behaviour, but it's particularly unfortunate
 given that this checkpoint is stuff that I don't want to retain in any
 kind of cache anyway - in fact, preferably I wouldn't pollute the ARC
 with it in the first place. But it seems directio(3C) doesn't work with
 ZFS (unsurprisingly as I guess this is implemented in segmap), and
 madvise(..., MADV_DONTNEED) doesn't drop data from the ARC (again, I
 guess, as it's working on segmap/segvn).

 Of course, limiting the ARC size to something fairly small makes it
 behave much better. But this isn't really the answer.

 I also tried using O_DSYNC, which stops the pathological behaviour but
 makes things pretty slow - I only get a maximum of about 20MBytes/sec,
 which is obviously much less than the hardware can sustain.

 It sounds like we could do with different write throttling behaviour to
 head this sort of thing off. Of course, the ideal would be to have some
 way of telling ZFS not to bother keeping pages in the ARC.

 The latter appears to be bug 6429855. But the underlying behaviour
 doesn't really seem desirable; are there plans afoot to do any work on
 ZFS write throttling to address this kind of thing?


Throttling is being addressed.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205


BTW, the new code will adjust write speed to disk speed very quickly.
You will not see those ultra fast initial checkpoints. Is this a  
concern ?

-r


 Regards,

 -- 

 Philip Beevers
 Fidessa Infrastructure Development

 mailto:[EMAIL PROTECTED]
 phone: +44 1483 206571

 


Re: [zfs-discuss] ZFS write throttling

2008-02-15 Thread Philip Beevers
Hi Roch,

Thanks for the response.

 Throttling is being addressed.
 
   
 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205
 
 
 BTW, the new code will adjust write speed to disk speed very quickly.
 You will not see those ultra fast initial checkpoints. Is 
 this a concern ?

That's good news. No, the loss of initial performance isn't a big
problem - I'd be happy for it to go at spindle speed.

Regards,

-- 

Philip Beevers
Fidessa Infrastructure Development

mailto:[EMAIL PROTECTED]
phone: +44 1483 206571  




Re: [zfs-discuss] Help with the best layout

2008-02-15 Thread Ross
I thought that too, but actually, I'm not sure you can.  You can stripe 
multiple mirror or raid sets with zpool create, but I don't see any 
documentation or examples for mirroring a raid set.

However, in this case even if you could, you might not want to.  Creating a 
stripe that way will restrict the speed of the IDE drives:  They will be 
throttled back to the speed of the USB disk.

What I would do instead is create a mirror of the two IDE drives, then use zfs 
send/receive to send regular backups to the USB drive.
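
A rough sketch of that approach (device names, pool names and snapshot names are 
made up for illustration):

  # mirror the two 150GB IDE disks
  zpool create tank mirror c0d0 c0d1

  # separate pool on the 300GB USB disk to receive the backups
  zpool create backup c2t0d0

  # initial backup
  zfs snapshot tank@backup1
  zfs send tank@backup1 | zfs receive backup/tank

  # later, incremental updates
  zfs snapshot tank@backup2
  zfs send -i tank@backup1 tank@backup2 | zfs receive backup/tank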
 
 


Re: [zfs-discuss] ZFS write throttling

2008-02-15 Thread Tao Chen
On 2/15/08, Roch Bourbonnais [EMAIL PROTECTED] wrote:

  Le 15 févr. 08 à 11:38, Philip Beevers a écrit :

[...]
   Obviously this isn't good behaviour, but it's particularly unfortunate
   given that this checkpoint is stuff that I don't want to retain in any
   kind of cache anyway - in fact, preferably I wouldn't pollute the ARC
   with it in the first place. But it seems directio(3C) doesn't work with
   ZFS (unsurprisingly as I guess this is implemented in segmap), and
   madvise(..., MADV_DONTNEED) doesn't drop data from the ARC (again, I
   guess, as it's working on segmap/segvn).

   Of course, limiting the ARC size to something fairly small makes it
   behave much better. But this isn't really the answer.

   I also tried using O_DSYNC, which stops the pathological behaviour but
   makes things pretty slow - I only get a maximum of about 20MBytes/sec,
   which is obviously much less than the hardware can sustain.

   It sounds like we could do with different write throttling behaviour to
   head this sort of thing off. Of course, the ideal would be to have some
   way of telling ZFS not to bother keeping pages in the ARC.

   The latter appears to be bug 6429855. But the underlying behaviour
   doesn't really seem desirable; are there plans afoot to do any work on
   ZFS write throttling to address this kind of thing?
  


 Throttling is being addressed.

 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205


  BTW, the new code will adjust write speed to disk speed very quickly.
  You will not see those ultra fast initial checkpoints. Is this a
  concern ?

I'll wait for more details on how you address this.
Maybe a blog? like this one:
http://blogs.technet.com/markrussinovich/archive/2008/02/04/2826167.aspx

Inside Vista SP1 File Copy Improvements:

One of the biggest problems with the engine's implementation is
that for copies involving lots of data, the Cache Manager
write-behind thread on the target system often can't keep up with
the rate at which data is written and cached in memory.
That causes the data to fill up memory, possibly forcing other
useful code and data out, and eventually, the target's system's
memory to become a tunnel through which all the copied data
flows at a rate limited by the disk.

Sounds familiar? ;-)

Tao


Re: [zfs-discuss] [storage-discuss] Preventing zpool imports on boot

2008-02-15 Thread Mike Gerdts
On Thu, Feb 14, 2008 at 11:17 PM, Dave [EMAIL PROTECTED] wrote:
  I don't want Solaris to import any pools at bootup, even when there were
  pools imported at shutdown/at crash time. The process to prevent
  importing pools should be automatic and not require any human
  intervention. I want to *always* import the pools manually.

  Hrm... what if I deleted zpool.cache after importing/exporting any pool?
  Are these the only times zpool.cache is created?

  I wish zpools had a property of 'atboot' or similar, so that you could
  mark a zpool to be imported at boot or not.


Like this?

     temporary

         By default, all pools are persistent and are automatically
         opened when the system is rebooted. Setting this boolean
         property to "on" causes the pool to exist only while the
         system is up. If the system is rebooted, the pool has to
         be manually imported by using the "zpool import" command.
         Setting this property is often useful when using pools on
         removable media, where the devices may not be present when
         the system reboots. This property can also be referred to
         by its shortened column name, "temp".


  (I am trying to move this thread over to zfs-discuss, since I originally
  posted to the wrong alias)

storage-discuss trimmed in my reply.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] Help with the best layout

2008-02-15 Thread Richard Elling
Ross wrote:
 I thought that too, but actually, I'm not sure you can.  You can stripe 
 multiple mirror or raid sets with zpool create, but I don't see any 
 documentation or examples for mirroring a raid set.
   

Split the USB disk in half, then mirror each IDE disk to a USB disk half.
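
For example (assuming the USB disk has been sliced into two ~150GB slices with 
format(1M); device names are hypothetical):

  zpool create tank mirror c0d0 c2t0d0s0 mirror c0d1 c2t0d0s1

Every write then touches the single USB spindle twice, which is why the 
send/receive approach discussed below may still be preferable.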

 However, in this case even if you could, you might not want to.  Creating a 
 stripe that way will restrict the speed of the IDE drives:  They will be 
 throttled back to the speed of the USB disk.
   

You should be able to get 30 MBytes/s or so to/from a USB disk.
For most general purpose use, this is ok.

 What I would do instead is create a mirror of the two IDE drives, then use 
 zfs send/receive to send regular backups to the USB drive.
   

Yes, this is a safer and longer-term view.  I store my USB drives in
fire safes, one at my house, one someplace else.
 -- richard



Re: [zfs-discuss] 100% random writes coming out as 50/50 reads/writes

2008-02-15 Thread Neil Perrin


Nathan Kroenert wrote:
 And something I was told only recently - It makes a difference if you 
 created the file *before* you set the recordsize property.
 
 If you created them after, then no worries, but if I understand 
 correctly, if the *file* was created with 128K recordsize, then it'll 
 keep that forever...
 
 Assuming I understand correctly.
 
 Hopefully someone else on the list will be able to confirm.

Yes, that is correct.

Neil.


[zfs-discuss] How to set ZFS metadata copies=3?

2008-02-15 Thread Vincent Fox
Let's say you are paranoid and have built a pool with 40+ disks in a Thumper.

Is there a way to set metadata copies=3 manually?

After having built RAIDZ2 sets with 7-9 disks and then pooled these together, 
it just seems like a little bit of extra insurance to increase the metadata copies. 
I don't see a need for extra data copies, which is currently the only trigger I 
see for that.
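
(For data, the knob I'm referring to is the per-dataset copies property - a 
minimal example with a made-up dataset name:

  zfs set copies=2 tank/important
  zfs get copies tank/important

I just haven't found an equivalent user-visible setting for metadata.)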
 
 


Re: [zfs-discuss] ZFS write throttling

2008-02-15 Thread Bob Friesenhahn
On Fri, 15 Feb 2008, Roch Bourbonnais wrote:
 The latter appears to be bug 6429855. But the underlying behaviour
 doesn't really seem desirable; are there plans afoot to do any work on
 ZFS write throttling to address this kind of thing?

 Throttling is being addressed.

   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205

I have observed similar behavior when using 'iozone' on a large file 
to benchmark ZFS on my StorageTek 2540 array.  Fsstat shows gaps of up 
to 30 seconds of no I/O when run on a 10-second update cycle, but when 
I go to look at the lights on the array, I see that it is actually 
fully busy.  It seems that the application is stalled during this 
load.  It also seems that simple operations like 'ls' get stalled 
under such heavy load.
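
For anyone who wants to watch for the same stalls, the sort of commands involved 
(pool name is hypothetical):

  fsstat zfs 10         # per-filesystem operation counts, 10-second intervals
  zpool iostat tank 10  # pool-level read/write bandwidth on the same cycle
  iostat -xnz 10        # per-LUN service times, to compare with the array LEDs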

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Performance with Sun StorageTek 2540

2008-02-15 Thread Roch Bourbonnais

Le 15 févr. 08 à 18:24, Bob Friesenhahn a écrit :

 On Fri, 15 Feb 2008, Roch Bourbonnais wrote:

 As mentioned before, the write rate peaked at 200MB/second using
 RAID-0 across 12 disks exported as one big LUN.

 What was the interlace on the LUN ?


The question was about LUN interlace, not interface.
128K to 1M works better.

 There are two 4Gbit FC interfaces on an Emulex LPe11002 card which are
 supposedly acting in a load-share configuration.

 If this is a 165MB/sec application rate, consider that ZFS sends that much
 to each side of the mirror, so your data channel rate was 330MB/sec.

 Yes, I am aware of the ZFS RAID write penalty but in fact it has
 only cost 20MB per second vs doing the RAID using controller firmware
 (150MB vs 170MB/second).  This indicates that there is plenty of
 communications bandwidth from the host to the array.  The measured
 read rates are in the 470MB to 510MB/second range.


Any compression?
Does turning off checksums help the numbers (that would point to a 
CPU-limited throughput)?

-r


 While writing, it is clear that ZFS does not use all of the drives for
 writes at once since the drive LEDs show that some remain
 temporarily idle and ZFS cycles through them.

 I would be very happy to hear from other StorageTek 2540 owners as to
 the write rate they were able to achieve.




 Bob
 ==
 Bob Friesenhahn
 [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Performance with Sun StorageTek 2540

2008-02-15 Thread Bob Friesenhahn
On Fri, 15 Feb 2008, Roch Bourbonnais wrote:
 What was the interlace on the LUN ?

 The question was about LUN  interlace not interface.
 128K to 1M works better.

The segment size is set to 128K.  The max the 2540 allows is 512K. 
Unfortunately, the StorageTek 2540 and CAM documentation does not 
really define what segment size means.

 Any compression ?

Compression is disabled.

 Does turn off checksum helps the number (that would point to a CPU limited 
 throughput).

I have not tried that but this system is loafing during the benchmark. 
It has four 3GHz Opteron cores.

Does this output from 'iostat -xnz 20' help to understand issues?

                     extended device statistics
     r/s    w/s   kr/s    kw/s  wait  actv wsvc_t asvc_t  %w  %b device
     3.0    0.7   26.4     3.5   0.0   0.0    0.0    4.2   0   2 c1t1d0
     0.0  154.2    0.0 19680.3   0.0  20.7    0.0  134.2   0  59 c4t600A0B80003A8A0B096147B451BEd0
     0.0  211.5    0.0 26940.5   1.1  33.9    5.0  160.5  99 100 c4t600A0B800039C9B50A9C47B4522Dd0
     0.0  211.5    0.0 26940.6   1.1  33.9    5.0  160.4  99 100 c4t600A0B800039C9B50AA047B4529Bd0
     0.0  154.0    0.0 19654.7   0.0  20.7    0.0  134.2   0  59 c4t600A0B80003A8A0B096647B453CEd0
     0.0  211.3    0.0 26915.0   1.1  33.9    5.0  160.5  99 100 c4t600A0B800039C9B50AA447B4544Fd0
     0.0  152.4    0.0 19447.0   0.0  20.5    0.0  134.5   0  59 c4t600A0B80003A8A0B096A47B4559Ed0
     0.0  213.2    0.0 27183.8   0.9  34.1    4.2  159.9  90 100 c4t600A0B800039C9B50AA847B45605d0
     0.0  152.5    0.0 19453.4   0.0  20.5    0.0  134.5   0  59 c4t600A0B80003A8A0B096E47B456DAd0
     0.0  213.2    0.0 27177.4   0.9  34.1    4.2  159.9  90 100 c4t600A0B800039C9B50AAC47B45739d0
     0.0  213.2    0.0 27195.3   0.9  34.1    4.2  159.9  90 100 c4t600A0B800039C9B50AB047B457ADd0
     0.0  154.4    0.0 19711.8   0.0  20.7    0.0  134.0   0  59 c4t600A0B80003A8A0B097347B457D4d0
     0.0  211.3    0.0 26958.6   1.1  33.9    5.0  160.6  99 100 c4t600A0B800039C9B50AB447B4595Fd0

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Performance with Sun StorageTek 2540

2008-02-15 Thread Peter Tribble
On Fri, Feb 15, 2008 at 12:30 AM, Bob Friesenhahn
[EMAIL PROTECTED] wrote:
 Under Solaris 10 on a 4 core Sun Ultra 40 with 20GB RAM, I am setting
  up a Sun StorageTek 2540 with 12 300GB 15K RPM SAS drives and
  connected via load-shared 4Gbit FC links.  This week I have tried many
  different configurations, using firmware managed RAID, ZFS managed
  RAID, and with the controller cache enabled or disabled.

  My objective is to obtain the best single-file write performance.
  Unfortunately, I am hitting some sort of write bottleneck and I am not
  sure how to solve it.  I was hoping for a write speed of 300MB/second.
  With ZFS on top of a firmware managed RAID 0 across all 12 drives, I
  hit a peak of 200MB/second.  With each drive exported as a LUN and a
  ZFS pool of 6 pairs, I see a write rate of 154MB/second.  The number
  of drives used has not had much effect on write rate.

May not be relevant, but still worth checking - I have a 2530 (which ought
to be the same, only SAS instead of FC), and got fairly poor performance
at first. Things improved significantly when I got the LUNs properly
balanced across the controllers.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/


Re: [zfs-discuss] ZFS write throttling

2008-02-15 Thread Marion Hakanson
[EMAIL PROTECTED] said:
 I also tried using O_DSYNC, which stops the pathological behaviour but makes
 things pretty slow - I only get a maximum of about 20MBytes/sec, which is
 obviously much less than the hardware can sustain. 

I may misunderstand this situation, but while you're waiting for the new
code from Sun, you might try O_DSYNC and at the same time tell the 6140
to ignore cache-flush requests from the host.  That should get you running
at spindle-speed:

  http://blogs.digitar.com/jjww/?itemid=44
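
A host-side variant of the same idea - assuming the tunable is present in this 
Solaris 10 build, and only appropriate when the array cache is battery-backed - is 
to stop ZFS issuing the cache-flush requests at all:

  * /etc/system: don't send SYNCHRONIZE CACHE to the array (NVRAM-backed cache only)
  set zfs:zfs_nocacheflush = 1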

Regards,

Marion




Re: [zfs-discuss] Performance with Sun StorageTek 2540

2008-02-15 Thread Bob Friesenhahn
On Fri, 15 Feb 2008, Peter Tribble wrote:

 Each LUN is accessed through only one of the controllers (I presume the
 2540 works the same way as the 2530 and 61X0 arrays). The paths are
 active/passive (if the active fails it will relocate to the other path).
 When I set mine up the first time it allocated all the LUNs to controller B
 and performance was terrible. I then manually transferred half the LUNs
 to controller A and it started to fly.

I assume that you either altered the Access State shown for the LUN 
in the output of 'mpathadm show lu DEVICE' or you noticed and 
observed the pattern:

 Target Port Groups:
 ID:  3
 Explicit Failover:  yes
 Access State:  active
 Target Ports:
 Name:  200400a0b83a8a0c
 Relative ID:  0

 ID:  2
 Explicit Failover:  yes
 Access State:  standby
 Target Ports:
 Name:  200500a0b83a8a0c
 Relative ID:  0

I find this all very interesting and illuminating:

for dev in c4t600A0B80003A8A0B096A47B4559Ed0  \
c4t600A0B80003A8A0B096E47B456DAd0 \
c4t600A0B80003A8A0B096147B451BEd0 \
c4t600A0B80003A8A0B096647B453CEd0 \
c4t600A0B80003A8A0B097347B457D4d0 \
c4t600A0B800039C9B50A9C47B4522Dd0 \
c4t600A0B800039C9B50AA047B4529Bd0 \
c4t600A0B800039C9B50AA447B4544Fd0 \
c4t600A0B800039C9B50AA847B45605d0 \
c4t600A0B800039C9B50AAC47B45739d0 \
c4t600A0B800039C9B50AB047B457ADd0 \
c4t600A0B800039C9B50AB447B4595Fd0 \
do
echo === $dev ===
mpathadm show lu /dev/rdsk/$dev | grep 'Access State'
done
=== c4t600A0B80003A8A0B096A47B4559Ed0 ===
 Access State:  active
 Access State:  standby
=== c4t600A0B80003A8A0B096E47B456DAd0 ===
 Access State:  active
 Access State:  standby
=== c4t600A0B80003A8A0B096147B451BEd0 ===
 Access State:  active
 Access State:  standby
=== c4t600A0B80003A8A0B096647B453CEd0 ===
 Access State:  active
 Access State:  standby
=== c4t600A0B80003A8A0B097347B457D4d0 ===
 Access State:  active
 Access State:  standby
=== c4t600A0B800039C9B50A9C47B4522Dd0 ===
 Access State:  active
 Access State:  standby
=== c4t600A0B800039C9B50AA047B4529Bd0 ===
 Access State:  standby
 Access State:  active
=== c4t600A0B800039C9B50AA447B4544Fd0 ===
 Access State:  standby
 Access State:  active
=== c4t600A0B800039C9B50AA847B45605d0 ===
 Access State:  standby
 Access State:  active
=== c4t600A0B800039C9B50AAC47B45739d0 ===
 Access State:  standby
 Access State:  active
=== c4t600A0B800039C9B50AB047B457ADd0 ===
 Access State:  standby
 Access State:  active
=== c4t600A0B800039C9B50AB447B4595Fd0 ===
 Access State:  standby
 Access State:  active

Notice that the first six LUNs are active to one controller while the 
second six LUNs are active to the other controller.  Based on this, I 
should rebuild my pool by splitting my mirrors across this boundary.
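
In case it is useful to anyone following along, that layout would look roughly like 
this (the pool name and the exact pairings are illustrative, and the first command 
destroys the existing pool):

  zpool destroy tank
  zpool create tank \
      mirror c4t600A0B80003A8A0B096A47B4559Ed0 c4t600A0B800039C9B50AA047B4529Bd0 \
      mirror c4t600A0B80003A8A0B096E47B456DAd0 c4t600A0B800039C9B50AA447B4544Fd0 \
      ...

i.e. each mirror pairs a LUN whose active path is on one controller with a LUN 
whose active path is on the other.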

I am really happy that ZFS makes such things easy to try out.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Performance with Sun StorageTek 2540

2008-02-15 Thread Bob Friesenhahn
On Fri, 15 Feb 2008, Peter Tribble wrote:

 May not be relevant, but still worth checking - I have a 2530 (which ought
 to be that same only SAS instead of FC), and got fairly poor performance
 at first. Things improved significantly when I got the LUNs properly
 balanced across the controllers.

What do you mean by properly balanced across the controllers?  Are 
you using the multipath support in Solaris 10 or are you relying on 
ZFS to balance the I/O load?  Do some disks have more affinity for a 
controller than the other?

With the 2540, there is a FC connection to each redundant controller. 
The Solaris 10 multipathing presumably load-shares the I/O to each 
controller.  The controllers then perform some sort of magic to get 
the data to and from the SAS drives.

The controller stats are below.  I notice that it seems that 
controller B has seen a bit more activity than controller A but the 
firmware does not provide a controller uptime value so it is possible 
that one controller was up longer than another:

Performance Statistics - A on Storage System Array-1
Timestamp:              Fri Feb 15 14:37:39 CST 2008
Total IOPS:             1098.83
Average IOPS:           355.83
Read %:                 38.28
Write %:                61.71
Total Data Transferred: 139284.41 KBps
Read:                   53844.26 KBps
Average Read:           17224.04 KBps
Peak Read:              242232.70 KBps
Written:                85440.15 KBps
Average Written:        26966.58 KBps
Peak Written:           139918.90 KBps
Average Read Size:      639.96 KB
Average Write Size:     629.94 KB
Cache Hit %:            85.32

Performance Statistics - B on Storage System Array-1
Timestamp:              Fri Feb 15 14:37:45 CST 2008
Total IOPS:             1526.69
Average IOPS:           497.32
Read %:                 34.90
Write %:                65.09
Total Data Transferred: 193594.58 KBps
Read:                   68200.00 KBps
Average Read:           24052.61 KBps
Peak Read:              339693.55 KBps
Written:                125394.58 KBps
Average Written:        37768.40 KBps
Peak Written:           183534.66 KBps
Average Read Size:      895.80 KB
Average Write Size:     883.38 KB
Cache Hit %:            75.05

If I then go to the performance stats on an individual disk, I see

Performance Statistics - Disk-08 on Storage System Array-1
Timestamp:              Fri Feb 15 14:43:36 CST 2008
Total IOPS:             196.33
Average IOPS:           72.01
Read %:                 9.65
Write %:                90.34
Total Data Transferred: 25076.91 KBps
Read:                   2414.11 KBps
Average Read:           3521.44 KBps
Peak Read:              48422.00 KBps
Written:                22662.79 KBps
Average Written:        5423.78 KBps
Peak Written:           28036.43 KBps
Average Read Size:      127.29 KB
Average Write Size:     127.77 KB
Cache Hit %:            89.30

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Performance with Sun StorageTek 2540

2008-02-15 Thread Luke Lonergan
Hi Bob,

I'm assuming you're measuring sequential write speed - posting the iozone
results would help guide the discussion.

For the configuration you describe, you should definitely be able to sustain
200 MB/s write speed for a single file, single thread due to your use of
4Gbps Fibre Channel interfaces and RAID1.  Someone else brought up that with
host based mirroring over that interface you will be sending the data twice
over the FC-AL link, so since you only have 400 MB/s on the FC-AL interface
(load balancing will only work for two writes), then you have to divide that
by two.

If you do the mirroring on the RAID hardware you'll get double that speed on
writing, or 400MB/s, and the bottleneck is still the single FC-AL interface.

By comparison, we get 750 MB/s sequential read using six 15K RPM 300GB disks
on an adaptec (Sun OEM) in-host SAS RAID adapter in RAID10 on four streams
and I think I saw 350 MB/s write speed on one stream.  Each disk is capable
of 130 MB/s of read and write speed.

- Luke


On 2/15/08 10:39 AM, Bob Friesenhahn [EMAIL PROTECTED] wrote:

 On Fri, 15 Feb 2008, Roch Bourbonnais wrote:
  What was the interlace on the LUN ?
 
  The question was about LUN  interlace not interface.
  128K to 1M works better.
 
 The segment size is set to 128K.  The max the 2540 allows is 512K.
 Unfortunately, the StorageTek 2540 and CAM documentation does not
 really define what segment size means.
 
  Any compression ?
 
 Compression is disabled.
 
  Does turn off checksum helps the number (that would point to a CPU limited
  throughput).
 
 I have not tried that but this system is loafing during the benchmark.
 It has four 3GHz Opteron cores.
 
 Does this output from 'iostat -xnz 20' help to understand issues?
 
                      extended device statistics
      r/s    w/s   kr/s    kw/s  wait  actv wsvc_t asvc_t  %w  %b device
      3.0    0.7   26.4     3.5   0.0   0.0    0.0    4.2   0   2 c1t1d0
      0.0  154.2    0.0 19680.3   0.0  20.7    0.0  134.2   0  59 c4t600A0B80003A8A0B096147B451BEd0
      0.0  211.5    0.0 26940.5   1.1  33.9    5.0  160.5  99 100 c4t600A0B800039C9B50A9C47B4522Dd0
      0.0  211.5    0.0 26940.6   1.1  33.9    5.0  160.4  99 100 c4t600A0B800039C9B50AA047B4529Bd0
      0.0  154.0    0.0 19654.7   0.0  20.7    0.0  134.2   0  59 c4t600A0B80003A8A0B096647B453CEd0
      0.0  211.3    0.0 26915.0   1.1  33.9    5.0  160.5  99 100 c4t600A0B800039C9B50AA447B4544Fd0
      0.0  152.4    0.0 19447.0   0.0  20.5    0.0  134.5   0  59 c4t600A0B80003A8A0B096A47B4559Ed0
      0.0  213.2    0.0 27183.8   0.9  34.1    4.2  159.9  90 100 c4t600A0B800039C9B50AA847B45605d0
      0.0  152.5    0.0 19453.4   0.0  20.5    0.0  134.5   0  59 c4t600A0B80003A8A0B096E47B456DAd0
      0.0  213.2    0.0 27177.4   0.9  34.1    4.2  159.9  90 100 c4t600A0B800039C9B50AAC47B45739d0
      0.0  213.2    0.0 27195.3   0.9  34.1    4.2  159.9  90 100 c4t600A0B800039C9B50AB047B457ADd0
      0.0  154.4    0.0 19711.8   0.0  20.7    0.0  134.0   0  59 c4t600A0B80003A8A0B097347B457D4d0
      0.0  211.3    0.0 26958.6   1.1  33.9    5.0  160.6  99 100 c4t600A0B800039C9B50AB447B4595Fd0
 
 Bob
 ==
 Bob Friesenhahn
 [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
 


Re: [zfs-discuss] Performance with Sun StorageTek 2540

2008-02-15 Thread Bob Friesenhahn
On Fri, 15 Feb 2008, Luke Lonergan wrote:
 I only managed to get 200 MB/s write when I did RAID 0 across all
 drives using the 2540's RAID controller and with ZFS on top.

 Ridiculously bad.

I agree. :-(

 While I agree that data is sent twice (actually up to 8X if striping
 across four mirrors)

 Still only twice the data that would otherwise be sent, in other words: the
 mirroring causes a duplicate set of data to be written.

Right.  But more little bits of data to be sent due to ZFS striping.

 Given that you're not even saturating the FC-AL links, the problem is in the
 hardware RAID.  I suggest disabling read and write caching in the hardware
 RAID.

Hardware RAID is not an issue in this case since each disk is exported 
as a LUN.  Performance with ZFS is not much different than when 
hardware RAID was used.  I previously tried disabling caching in the 
hardware and it did not make a difference in the results.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Performance with Sun StorageTek 2540

2008-02-15 Thread Albert Chin
On Fri, Feb 15, 2008 at 09:00:05PM +, Peter Tribble wrote:
 On Fri, Feb 15, 2008 at 8:50 PM, Bob Friesenhahn
 [EMAIL PROTECTED] wrote:
  On Fri, 15 Feb 2008, Peter Tribble wrote:
   
May not be relevant, but still worth checking - I have a 2530 (which 
  ought
to be that same only SAS instead of FC), and got fairly poor performance
at first. Things improved significantly when I got the LUNs properly
balanced across the controllers.
 
   What do you mean by properly balanced across the controllers?  Are
   you using the multipath support in Solaris 10 or are you relying on
   ZFS to balance the I/O load?  Do some disks have more affinity for a
   controller than the other?
 
 Each LUN is accessed through only one of the controllers (I presume the
 2540 works the same way as the 2530 and 61X0 arrays). The paths are
 active/passive (if the active fails it will relocate to the other path).
 When I set mine up the first time it allocated all the LUNs to controller B
 and performance was terrible. I then manually transferred half the LUNs
 to controller A and it started to fly.

http://groups.google.com/group/comp.unix.solaris/browse_frm/thread/59b43034602a7b7f/0b500afc4d62d434?lnk=stq=#0b500afc4d62d434

-- 
albert chin ([EMAIL PROTECTED])


Re: [zfs-discuss] 100% random writes coming out as 50/50 reads/writes

2008-02-15 Thread Richard Elling
Nathan Kroenert wrote:
 And something I was told only recently - It makes a difference if you 
 created the file *before* you set the recordsize property.

Actually, it has always been true for RAID-0, RAID-5, RAID-6.
If your I/O strides over two sets then you end up doing more I/O,
perhaps twice as much.


 If you created them after, then no worries, but if I understand 
 correctly, if the *file* was created with 128K recordsize, then it'll 
 keep that forever...

Files have nothing to do with it.  The recordsize is a file system
parameter.  It gets a little more complicated because the recordsize
is actually the maximum recordsize, not the minimum.
 -- richard



Re: [zfs-discuss] Performance with Sun StorageTek 2540

2008-02-15 Thread Bob Friesenhahn
On Fri, 15 Feb 2008, Bob Friesenhahn wrote:

 Notice that the first six LUNs are active to one controller while the
 second six LUNs are active to the other controller.  Based on this, I
 should rebuild my pool by splitting my mirrors across this boundary.

 I am really happy that ZFS makes such things easy to try out.

Now that I have tried this out, I can unhappily say that it made no 
measurable difference to actual performance.  However it seems like a 
better layout anyway.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Performance with Sun StorageTek 2540

2008-02-15 Thread Peter Tribble
On Fri, Feb 15, 2008 at 8:50 PM, Bob Friesenhahn
[EMAIL PROTECTED] wrote:
 On Fri, 15 Feb 2008, Peter Tribble wrote:
  
   May not be relevant, but still worth checking - I have a 2530 (which ought
   to be that same only SAS instead of FC), and got fairly poor performance
   at first. Things improved significantly when I got the LUNs properly
   balanced across the controllers.

  What do you mean by properly balanced across the controllers?  Are
  you using the multipath support in Solaris 10 or are you relying on
  ZFS to balance the I/O load?  Do some disks have more affinity for a
  controller than the other?

Each LUN is accessed through only one of the controllers (I presume the
2540 works the same way as the 2530 and 61X0 arrays). The paths are
active/passive (if the active fails it will relocate to the other path).
When I set mine up the first time it allocated all the LUNs to controller B
and performance was terrible. I then manually transferred half the LUNs
to controller A and it started to fly.
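
From the host side, a rough way to check (and, where the array supports explicit 
failover, to move) the active path per LUN is mpathadm - device names are 
illustrative, and I'd treat the failover subcommand with care:

  mpathadm list lu
  mpathadm show lu /dev/rdsk/c4t600A0B800039C9B50A9C47B4522Dd0s2
  mpathadm failover logical-unit /dev/rdsk/c4t600A0B800039C9B50A9C47B4522Dd0s2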

I'm using SAS multipathing for failover and just get ZFS to dynamically stripe
across the LUNs.

Your figures show asymmetry, but that may just be a reflection of the
setup where you just created a single raid-0 LUN which would only use
one path.

(I don't really understand any of this stuff. Too much fiddling around
for my liking.)

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/


Re: [zfs-discuss] Performance with Sun StorageTek 2540

2008-02-15 Thread Richard Elling
Bob Friesenhahn wrote:
 On Fri, 15 Feb 2008, Luke Lonergan wrote:
   
 I only managed to get 200 MB/s write when I did RAID 0 across all
 drives using the 2540's RAID controller and with ZFS on top.
   
 Ridiculously bad.
 

 I agree. :-(

   
 While I agree that data is sent twice (actually up to 8X if striping
 across four mirrors)
   
 Still only twice the data that would otherwise be sent, in other words: the
 mirroring causes a duplicate set of data to be written.
 

 Right.  But more little bits of data to be sent due to ZFS striping.
   

These little bits should be 128kBytes by default, which should
be plenty to saturate the paths.  There seems to be something else
going on here...

from the iostat data:

                      extended device statistics
      r/s    w/s   kr/s    kw/s  wait  actv wsvc_t asvc_t  %w  %b device
 ...
      0.0  211.5    0.0 26940.5   1.1  33.9    5.0  160.5  99 100 c4t600A0B800039C9B50A9C47B4522Dd0
      0.0  211.5    0.0 26940.6   1.1  33.9    5.0  160.4  99 100 c4t600A0B800039C9B50AA047B4529Bd0
      0.0  154.0    0.0 19654.7   0.0  20.7    0.0  134.2   0  59 c4t600A0B80003A8A0B096647B453CEd0
 ...


shows that we have an average of 33.9 I/Os of 128kBytes each queued
to the storage device at any given time.  There is an I/O queued to the
storage device at all times (100% busy).  The 59% busy device
might not always be 59% busy, but it is difficult to see from this output
because you used the z flag.  Looks to me like ZFS is keeping the
queues full, and the device is slow to service them (asvc_t).  This
is surprising, to a degree, because we would expect faster throughput
to a nonvolatile write cache.

It would be interesting to see the response for a stable idle system,
start the workload, see the fast response as we hit the write cache,
followed by the slowdown as we fill the write cache.  This sort of
experiment is usually easy to create.
 -- richard



Re: [zfs-discuss] Performance with Sun StorageTek 2540

2008-02-15 Thread Bob Friesenhahn
On Fri, 15 Feb 2008, Albert Chin wrote:

 http://groups.google.com/group/comp.unix.solaris/browse_frm/thread/59b43034602a7b7f/0b500afc4d62d434?lnk=stq=#0b500afc4d62d434

This is really discouraging.  Based on these newsgroup postings I am 
thinking that the Sun StorageTek 2540 was not a good investment for 
me, especially given that the $23K for it came right out of my own 
paycheck and it took me 6 months of frustration (first shipment was 
damaged) to receive it.  Regardless, this was the best I was able to 
afford unless I built the drive array myself.

The page at 
http://www.sun.com/storagetek/disk_systems/workgroup/2540/benchmarks.jsp 
claims 546.22 MBPS for the large file processing benchmark.  So I go 
to look at the actual SPC2 full disclosure report and see that for one 
stream, the average data rate is 105MB/second (compared with 
102MB/second with RAID-5), and rises to 284MB/second with 10 streams. 
The product obviously performs much better for reads than it does for 
writes and is better for multi-user performance than single-user.

It seems like I am getting a good bit more performance from my own 
setup than what the official benchmark suggests (they used 72GB 
drives, with 24 drives total), so it seems that everything is working 
fine.

This is a lesson for me, and I have certainly learned a fair amount 
about drive arrays, fiber channel, and ZFS, in the process.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] 100% random writes coming out as 50/50 reads/writes

2008-02-15 Thread Mattias Pantzare
  
   If you created them after, then no worries, but if I understand
   correctly, if the *file* was created with 128K recordsize, then it'll
   keep that forever...


 Files have nothing to do with it.  The recordsize is a file system
  parameter.  It gets a little more complicated because the recordsize
  is actually the maximum recordsize, not the minimum.

Please read the manpage:

 Changing the file system's recordsize only affects files
 created afterward; existing files are unaffected.

Nothing is rewritten in the file system when you change recordsize, so
it stays the same for existing files.


Re: [zfs-discuss] Performance with Sun StorageTek 2540

2008-02-15 Thread Joel Miller
The segment size is the amount of contiguous space that each drive contributes to a 
single stripe.

So if you have a 5-drive RAID-5 set @ 128k segment size, a single stripe = 
(5-1)*128k = 512k.

BTW, did you tweak the cache sync handling on the array?

-Joel
 
 


[zfs-discuss] Cannot do simultaneous read/write to ZFS over smb.

2008-02-15 Thread Sam
Me again,
Thanks for all the previous help; my 10-disc RAIDZ2 is running mostly great. 
Just ran into a problem though:

I have the RAIDZ2 filesystem mounted on OS X via SMB, and I can upload OR 
download data to it just fine; however, if I start an upload and then start a 
download, the upload fails and stops. iostat reports write bandwidth going 
to 0 while read bandwidth goes up:

              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
pile        304G  4.23T     30    200   114K  20.7M
pile        305G  4.23T      0    384    255  44.1M
pile        305G  4.23T      0    343      0  42.7M
pile        305G  4.23T      0     32   1022   949K
pile        305G  4.23T    201    347  25.0M  40.1M
pile        305G  4.23T    271      0  33.6M      0


As you can see, I was writing at 42.7MB/s, then bandwidth dropped to essentially 
nothing, the pool started trying to do 25/40MB/s read/write (which failed), and 
then it went over to 33MB/s read only.

Has anybody encountered this problem before? I tried to Google around and search 
here for simultaneous read/write but nothing came up.

Sam
 
 


[zfs-discuss] SunMC module for ZFS

2008-02-15 Thread Torrey McMahon
Anyone have a pointer to a general ZFS health/monitoring module for 
SunMC? There isn't one baked into SunMC proper which means I get to 
write one myself if someone hasn't already done it.

Thanks.



Re: [zfs-discuss] 100% random writes coming out as 50/50 reads/writes

2008-02-15 Thread Nathan Kroenert
What about new blocks written to an existing file?

Perhaps we could make that clearer in the manpage too...

hm.


Mattias Pantzare wrote:
  
   If you created them after, then no worries, but if I understand
   correctly, if the *file* was created with 128K recordsize, then it'll
   keep that forever...


 Files have nothing to do with it.  The recordsize is a file system
  parameter.  It gets a little more complicated because the recordsize
  is actually the maximum recordsize, not the minimum.
 
 Please read the manpage:
 
  Changing the file system's recordsize only affects files
  created afterward; existing files are unaffected.
 
 Nothing is rewritten in the file system when you change recordsize so
 is stays the same for existing files.


Re: [zfs-discuss] 100% random writes coming out as 50/50 reads/writes

2008-02-15 Thread Nathan Kroenert
Hey, Richard -

I'm confused now.

My understanding was that any files created after the recordsize was set 
would use that as the new maximum recordsize, but files already created 
would continue to use the old recordsize.

Though I'm now a little hazy on what will happen when those existing 
files are updated as well...

hm.

Cheers!

Nathan.

Richard Elling wrote:
 Nathan Kroenert wrote:
 And something I was told only recently - It makes a difference if you 
 created the file *before* you set the recordsize property.
 
 Actually, it has always been true for RAID-0, RAID-5, RAID-6.
 If your I/O strides over two sets then you end up doing more I/O,
 perhaps twice as much.
 

 If you created them after, then no worries, but if I understand 
 correctly, if the *file* was created with 128K recordsize, then it'll 
 keep that forever...
 
 Files have nothing to do with it.  The recordsize is a file system
 parameter.  It gets a little more complicated because the recordsize
 is actually the maximum recordsize, not the minimum.
 -- richard
 


Re: [zfs-discuss] [storage-discuss] Preventing zpool imports on boot

2008-02-15 Thread George Wilson
Mike Gerdts wrote:
 On Feb 15, 2008 2:31 PM, Dave [EMAIL PROTECTED] wrote:
   
 This is exactly what I want - Thanks!

 This isn't in the man pages for zfs or zpool in b81. Any idea when this
 feature was integrated?
 

 Interesting... it is in b76.  I checked several other releases both
 before and after and they didn't have it either.  Perhaps it is not
 part of the committed interface.  I stumbled upon it because I thought
 that I remembered zpool import -R / poolname having the behavior you
 were looking for.  The rather consistent documentation for zpool
 import -R mentioned the temporary attribute.

   
We actually changed this to make it more robust. Now the property is 
called 'cachefile' and you can set it to 'none' if you want it to behave 
like the older 'temporary' property.
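
A quick sketch of how that looks (pool and device names made up):

  zpool set cachefile=none tank        # keep tank out of /etc/zfs/zpool.cache
  zpool create -o cachefile=none tank mirror c0t0d0 c0t1d0
  zpool import -o cachefile=none tank  # import without recording it for next boot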

- George


Re: [zfs-discuss] How to set ZFS metadata copies=3?

2008-02-15 Thread George Wilson
Vincent Fox wrote:
 Let's say you are paranoid and have built a pool with 40+ disks in a Thumper.

 Is there a way to set metadata copies=3 manually?

 After having built RAIDZ2 sets with 7-9 disks and then pooled these together, 
 it just seems like a little bit of extra insurance to increase metadata 
 copies.  I don't see a need for extra data copies which is currently the only 
 trigger I see for that.
  
  
   
ZFS already does something like this for metadata by setting either 2 or 
3 copies based on the metadata type. Take a look at 
dmu_get_replication_level().

- George


[zfs-discuss] 'du' is not accurate on zfs

2008-02-15 Thread Bob Friesenhahn
I have a script which generates a file and then immediately uses 'du 
-h' to obtain its size.  With Solaris 10 I notice that this often 
returns an incorrect value of '0' as if ZFS is lazy about reporting 
actual disk use.  Meanwhile, 'ls -l' does report the correct size.
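
A small illustration of the effect (paths made up; whether sync(1M) is enough to 
make the numbers appear immediately is an assumption on my part):

  dd if=/dev/urandom of=/tank/tmp/testfile bs=1024k count=16
  du -h /tank/tmp/testfile   # often reports 0 right after the write
  ls -l /tank/tmp/testfile   # logical size is already correct
  sync                       # nudge a txg commit
  du -h /tank/tmp/testfile   # now reflects allocated blocks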

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
