Re: [zfs-discuss] ZFS performance using slices vs. entire disk?
> is zfs any less efficient with just using a portion of a
> disk versus the entire disk?

As others mentioned, if we're given a whole disk (i.e. no slice is specified) then we can safely enable the write cache. One other effect -- probably not huge -- is that the block placement algorithm is optimized for an outer-to-inner track diameter ratio of about 2:1, which reflects typical platters. To quote the source:

http://cvs.opensolaris.org/source/xref/on/usr/src/uts/common/fs/zfs/metaslab.c#metaslab_weight

/*
 * Modern disks have uniform bit density and constant angular velocity.
 * Therefore, the outer recording zones are faster (higher bandwidth)
 * than the inner zones by the ratio of outer to inner track diameter,
 * which is typically around 2:1.  We account for this by assigning
 * higher weight to lower metaslabs (multiplier ranging from 2x to 1x).
 * In effect, this means that we'll select the metaslab with the most
 * free bandwidth rather than simply the one with the most free space.
 */

But like I said, the effect isn't huge -- the high-order bit is that we have a preference for low LBAs. It's a second-order optimization to bias the allocation based on the maximum free bandwidth, which is currently based on an assumption about physical disk construction. In the future we'll do the smart thing and compute each metaslab's allocation bias based on its actual observed bandwidth.

Jeff
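A rough illustration of the 2x-to-1x multiplier described in that comment -- this is a simplification, not the exact expression used in metaslab.c:

    weight(metaslab) ~= free_space * m,  where m scales linearly from
    2 (outermost zone, lowest LBAs) down to 1 (innermost zone, highest LBAs)

So if two metaslabs each have 10 GB free, the one nearest the start of the disk gets a weight of roughly 20 versus 10 for the innermost one, and the allocator prefers the outer one: "most free bandwidth" rather than simply "most free space".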
Re: [zfs-discuss] ZFS performance using slices vs. entire disk?
> ZFS will try to enable the write cache if a whole disk is given.
>
> Additionally keep in mind that the outer region of a disk is much faster.

And it's portable. If you use whole disks, you can export the pool from one machine and import it on another. There's no way to export just one slice and leave the others behind...

Jeff
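A minimal sketch of the portability Jeff describes (the pool name here is hypothetical):

    # on the original host
    zpool export tank

    # physically move the disks, then on the new host
    zpool import            # lists pools available for import
    zpool import tank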
Re: [zfs-discuss] Re: Best Practices for StorEdge 3510 Array and ZFS
On Aug 2, 2006, at 17:03, prasad wrote:

> Torrey McMahon <[EMAIL PROTECTED]> wrote:
>> Are any other hosts using the array? Do you plan on carving LUNs out of
>> the RAID5 LD and assigning them to other hosts?
>
> There are no other hosts using the array. We need all the available space
> (2.45TB) on just one host. One option was to create 2 LUNs and use raidz.

raidz on top of RAID5 isn't very efficient, and you'd want at least 3 LUNs to do it .. you're calculating double parity and tying up too much of your drive bandwidth. If you're going to do some variation of RAID5, the best throughput you'll see is to *either* pick the HW RAID characteristics *or* ZFS raidz .. but not both. If you want a *lot* of redundancy you could create a bunch of RAID10 volumes and then do a raidz on the zpool -- but you're really going to lose a lot of capacity that way. What you really want to do is make efficient use of the array cache *and* the copy-on-write zfs "cache" so you're doing mostly memory-to-memory transfers.

So that leaves us with 2 options (each with slight variations):

option 1 - raidz: I would use all the disks in the 3510 to make either 4 x 3-disk or 6 x 2-disk R0 volumes and balance them across the controllers (assuming you have 2) .. then create your raidz zpool out of all of those LUNs. The disadvantage (or advantage, depending on how you look at it) here is that you're not using the parity engine in the 3510 and you can't really hot spare from the array .. the advantage though is the software-based error correction you'll be able to do.

option 2 - RAID5: either use the volume you already have, or make 2 R5 volumes if you have 2 controllers to balance the LUNs .. it won't matter if they're the same size or not, and you should only really need 1 global hot spare .. then create a standard zpool with these. The disadvantage is that you won't get the lovely raidz features .. but the possible advantage is that you've offloaded the parity calculation and workload from the host.

Keep in mind that zfs was originally designed with JBOD in mind .. there's still ongoing discussion on how HW RAID fits into the picture with the new and lovely sw raidz, and whether or not socks will be worn when testing one vs the other ..

---
.je
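A rough sketch of the two layouts, assuming the array presents its volumes to the host as the hypothetical devices shown (real 3510 LUN device names will differ):

    # option 1: array builds R0 (stripe-only) volumes, ZFS provides the redundancy
    zpool create tank raidz c2t40d0 c2t40d1 c2t40d2 c2t40d3

    # option 2: array builds RAID5 volumes, ZFS simply stripes across them
    zpool create tank c2t40d0 c3t40d0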
Re: [zfs-discuss] ZFS & iSCSI: where do do the mirroring/raidz
On Thu, Darren J Moffat wrote:
> Spencer Shepler wrote:
>> On Wed, Darren J Moffat wrote:
>>> I have 12 36G disks (in a single D2 enclosure) connected to a V880 that
>>> I want to "share" to a v40z that is on the same gigabit network switch.
>>> I've already decided that NFS is not the answer - the performance of ON
>>> consolidation builds over NFS just doesn't cut it for me.
>>
>> ?
>>
>> With a locally attached 3510 array on a 4-way v40z, I have been
>> able to do a full nightly build in 1 hour 7 minutes.
>> With NFSv3 access, from the same system, to a couple of
>> different NFS servers, I have been able to achieve 1 hour 15 minutes
>> in one case and 1 hour 22 minutes in the other.
>
> That would be perfectly acceptable. I note you do say NFSv3 though and
> not NFSv4. Is there a reason why you said NFSv3 and not v4? I haven't
> changed the config on either machine so I'm defaulting to v4.

Mainly because that was the data I had at hand. I have been collecting various pieces of data and have yet to pick up the NFSv4 data. There is additional overhead with the NFSv4 client because of the protocol's introduction of OPEN/CLOSE operations. Therefore, for some workloads and hardware platforms, NFSv4 will be slower. Builds are one of those things that are sensitive to the hardware platform at the client. Once I get the data, I will follow up.

Spencer
Re: [zfs-discuss] ZFS & iSCSI: where do do the mirroring/raidz
Spencer Shepler wrote:
> On Wed, Darren J Moffat wrote:
>> I have 12 36G disks (in a single D2 enclosure) connected to a V880 that
>> I want to "share" to a v40z that is on the same gigabit network switch.
>> I've already decided that NFS is not the answer - the performance of ON
>> consolidation builds over NFS just doesn't cut it for me.
>
> With a locally attached 3510 array on a 4-way v40z, I have been
> able to do a full nightly build in 1 hour 7 minutes.
> With NFSv3 access, from the same system, to a couple of
> different NFS servers, I have been able to achieve 1 hour 15 minutes
> in one case and 1 hour 22 minutes in the other.

That would be perfectly acceptable. I note you do say NFSv3 though and not NFSv4. Is there a reason why you said NFSv3 and not v4? I haven't changed the config on either machine so I'm defaulting to v4.

-- 
Darren J Moffat
Re: [zfs-discuss] ZFS performance using slices vs. entire disk?
Hello Joseph,

Thursday, August 3, 2006, 2:02:28 AM, you wrote:

JM> I know this is going to sound a little vague but...

JM> A coworker said he read somewhere that ZFS is more efficient if you
JM> configure pools from entire disks instead of just slices of disks. I'm
JM> curious if there is any merit to this?

JM> The use case that we had been discussing was something to the effect of
JM> building a 2 disk system, install the OS on slice 0 of disk 0 and make
JM> the rest of the disk available for 1/2 of a zfs mirror. Then disk 1
JM> would probably be partitioned the same, but the only thing active would
JM> be the other 1/2 of a zfs mirror.

JM> Now clearly there is a contention issue between the OS and the data
JM> partition, which would be there if SVM mirrors were used instead. But
JM> besides this, is zfs any less efficient with just using a portion of a
JM> disk versus the entire disk?

ZFS will try to enable the write cache if a whole disk is given.

Additionally, keep in mind that the outer region of a disk is much faster. So if you want to put the OS on a disk and then designate the rest of the disk for an application, then putting ZFS on a slice beginning at cyl 0 is probably best in most scenarios.

-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
                                 http://milek.blogspot.com
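For reference, the two variants being contrasted look roughly like this (controller/target numbers and the choice of s7 for the data slice are only examples):

    # whole disks: ZFS controls the whole spindle and can enable the write cache
    zpool create datapool mirror c0t0d0 c0t1d0

    # slices: the OS lives on s0 of each disk, the remainder (here s7) holds
    # the mirror; the write cache is left alone because other slices share the disk
    zpool create datapool mirror c0t0d0s7 c0t1d0s7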
Re: [zfs-discuss] ZFS performance using slices vs. entire disk?
Joseph Mocker wrote:
> I know this is going to sound a little vague but...
>
> A coworker said he read somewhere that ZFS is more efficient if you
> configure pools from entire disks instead of just slices of disks. I'm
> curious if there is any merit to this?

If the entire disk is used in a zpool then the disk cache can be, and in most cases is, enabled. This speeds operations up quite a bit in some scenarios.
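If you want to check what the drive's write cache is actually set to, format's expert mode can usually show it (a sketch; the exact menus depend on the disk and driver):

    # format -e
    ... select the disk ...
    format> cache
    cache> write_cache
    write_cache> display
    Write Cache is enabled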
[zfs-discuss] Re: ZFS Web Administration Tool
Thanks Steve. The workaround (rm -f /var/webconsole/tmp/console_*.tmp) and a restart fixed it. I appreciate the quick response. You guys are good!

Ron
Re: [zfs-discuss] ZFS performance using slices vs. entire disk?
On Wed, 2 Aug 2006, Joseph Mocker wrote:

> The use case that we had been discussing was something to the effect of
> building a 2 disk system, install the OS on slice 0 of disk 0 and make the
> rest of the disk available for 1/2 of a zfs mirror. Then disk 1 would
> probably be partitioned the same, but the only thing active would be the
> other 1/2 of a zfs mirror.

Why wouldn't you mirror (using SVM) the OS slice on disk 1 too? Sorry, can't answer the ZFS bit of the question...

-- 
Rich Teer, SCNA, SCSA, OpenSolaris CAB member
President, Rite Online Inc.
Voice: +1 (250) 979-1638
URL: http://www.rite-group.com/rich
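For completeness, a rough SVM sketch of mirroring the OS slice Rich mentions (slice assignments are hypothetical: s0 is root, s7 holds the state database replicas):

    metadb -a -f -c 2 c0t0d0s7 c0t1d0s7   # state database replicas on both disks
    metainit -f d11 1 1 c0t0d0s0          # submirror on the existing root slice
    metainit d12 1 1 c0t1d0s0             # submirror on the second disk
    metainit d10 -m d11                   # one-way mirror of root
    metaroot d10                          # updates /etc/vfstab and /etc/system
    # reboot, then attach the second half:
    metattach d10 d12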
[zfs-discuss] ZFS performance using slices vs. entire disk?
I know this is going to sound a little vague but...

A coworker said he read somewhere that ZFS is more efficient if you configure pools from entire disks instead of just slices of disks. I'm curious if there is any merit to this?

The use case that we had been discussing was something to the effect of building a 2 disk system: install the OS on slice 0 of disk 0 and make the rest of the disk available for 1/2 of a zfs mirror. Then disk 1 would probably be partitioned the same, but the only thing active would be the other 1/2 of a zfs mirror.

Now clearly there is a contention issue between the OS and the data partition, which would be there if SVM mirrors were used instead. But besides this, is zfs any less efficient with just using a portion of a disk versus the entire disk?
[zfs-discuss] Re: Best Practices for StorEdge 3510 Array and ZFS
Torrey McMahon <[EMAIL PROTECTED]> wrote:

> Are any other hosts using the array? Do you plan on carving LUNs out of
> the RAID5 LD and assigning them to other hosts?

There are no other hosts using the array. We need all the available space (2.45TB) on just one host. One option was to create 2 LUNs and use raidz.

-- 
prasad
[zfs-discuss] Re: [Fwd: [zones-discuss] Zone boot problems after installing patches]
Dave, I'm copying the zfs-discuss alias on this as well...

It's possible that not all necessary patches have been installed, or they may be hitting CR# 6428258. If you reboot the zone, does it continue to end up in maintenance mode? Also, do you know if the necessary ZFS/Zones patches have been updated? Take a look at our webpage, which includes the patch list required for Solaris 10:

http://rpe.sfbay/bin/view/Tech/ZFS

Thanks,
George

Mahesh Siddheshwar wrote:

 Original Message 
Subject: [zones-discuss] Zone boot problems after installing patches
Date: Wed, 02 Aug 2006 13:47:46 -0400
From: Dave Bevans <[EMAIL PROTECTED]>
To: zones-discuss@opensolaris.org, [EMAIL PROTECTED], [EMAIL PROTECTED]

Hi,

I have a customer with the following problem. He has a V440 running Solaris 10 1/06 with zones. In the case notes he says that he installed a couple of Sol 10 patches and now he has problems booting his zones. After doing some checking he found that it appears to be related to a couple of ZFS patches (122650 and 122640). I found a bug (6271309 / lack of zvol breaks all ZFS commands), but I'm not sure if it applies to this situation. Any ideas on this? Here is the customer's problem description...

Hardware Platform: Sun Fire V440
Component Affected: OS
Base OS and Kernel Version: SunOS snb-fton-bck2 5.10 Generic_118833-18 sun4u sparc SUNW,Sun-Fire-V440

Describe the problem:

Patch 122650-02 combined with patch 122640-05 seems to have broken non-global zones at boot time. I'm just guessing at the exact patches since they were both added recently, and involve the files /usr/sbin/zfs and /lib/svc/method/fs-local which, combined, cause the issue.

This section of code in /lib/svc/method/fs-local:

	if [ -x /usr/sbin/zfs ]; then
		/usr/sbin/zfs mount -a >/dev/msglog 2>&1
		rc=$?
		if [ $rc -ne 0 ]; then
			msg="WARNING: /usr/sbin/zfs mount -a failed: exit status $rc"
			echo $msg
			echo "$SMF_FMRI:" $msg >/dev/msglog
			result=$SMF_EXIT_ERR_FATAL
		fi
	fi

causes the local file system service to exit with an error and stop the boot process. The reason is that the non-global zone does not have access to /dev/zfs, so the "/usr/sbin/zfs mount -a" command exits with an error code.

This system is SRS Net Connect enabled: No
I will be sending an Explorer file: No

List steps to reproduce the problem (if applicable):

Global zone:

bash-3.00# /usr/sbin/zfs mount -a
bash-3.00# echo $?
0

CVS Zone:

bash-3.00# zlogin cvs
[Connected to zone 'cvs' pts/2]
Last login: Tue Aug 1 11:51:58 on pts/2
Sun Microsystems Inc. SunOS 5.10 Generic January 2005
# /usr/sbin/zfs mount -a
internal error: unable to open ZFS device
# echo $?
1

=

It looks like /dev/zfs is not created in the non-global zone, but it is required for the startup script change included in patch 122650-02:

Global Zone:

bash-3.00# truss -fald -t open /usr/sbin/zfs mount -a
Base time stamp: 115288.9594 [ Tue Aug 1 11:58:08 ADT 2006 ]
16159/1: 0. execve("/sbin/zfs", 0xFFBFFD8C, 0xFFBFFD9C) argc = 3
16159/1:  argv: /usr/sbin/zfs mount -a
...
16159/1: 0.0192 open("/etc/mnttab", O_RDONLY) = 3
16159/1: 0.0203 open("/dev/zfs", O_RDWR) = 4

CVS Zone:

# truss -fald -t open /usr/sbin/zfs mount -a
Base time stamp: 115344.9469 [ Tue Aug 1 11:59:04 ADT 2006 ]
16198/1: 0. execve("/sbin/zfs", 0xFFBFFECC, 0xFFBFFEDC) argc = 3
16198/1:  argv: /usr/sbin/zfs mount -a
...
16198/1: 0.0181 open("/etc/mnttab", O_RDONLY) = 3
16198/1: 0.0191 open("/dev/zfs", O_RDWR) Err#2 ENOENT
internal error: unable to open ZFS device

# ls -l "/dev/zfs"
/dev/zfs: No such file or directory

==

bash-3.00# zonecfg -z cvs info
zonepath: /oracle/zones/cvs
autoboot: true
pool:
inherit-pkg-dir:
	dir: /lib
inherit-pkg-dir:
	dir: /platform
inherit-pkg-dir:
	dir: /sbin
inherit-pkg-dir:
	dir: /usr
fs:
	dir: /data
	special: /data
	raw not specified
	type: lofs
	options: []
net:
	address: 142.139.95.4
	physical: ce0

When was the problem first noticed: August 1.
The problem is: staying the same
Any changes recently?: New Patch Applied

What software is having the problem?:

bash-3.00# uname -a
SunOS snb-fton-bck2 5.10 Generic_118833-18 sun4u sparc SUNW,Sun-Fire-V440
bash-3.00# cat /etc/release
                       Solaris 10 1/06 s10s_u1wos_19a SPARC
           Copyright 2005 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 07 December 2005
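A quick way to reproduce the check described in that report from the global zone (using the zone name 'cvs' from the example):

    # in the global zone
    zlogin cvs ls -l /dev/zfs
    # "No such file or directory" here means "zfs mount -a" -- and therefore
    # svc:/system/filesystem/local -- will fail inside that zone at boot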
Re: [zfs-discuss] 3510 JBOD ZFS vs 3510 HW RAID
On Wed, 2 Aug 2006, Richard Elling wrote:

> From a space perspective, I can put a TByte on my desktop today. Death
> of the low-end array is assured by bigger drives.

I respectfully disagree. I think there will always be a need for low-end arrays, regardless of the size of the individual disks. I like to keep my OS and data/apps separate (on separate drives preferably) -- and I doubt I'm alone. Many of today's smaller servers come with only two disks, which is fine for mirroring root and swap, but the only place to put one's data is on an external array.

There are many situations where low-end storage (in terms of numbers of spindles) would be very useful, hence my blog entry a while ago wishing that Sun would produce a 1U, 8-drive SAS array at an affordable price (at least one company has such a product, but I want to buy only Sun HW).

-- 
Rich Teer, SCNA, SCSA, OpenSolaris CAB member
President, Rite Online Inc.
Voice: +1 (250) 979-1638
URL: http://www.rite-group.com/rich
Re: [zfs-discuss] Best Practices for StorEdge 3510 Array and ZFS
prasad wrote:
> I have a StorEdge 3510 FC array which is currently configured in the
> following way:
>
> * logical-drives
>
> LD    LD-ID      Size    Assigned  Type   Disks  Spare  Failed  Status
> ld0   255ECBD0   2.45TB  Primary   RAID5  10     2      0       Good
>
> Write-Policy: Default   StripeSize: 128KB
>
> What are the best practices of using ZFS on this array so that I can
> benefit from both ZFS and HW RAID?

Are any other hosts using the array? Do you plan on carving LUNs out of the RAID5 LD and assigning them to other hosts?
[zfs-discuss] Re: ZFS Web Administration Tool
From talking with the web console (Lockhart) folks, this appears to be a manifestation of:

6430996 The SMF services related to smcwebserver goes to maintainance state after node reboot

This will be fixed in build 46 of Solaris Nevada. Details, including the workaround:

> I believe this is Lockhart 3.0.1. You may be hitting a known
> problem (I think, based on the SVC log messages). The problem
> arises when the system is stopped without explicitly stopping the
> Lockhart console that's running. This leaves some bad config files
> behind that prevent the next start from determining if the web
> server process gets started. This then results in returning a fatal
> error to the SMF restarter, and you are in maintenance mode.
> Clearing does not help as long as the bad files are left around.
>
> Sometimes things are a bit more complicated; we sometimes fail to
> stop the server process and it hangs around. This also prevents
> restart. The documented workaround for this is:
>
> 1) Stop the console before rebooting the OS
>
> 2) If you cannot do (1), then after reboot (if things fail):
>
>    svcadm disable system/webconsole:console
>    ps -ef | grep noaccess | grep server
>    kill <pid>                              // if ps finds a process
>    rm -f /var/webconsole/tmp/console_*.tmp
>    smcwebserver start
>
>    smcwebserver enable    // if it needs to start on reboot

Thanks,
Steve

Ron Halstead wrote:
> Why does the Java Web Console service keep going into maintenance mode? This
> has happened for the past few builds (current is nv44). It works for a day or
> so after a new install then it breaks. Here are the symptoms:
>
> sol11:$ svcs -x
> svc:/system/webconsole:console (java web console)
>  State: maintenance since Wed Aug 02 08:33:26 2006
> Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
>    See: http://sun.com/msg/SMF-8000-KS
>    See: smcwebserver(1M)
>    See: /var/svc/log/system-webconsole:console.log
> Impact: This service is not running.
>
> sol11:$ tail /var/svc/log/system-webconsole:console.log   (machine has just booted)
> [ Aug 2 08:33:06 Executing start method ("/lib/svc/method/svc-webconsole start") ]
> Sun Java(TM) Web Console status can not be determined.
> Run "smcwebserver stop" to make sure the server has stopped.
> [ Aug 2 08:33:26 Method "start" exited with status 95 ]
>
> I've run smcwebserver stop and svcadm clear svc:/system/webconsole:console
> then svcadm enable webconsole and get the same results.
>
> The Java Web Console works perfectly on Solaris 10 6/06.
>
> Ron Halstead
Re: [zfs-discuss] 3510 JBOD ZFS vs 3510 HW RAID
Richard Elling wrote:
> Jonathan Edwards wrote:
>> Now with thumper - you are SPoF'd on the motherboard and operating
>> system - so you're not really getting the availability aspect from dual
>> controllers .. but given the value - you could easily buy 2 and still
>> come out ahead .. you'd have to work out some sort of timely replication
>> of transactions between the 2 units and deal with failure cases with
>> something like a cluster framework.
>
> No. Shared data clusters require that both nodes have access to the
> storage. This is not the case for a thumper, where the disks are not
> dual-ported and there is no direct access to the disks from an external
> port. Thumper is not a conventional highly-redundant RAID array.
> Comparing thumper to a SE3510 on a feature-by-feature basis is truly
> like comparing apples and oranges.

Apples and pomegranates perhaps? You could drop the iSCSI target on it and share the drives a la zvols. The "what is an array, what is a server, what is both" discussion gets interesting based on the qualities of the thing that holds the disks.

> As far as SPOFs go, all systems which provide a single view of data have
> at least one SPOF. Claiming a RAID array does not have a SPOF is denying
> truth.

It's the number of SPOFs and the overall reliability that I think Jonathan was referring to. Of course, we're all systems folks so component failure is always in the back of our minds, right? ;)

> From a space perspective, I can put a TByte on my desktop today. Death
> of the low-end array is assured by bigger drives.

It's a sliding window. What was midrange ten years ago is low-end or desktop today in the capacity and, in many cases, performance context. Reliability and availability, not so much.
Re: [zfs-discuss] 3510 JBOD ZFS vs 3510 HW RAID
Richard,

On 8/2/06 11:37 AM, "Richard Elling" <[EMAIL PROTECTED]> wrote:

>> Now with thumper - you are SPoF'd on the motherboard and operating
>> system - so you're not really getting the availability aspect from dual
>> controllers .. but given the value - you could easily buy 2 and still
>> come out ahead .. you'd have to work out some sort of timely replication
>> of transactions between the 2 units and deal with failure cases with
>> something like a cluster framework.
>
> No. Shared data clusters require that both nodes have access to the
> storage. This is not the case for a thumper, where the disks are not
> dual-ported and there is no direct access to the disks from an external
> port. Thumper is not a conventional highly-redundant RAID array.
> Comparing thumper to a SE3510 on a feature-by-feature basis is truly
> like comparing apples and oranges.

That's why Thumper DW is a shared-nothing fully redundant data warehouse. We replicate the data among systems so that we can lose up to half of the total server count while processing.

Basket of Apples >>> one big apple with a worm in it.

- Luke
Re: [zfs-discuss] 3510 JBOD ZFS vs 3510 HW RAID
Jonathan Edwards wrote:
> Now with thumper - you are SPoF'd on the motherboard and operating
> system - so you're not really getting the availability aspect from dual
> controllers .. but given the value - you could easily buy 2 and still
> come out ahead .. you'd have to work out some sort of timely replication
> of transactions between the 2 units and deal with failure cases with
> something like a cluster framework.

No. Shared data clusters require that both nodes have access to the storage. This is not the case for a thumper, where the disks are not dual-ported and there is no direct access to the disks from an external port. Thumper is not a conventional highly-redundant RAID array. Comparing thumper to a SE3510 on a feature-by-feature basis is truly like comparing apples and oranges.

As far as SPOFs go, all systems which provide a single view of data have at least one SPOF. Claiming a RAID array does not have a SPOF is denying truth.

> Then for multi-initiator cross system access - we're back to either some
> sort of NFS or CIFS layer or we could always explore target mode drivers
> and virtualization .. so once again - there could be a compelling argument
> coming in that arena as well. Now, if you already have a big shared FC
> infrastructure - throwing dense servers in the middle of it all may not
> make the most sense yet - but on the flip side, we could be seeing a
> shrinking market for single attach low cost arrays.

From a space perspective, I can put a TByte on my desktop today. Death of the low-end array is assured by bigger drives.

> Lastly (for this discussion anyhow) there's the reliability and quality
> issues with SATA vs FC drives (bearings, platter materials, tolerances,
> head skew, etc) .. couple that with the fact that dense systems aren't so
> great when they fail .. so I guess we're right back to choosing the right
> systems for the right purposes (ZFS does some great things around failure
> detection and workaround) .. but i think we've beat that point to death ..

Agree, in principle. However, the protocol used to connect to the host is immaterial to the quality of the device. The market segments determine the quality of the device, and the drive vendors find it in their best interest to keep consumer devices inexpensive at all costs, and achieve higher margins on enterprise-class devices. What we've done for thumper is to use a top-of-the-line quality SATA drive. AFAIK today, the vendor is Hitachi, though we like to have multiple sources, if they can meet the specifications. Often the vendor and part information is available in the SunSolve Systems Handbook, http://sunsolve.sun.com/handbook_pub/Systems, under the Full Components List selection for the specific system. Today, the Sun Fire X4500 is not listed as it has not reached general availability yet. Look for it soon.

So, what is thumper good for? Clearly, it can store a lot of data in a redundant manner (e.g. good for retention). GreenPlum, http://www.greenplum.com, is building data warehouses with them. Various people are interested in them for streaming media. We don't really know what else it will be used for; there isn't much to compare against in the market. What we do know is that it won't be appropriate for replacing your SE9985 on your ERP system.

-- richard
Re: [zfs-discuss] 3510 JBOD ZFS vs 3510 HW RAID
On Aug 1, 2006, at 22:23, Luke Lonergan wrote:
> Torrey,
>
> On 8/1/06 10:30 AM, "Torrey McMahon" <[EMAIL PROTECTED]> wrote:
>> http://www.sun.com/storagetek/disk_systems/workgroup/3510/index.xml
>>
>> Look at the specs page.
>
> I did. This is 8 trays, each with 14 disks and two active Fibre Channel
> attachments. That means that 14 disks, each with a platter rate of 80MB/s,
> will be driven over a 400MB/s pair of Fibre Channel connections, a slowdown
> of almost 3 to 1. This is probably the most expensive, least efficient way
> to get disk bandwidth available to customers.
>
> WRT the discussion about "blow the doors", etc., how about we see some
> bonnie++ numbers to back it up.

actually .. there's SPC-2 vdbench numbers out at:
http://www.storageperformance.org/results

see the full disclosure report here:
http://www.storageperformance.org/results/b5_Sun_SPC2_full-disclosure_r1.pdf

of course that's a 36GB 15K FC system with 2 expansion trays, 4 HBAs and 3 yrs maintenance in the quote that was spec'd at $72K list (or $56/GB) .. (i'll use list numbers for comparison since they're the easiest)

if you've got a copy of the vdbench tool you might want to try the profiles in the appendix on a thumper - I believe the bonnie/bonnie++ numbers tend to skew more on single-threaded, low-blocksize memory transfer issues.

now to bring the thread full circle to the original question of price/performance, and increasing the scope to include the X4500 .. for single attached low cost systems, thumper is *very* compelling, particularly when you factor in the density .. for example, using list prices from http://store.sun.com/

X4500 (thumper) w/ 48 x 250GB SATA drives = $32995 = $2.68/GB
X4500 (thumper) w/ 48 x 500GB SATA drives = $69995 = $2.84/GB
SE3511 (dual controller) w/ 12 x 500GB SATA drives = $36995 = $6.17/GB
SE3510 (dual controller) w/ 12 x 300GB FC drives = $48995 = $13.61/GB

So a 250GB SATA drive configured thumper (server attached, with 16GB of cache .. err .. RAM) is 5x less in cost/GB than a 300GB FC drive configured 3510 (dual controllers w/ 2 x 1GB typically mirrored cache), and a 500GB SATA drive configured thumper (server attached) is 2.3x less in cost/GB than a 500GB SATA drive configured 3511 (again dual controllers w/ 2 x 1GB typically mirrored cache).

For a single attached system - you're right - 400MB/s is your effective throttle (controller speeds, actually) on the 3510, and your realistic throughput on the 3511 is probably going to be less than 1/2 that number if we factor in the back pressure we'll get on the cache against the back loop .. your bonnie++ block transfer numbers on a 36-drive thumper were showing about 424MB/s on 100% write and about 1435MB/s on 100% read .. it'd be good to see the vdbench numbers as well (but i've had a hard time getting my hands on one since most appear to be out at customer sites)

Now with thumper - you are SPoF'd on the motherboard and operating system - so you're not really getting the availability aspect from dual controllers .. but given the value - you could easily buy 2 and still come out ahead .. you'd have to work out some sort of timely replication of transactions between the 2 units and deal with failure cases with something like a cluster framework. Then for multi-initiator cross system access - we're back to either some sort of NFS or CIFS layer, or we could always explore target mode drivers and virtualization .. so once again - there could be a compelling argument coming in that arena as well.

Now, if you already have a big shared FC infrastructure - throwing dense servers in the middle of it all may not make the most sense yet - but on the flip side, we could be seeing a shrinking market for single attach low cost arrays.

Lastly (for this discussion anyhow) there's the reliability and quality issues with SATA vs FC drives (bearings, platter materials, tolerances, head skew, etc) .. couple that with the fact that dense systems aren't so great when they fail .. so I guess we're right back to choosing the right systems for the right purposes (ZFS does some great things around failure detection and workaround) .. but i think we've beat that point to death ..

---
.je
Re: [zfs-discuss] raidz -> raidz2
Your suspicions are correct: it's not possible to upgrade an existing raidz pool to raidz2. You'll actually have to create the raidz2 pool from scratch.

Noel

On Aug 2, 2006, at 10:02 AM, Frank Cusack wrote:
> Will it be possible to update an existing raidz to a raidz2? I wouldn't
> think so, but maybe I'll be pleasantly surprised.
>
> -frank
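In practice that means building the raidz2 pool on new or spare devices and copying the data across, for example with zfs send/receive; a sketch with hypothetical pool, dataset, and device names:

    zpool create newpool raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0
    zfs snapshot oldpool/data@migrate
    zfs send oldpool/data@migrate | zfs receive newpool/data
    # repeat for each filesystem in the old pool, then retire it:
    zpool destroy oldpool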
[zfs-discuss] raidz -> raidz2
Will it be possible to update an existing raidz to a raidz2? I wouldn't think so, but maybe I'll be pleasantly surprised.

-frank
Re: [zfs-discuss] Clones and "rm -rf"
Tom Simpson wrote:
> After I created the filesystem and moved all the data in, I did :-
>
> root% chown -R oracle:dba /u05

All that does is change the owner/group of the files/directories. It doesn't change the permissions of the directories and files. What are the permissions of the directories you are trying to delete? Can you gather some truss output from the "rm"?

-Mark
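Something along these lines would capture the truss output Mark is asking for (the syscall filter is only a suggestion):

    cd /u05/app
    truss -f -o /tmp/rm.truss -t lstat64,unlink,rmdir rm -rf R2DIR
    grep Err /tmp/rm.truss     # look for EACCES/EPERM on the rmdir calls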
Re: [zfs-discuss] ZFS & iSCSI: where do do the mirroring/raidz
Darren J Moffat wrote:
> performance, availability, space, retention.

OK, something to work with. I would recommend taking advantage of ZFS' dynamic stripe over 2-disk mirrors. This should give good performance, with good data availability. If you monitor the status of the disks regularly, or do not have a 24x7x365 requirement, then you may want the performance of two more disks over the availability and retention gained by spares.

In general, the more devices you have, the better performance you can get (iops * N), but also the worse reliability (MTBF / N). High availability is achieved by a combination of reducing risk (diversity), adding redundancy, and decreasing recovery time (spares). High retention is gained by increasing redundancy and decreasing recovery time.

[for the archives] If you do not have a large up-front performance or space requirement, then you can take advantage of ZFS' dynamic growth. For example, if today you only need 30 GBytes, then you could have a 2-disk mirror with a bunch of spares. Spin down (luxadm stop)[1] the spares or turn off the power to the unused disks (luxadm power_off)[2] to improve their reliability and save power. As your space needs grow, add disks in mirrored pairs. This will optimize your space usage and reliability -> better availability and retention.

[1] somebody will probably chime in and say that this isn't supported. It does work well, though. For spun-down disks, Solaris will start them when an I/O operation is issued.
[2] may not work for many devices.

-- richard
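A sketch of the layout Richard describes, growing the dynamic stripe in mirrored pairs as space is needed (device names are hypothetical):

    zpool create tank mirror c1t0d0 c1t1d0    # start with one 2-disk mirror
    # ... later, when more space is needed, extend the stripe:
    zpool add tank mirror c1t2d0 c1t3d0
    # optionally spin down an idle spare, per [1] above:
    luxadm stop /dev/rdsk/c1t4d0s2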
Re: [zfs-discuss] 3510 JBOD ZFS vs 3510 HW RAID
Luke Lonergan wrote:
> Torrey,
>
> On 8/1/06 10:30 AM, "Torrey McMahon" <[EMAIL PROTECTED]> wrote:
>> http://www.sun.com/storagetek/disk_systems/workgroup/3510/index.xml
>>
>> Look at the specs page.
>
> I did. This is 8 trays, each with 14 disks and two active Fibre channel
> attachments. That means that 14 disks, each with a platter rate of 80MB/s,
> will be driven over a 400MB/s pair of Fibre Channel connections, a slowdown
> of almost 3 to 1. This is probably the most expensive, least efficient way
> to get disk bandwidth available to customers.

Luke - I think you have latched on to a comparison of Thumper to a 3510. With the exception of my note concerning blanket statements and assumptions, I've been referring to the original question and subject of comparing the performance of 3510 JBOD to 3510 HW RAID.
[zfs-discuss] Re: Clones and "rm -rf"
I'm not at the machine to check at the moment, but I didn't create the /u05 mountpoint manually. ZFS created it automatically when I did :-

% zfs set mountpoint=/u05 zfspool/u05

You would hope that ZFS didn't get the underlying permissions wrong!
Re: [zfs-discuss] Clones and "rm -rf"
Tom Simpson wrote:
> Can anyone help? I have a cloned filesystem (/u05) from a snapshot of /u02.
> The owner/group of the clone is (oracle:dba). If I do
>
> oracle% cd /u05/app
> oracle% rm -rf R2DIR

Are you sure you have adequate permissions to descend into and remove the subdirectories?

> .. All the files in the R2DIR tree are removed, but none of the
> (sub)directories. If I run the same "rm -rf" as root, the directory tree
> itself is removed (ie. what I would expect)
>
> Any ideas?
[zfs-discuss] Re: Clones and "rm -rf"
Just a thought -- unmount /u05 and check what 'ls -l /u05' shows. If the permissions on the directory that you mount onto are wrong (not world-executable; it should be 0755), rm -r (and many other commands) will fail in mysterious ways.
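Concretely, something along these lines (using the dataset name mentioned earlier in the thread):

    zfs umount /u05
    ls -ld /u05        # the underlying directory should show drwxr-xr-x (0755)
    chmod 755 /u05     # only if it doesn't
    zfs mount zfspool/u05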
[zfs-discuss] ZFS Web Administration Tool
Why does the Java Web Console service keep going into maintenance mode? This has happened for the past few builds (current is nv44). It works for a day or so after a new install, then it breaks. Here are the symptoms:

sol11:$ svcs -x
svc:/system/webconsole:console (java web console)
 State: maintenance since Wed Aug 02 08:33:26 2006
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
   See: http://sun.com/msg/SMF-8000-KS
   See: smcwebserver(1M)
   See: /var/svc/log/system-webconsole:console.log
Impact: This service is not running.

sol11:$ tail /var/svc/log/system-webconsole:console.log   (machine has just booted)
[ Aug 2 08:33:06 Executing start method ("/lib/svc/method/svc-webconsole start") ]
Sun Java(TM) Web Console status can not be determined.
Run "smcwebserver stop" to make sure the server has stopped.
[ Aug 2 08:33:26 Method "start" exited with status 95 ]

I've run smcwebserver stop and svcadm clear svc:/system/webconsole:console, then svcadm enable webconsole, and get the same results.

The Java Web Console works perfectly on Solaris 10 6/06.

Ron Halstead
[zfs-discuss] Re: Clones and "rm -rf"
Actually, I just tried this on a non-cloned filesystem with the same results. I can't believe there is a bug with "rm -rf", so is this something to do with ACLs?

Help!

Tom
[zfs-discuss] Best Practices for StorEdge 3510 Array and ZFS
I have a StorEdge 3510 FC array which is currently configured in the following way:

* logical-drives

LD    LD-ID      Size    Assigned  Type   Disks  Spare  Failed  Status
ld0   255ECBD0   2.45TB  Primary   RAID5  10     2      0       Good

Write-Policy: Default   StripeSize: 128KB

What are the best practices of using ZFS on this array so that I can benefit from both ZFS and HW RAID?

Thanks in advance,

-- 
prasad
[zfs-discuss] Clones and "rm -rf"
Can anyone help? I have a cloned filesystem (/u05) from a snapshot of /u02. The owner/group of the clone is (oracle:dba). If I do

oracle% cd /u05/app
oracle% rm -rf R2DIR

.. all the files in the R2DIR tree are removed, but none of the (sub)directories. If I run the same "rm -rf" as root, the directory tree itself is removed (i.e. what I would expect).

Any ideas?
Re: [zfs-discuss] ZFS & iSCSI: where do do the mirroring/raidz
On Wed, Darren J Moffat wrote:
> I have 12 36G disks (in a single D2 enclosure) connected to a V880 that
> I want to "share" to a v40z that is on the same gigabit network switch.
> I've already decided that NFS is not the answer - the performance of ON
> consolidation builds over NFS just doesn't cut it for me.

?

With a locally attached 3510 array on a 4-way v40z, I have been able to do a full nightly build in 1 hour 7 minutes. With NFSv3 access, from the same system, to a couple of different NFS servers, I have been able to achieve 1 hour 15 minutes in one case and 1 hour 22 minutes in the other.

Is that too slow?

Spencer