Re: [zfs-discuss] Tuning the ARC towards LRU
Hi, In simple terms, the ARC is divided into an MRU and an MFU side: target size (c) = target MRU size (p) + target MFU size (c-p). On Solaris, to get from the MRU to the MFU side, the block must be read at least once in 62.5 milliseconds. For pure read-once workloads, the data won't move to the MFU side and the ARC will behave exactly like an (adaptable) MRU cache. Richard, I am looking at the code that moves a buffer from MRU to MFU, and as I read it, if the block is read again and the elapsed time is greater than 62 milliseconds, it moves from MRU to MFU (lines ~2256 to ~2265 in arc.c). Also, I have a program that reads the same block once every 5 seconds, and on a relatively idle machine, I can find the block in the MFU, not the MRU (using mdb). If the block is read again in less than 62 milliseconds, it stays in the MRU. max ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
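For readers following along, a minimal Python model of the promotion rule Max describes (the ~62 ms figure corresponds to ARC_MINTIME, hz >> 4, in arc.c; the function name and structure here are illustrative, not the actual kernel code):

```python
# Illustrative model of the MRU->MFU promotion rule: a hit on an MRU
# buffer promotes it to MFU only if more than ~62 ms have passed since
# the previous access; a re-read within that window stays on the MRU.
ARC_MINTIME_MS = 62  # in arc.c this is (hz >> 4), i.e. ~62 ms with hz=1000

def state_after_hit(state, now_ms, last_access_ms):
    """Return the list a buffer lands on after a read hit."""
    if state == "MRU":
        if now_ms - last_access_ms > ARC_MINTIME_MS:
            return "MFU"   # a second, distinct burst of use: promote
        return "MRU"       # same burst of accesses: stay on MRU
    return "MFU"           # already on MFU: stays there

# Max's experiment: a block re-read every 5 seconds ends up on the MFU:
print(state_after_hit("MRU", 5000, 0))   # MFU
# A block re-read 10 ms after the first access does not:
print(state_after_hit("MRU", 10, 0))     # MRU
```

This matches both observations in the post: the 5-second re-reader lands in the MFU, while a tight re-read loop stays in the MRU.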
Re: [zfs-discuss] dedup and memory/l2arc requirements
On 03/04/2010 00:57, Richard Elling wrote: This is annoying. By default, zdb is compiled as a 32-bit executable and it can be a hog. Compiling it yourself is too painful for most folks :-( /usr/sbin/zdb is actually a link to /usr/lib/isaexec: $ ls -il /usr/sbin/zdb /usr/lib/isaexec 300679 -r-xr-xr-x 92 root bin 8248 Nov 16 10:26 /usr/lib/isaexec* 300679 -r-xr-xr-x 92 root bin 8248 Nov 16 10:26 /usr/sbin/zdb* $ ls -il /usr/sbin/i86/zdb /usr/sbin/amd64/zdb 200932 -r-xr-xr-x 1 root bin 173224 Mar 15 10:20 /usr/sbin/amd64/zdb* 200933 -r-xr-xr-x 1 root bin 159960 Mar 15 10:20 /usr/sbin/i86/zdb* This means both 32-bit and 64-bit versions are already available, and if the kernel is 64-bit then the 64-bit version of zdb will be run when you run /usr/sbin/zdb. -- Darren J Moffat
Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?
Install nexenta on a dell poweredge ? or one of these http://www.pogolinux.com/products/storage_director FYI; More recent PowerEdges (R410, R710, possibly blades too, those with integrated Broadcom chips) are not working very well with OpenSolaris due to Broadcom network issues: hang-ups, packet loss, etc. And as OpenSolaris is not a supported OS, Dell is not interested in fixing these issues. Yours Markus Kovero
Re: [zfs-discuss] Diagnosing Permanent Errors
On Tue, Apr 6, 2010 at 12:47 AM, Daniel Carosone d...@geek.com.au wrote: On Tue, Apr 06, 2010 at 12:29:35AM -0500, Tim Cook wrote: On Tue, Apr 6, 2010 at 12:24 AM, Daniel Carosone d...@geek.com.au wrote: On Mon, Apr 05, 2010 at 09:35:21PM -0700, Willard Korfhage wrote: By the way, I see that now one of the disks is listed as degraded - too many errors. Is there a good way to identify exactly which of the disks it is? It's hidden in iostat -E, of all places. -- Dan. I think he wants to know how to identify which physical drive maps to the dev ID in solaris. The only way I can think of is to run something like DD against the drive to light up the activity LED. or look at the serial numbers printed in iostat -E -- Dan. And then what? Cross your fingers and hope you pull the right drive on the first go? I don't know of any drives that come from the factory in a hot-swap bay with the serial number printed on the front of the caddy. --Tim
Re: [zfs-discuss] Diagnosing Permanent Errors
Yes, I was hoping to find the serial numbers. Unfortunately, it doesn't show any serial numbers for the disk attached to the Areca raid card.
Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?
On Tue, Apr 6 at 13:03, Markus Kovero wrote: Install nexenta on a dell poweredge ? or one of these http://www.pogolinux.com/products/storage_director FYI; More recent poweredges (R410,R710, possibly blades too, those with integrated Broadcom chips) are not working very well with opensolaris due broadcom network issues, hang-ups packet loss etc. And as opensolaris is not supported OS Dell is not interested to fix these issues. Our Dell T610 is and has been working just fine for the last year and a half, without a single network problem. Do you know if they're using the same integrated part? --eric -- Eric D. Mudama edmud...@mail.bounceswoosh.org
Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?
Our Dell T610 is and has been working just fine for the last year and a half, without a single network problem. Do you know if they're using the same integrated part? --eric Hi, as I should have mentioned, the integrated NICs that cause issues use the Broadcom BCM5709 chipset, and these connectivity issues have been quite widespread amongst Linux people too. Red Hat is trying to fix this: http://kbase.redhat.com/faq/docs/DOC-26837 but I believe it's messed up in firmware somehow, as our tests show the 4.6.8-series firmware seems to be more stable. And as for workarounds, disabling MSI is bad if it creates latency for network/disk controllers, and disabling C-states on Nehalem processors is just stupid (no turbo, power saving, etc). Definitely a no-go for storage imo. Yours Markus Kovero
Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?
Hi, I also ran into the problem of Dell+Broadcom. I fixed it by downgrading the firmware to version 4.xxx instead of running in version 5.xxx. You may try that one as well. Bruno On 6-4-2010 16:54, Eric D. Mudama wrote: On Tue, Apr 6 at 13:03, Markus Kovero wrote: Install nexenta on a dell poweredge ? or one of these http://www.pogolinux.com/products/storage_director FYI; More recent poweredges (R410,R710, possibly blades too, those with integrated Broadcom chips) are not working very well with opensolaris due broadcom network issues, hang-ups packet loss etc. And as opensolaris is not supported OS Dell is not interested to fix these issues. Our Dell T610 is and has been working just fine for the last year and a half, without a single network problem. Do you know if they're using the same integrated part? --eric
[zfs-discuss] SSD sale on newegg
Seems a nice sale on Newegg for SSD devices. Talk about choices. What's the latest recommendations for a log device? http://bit.ly/aL1dne
Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?
Hmmm.. Tried to post this before, but it doesn't appear. I'll try again. I've been discussing the concept of a reference design for OpenSolaris systems with a few people. This comes very close to a system you can just buy. I spent about six months burning up google and pestering people here about this issue. In the end, I largely copied a system which someone (Constantin Gonzalez) had blogged about here: http://blogs.sun.com/constantin/entry/a_small_and_energy_efficient It was inexpensive for what I got, and worked largely the first time I connected it up. It would make a good reference design, excepting only that in the several weeks since I made it, the motherboard has been discontinued by ASUS, although it's still available in many places. A reference design is a setup that some knowing person or group has put together and verified to work. It is later replicable by people of lesser skill with little or no exposure to malfunction or long debugging. Here's the system I did:
- ASUS M3A78-CM (about $60 when I got mine)
- AMD Athlon II 240e ($70, the 240 is cheaper, but a few more watts)
- Kingston 800MHz DDR2 unbuffered ECC RAM, 2x 2GB ($80)
- Syba PCIe x1 dual-port SATA card ($26)
- 2x 40GB 2.5" SATA drives for mirrored boot pool ($52)
- 6x 750GB SATA drives for raidz2 storage pool, giving 3TB usable and 2-disk failure immunity.
- Case, power supply, cables, etc. to taste.
I bought new, because I was looking for a long-term reliable backup server, but used would work as well for lower cost. In spite of reported issues with the ethernet chipset on the mobo, it just worked on my network, as installed. In fact, all of it just worked on install. The driver test utility reported zero issues. USB worked. Keyboard, mouse, and integrated video worked. So did the Syba card. No driver finagling. 
Bring-up time was only extended by my not knowing which commands to type. That includes making the remote console, remote desktop, and storage array available through the network on my Windows XP email machine. Now that I know what commands to type, it would take me less than an hour to set another one up from unpacking the shipping boxes. The knowing what commands to type took me a bit, but it's not terribly taxing. Most of it was finding the help sections on the web and in the OpenSolaris Bible and typing what I was told. This would be a great candidate for a reference design except for Asus discontinuing it. That will be the bane of reference designs like this. It pretty much requires an ongoing effort of people assembling and documenting their work as new motherboards flow through the system. This is kind of what the HCL was probably intended to be, but does not measure up to for neophytes. The HCL for Solaris proper is much more usable in that it seems to have a database back end and lets you select things, bringing up trees of choices. Ah, well. I think a local custom computer shop could replicate my server very quickly indeed. It's not quite "just buy and unwrap," but it's remarkably close.
[zfs-discuss] ZFS on-disk DDT block arrangement
I was wondering if someone could explain why the DDT is seemingly (from empirical observation) kept in a huge number of individual blocks, randomly written across the pool, rather than just a large binary chunk somewhere. Having been a victim of the really long times it takes to destroy a dataset that has dedup=on, I was wondering why that was. From memory, when the destroy process was running, something like iopattern -r showed constant 99% random reads. This seems like a very wasteful approach to allocating blocks for the DDT. Having deleted the 900GB dataset, finally, I now only have around 152GB (allocated PSIZE) left deduped on that pool. # zdb -DD tank DDT-sha256-zap-duplicate: 310684 entries, size 578 on disk, 380 in core DDT-sha256-zap-unique: 1155817 entries, size 2438 on disk, 1783 in core So 1466501 DDT entries. For 152GB of data, that's around 108KB/block on average, which seems sane. To destroy the dataset holding the files which reference the DDT, I'm looking at 1.46 million random reads to complete the operation (less those elements in ARC or L2ARC). That's a lot of read operations for my poor spindles. I've seen some people saying that the DDT entries are around 270 bytes each, but does it really matter, if the smallest block that zfs can read/write (for obvious reasons) is 512 bytes? Clearly 2x 270B > 512B, but couldn't there be some way of grouping DDT elements together (in say, 1MB blocks)? Thoughts? (side note: can someone explain the "size xxx on disk, xxx in core" statements in that zdb output for me? The numbers never seem related to the number of entries or anything.)
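As a back-of-envelope check on why the destroy takes so long, here is the arithmetic implied above (the 200 IOPS random-read rate is my assumption for a small spindle-based raidz pool, not a figure from the post):

```python
# Estimate the wall-clock cost of one random read per DDT entry,
# using the entry counts from the zdb -DD output quoted above.
entries = 310684 + 1155817   # duplicate + unique entries from zdb -DD
assumed_iops = 200           # assumed random-read rate of the pool
hours = entries / assumed_iops / 3600
print(f"{entries} entries -> ~{hours:.1f} hours of random reads")
```

Even at an optimistic 200 random reads per second, walking ~1.47 million scattered DDT entries costs about two hours of pure seeking, which is consistent with the long destroy times reported.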
[zfs-discuss] ZFS Rollback From ZFS Send
Can I rollback a snapshot that I did a zfs send on? ie: zfs send testpool/w...@april6 > /backups/w...@april6_2010 Thanks
Re: [zfs-discuss] ZFS Rollback From ZFS Send
On Tue, Apr 06, 2010 at 11:53:23AM -0400, Tony MacDoodle wrote: Can I rollback a snapshot that I did a zfs send on? ie: zfs send testpool/w...@april6 /backups/w...@april6_2010 That you did a zfs send does not prevent you from rolling back to a previous snapshot. Similarly for zfs recv -- that you went from one snapshot to another by zfs receiving a send does not stop you from rolling back to an earlier snapshot. You do need to have an earlier snapshot to rollback to, if you want to rollback. Also, if you are using zfs send for backups, or for replication, and you rollback the primary dataset, then you'll need to update your backups/replicas accordingly. Nico --
[zfs-discuss] ZFS: Raid and dedup
Hi Folks: I'm wondering what the correct flow is when both raid5 and de-dup are enabled on a volume. I think we should do de-dup first and then raid5 ... is that understanding correct? Thanks!
Re: [zfs-discuss] ZFS: Raid and dedup
Correct. Jeff Sent from my iPhone On Apr 5, 2010, at 6:32 PM, Learner Study learner.st...@gmail.com wrote: Hi Folks: I'm wondering what is the correct flow when both raid5 and de-dup are enabled on a storage volume I think we should do de-dup first and then raid5 ... is that understanding correct? Thanks!
Re: [zfs-discuss] Diagnosing Permanent Errors
On 6/04/10 11:47 PM, Willard Korfhage wrote: Yes, I was hoping to find the serial numbers. Unfortunately, it doesn't show any serial numbers for the disk attached to the Areca raid card. You'll need to reboot and go into the card bios to get that information. James C. McPherson -- Senior Software Engineer, Solaris Oracle http://www.jmcp.homeunix.com/blog
[zfs-discuss] refreservation and ZFS Volume
I am trying to understand how refreservation works with snapshots. Say I have a 100G zfs pool with 4 20G volumes in that pool, and refreservation = 20G on all volumes. Now when I want to do a snapshot, will this snapshot need 20G + the amount changed (REFER)? If not, will I get an out-of-space error? How does refreservation relate to snapshots? Thanks
Re: [zfs-discuss] Diagnosing Permanent Errors
Willard Korfhage wrote: Yes, I was hoping to find the serial numbers. Unfortunately, it doesn't show any serial numbers for the disk attached to the Areca raid card. Does Areca provide any Solaris tools that will show you the drive info? If you are using the Areca in JBOD mode, smartctl will frequently show serial numbers that iostat -E will not (iostat appears to be really stupid about getting serial numbers compared to just about any other tool out there). -- Carson
Re: [zfs-discuss] SSD sale on newegg
On Tue, 2010-04-06 at 08:26 -0700, Anil wrote: Seems a nice sale on Newegg for SSD devices. Talk about choices. What's the latest recommendations for a log device? http://bit.ly/aL1dne The Vertex LE models should do well as ZIL (though not as well as an X25-E or a Zeus) for all non-enterprise users. The X25-M is still the best choice for a L2ARC device, but the Vertex Turbo or Corsair Nova are good if you're on a budget. If you really want an SSD as a boot drive, or just need something for L2ARC, the various Intel X25-V models are cheap, if not really great performers. I'd recommend one of these if you want an SSD for rpool, or if you need a large L2ARC for dedup (or similar) and can't afford anything in the X25-M price range. You should also be OK with a Corsair Reactor in this performance category. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800)
Re: [zfs-discuss] refreservation and ZFS Volume
On Tue, Apr 06, 2010 at 01:44:20PM -0400, Tony MacDoodle wrote: I am trying to understand how refreservation works with snapshots. If I have a 100G zfs pool I have 4 20G volume groups in that pool. refreservation = 20G on all volume groups. Now when I want to do a snapshot will this snapshot need 20G + the amount changed (REFER)? If not I get a out of space. How does refreservation relate to snapshots? The refreservation is a commitment that X amount of space can be written to. When space currently in the volume (usedbydataset) is shared with snapshots, new writes to those blocks will need to allocate new space, and the original copy remains in the snapshot. Therefore, as a snapshot is taken, the usedbyrefreservation figure is increased from its current value back up to the size of the refreservation. This represents the commitment in advance of space from the pool to hold the potential overwrite of the dataset, as well as the new snapshot. If there's not enough pool space for this increase, the snapshot is denied. You have reminded me.. I wrote some patches to the zfs manpage to help clarify this issue, while travelling, and never got around to posting them when I got back. I'll dig them up off my netbook later today. -- Dan.
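A small Python sketch of the accounting Dan describes (the function name and units are illustrative, and this simplifies the real ZFS space math; it only captures the rule that a snapshot bumps usedbyrefreservation back up to the full refreservation, and the pool must cover that increase):

```python
# Model: taking a snapshot requires the pool to re-commit enough free
# space to allow every currently-written block to be overwritten while
# the snapshot still holds the old copy.
def snapshot_ok(refreservation, usedbyrefreservation, pool_free):
    """Can a snapshot be taken? Returns (ok, new usedbyrefreservation)."""
    increase = refreservation - usedbyrefreservation
    if pool_free < increase:
        return (False, usedbyrefreservation)  # snapshot denied
    return (True, refreservation)             # commitment restored in full

# 20G refreservation with 15G already written (usedbyrefreservation=5G):
# the snapshot needs the pool to commit another 15G.
print(snapshot_ok(20, 5, 40))  # (True, 20)
print(snapshot_ok(20, 5, 10))  # (False, 5)
```

This is why, with four fully-reserved 20G volumes in a 100G pool, snapshots can fail even though the volumes themselves are nowhere near full: the pool cannot commit the overwrite space a second time.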
Re: [zfs-discuss] ZFS on-disk DDT block arrangement
On Wed, Apr 07, 2010 at 01:52:23AM +1000, taemun wrote: I was wondering if someone could explain why the DDT is seemingly (from empirical observation) kept in a huge number of individual blocks, randomly written across the pool, rather than just a large binary chunk somewhere. It's not really a question of physical allocation contiguity, or pre-allocating in larger chunks. Remember that this would not be maintained after updates in a CoW system anyway. It's a question of access pattern. The DDT is indexed by block hash. Hashes are effectively random (for the purposes of this discussion), and so updating the DDT for blocks in any order other than block-hash order is effectively random-order. There's not really an effective way to (say) remove blocks in block-hash order. There might be room for some optimisations here and there (maybe freeing the blocks of each object in hash-order) but the overall access pattern is still going to be heavily random-order. (side note: can someone explain the size xxx on disk, xxx in core statements in that zdb output for me? The numbers never seem related to the number of entries or anything.) I've not yet seen a good explanation, though there has been some speculation, from me included. -- Dan.
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
Hi Roch, Can you try 4 concurrent tars to four different ZFS filesystems (same pool). Hmmm, you're on to something here: http://www.science.uva.nl/~jeroen/zil_compared_e1000_iostat_iops_svc_t_10sec_interval.pdf In short: when using two exported file systems, total time goes down to around 4mins (IOPS maxes out at around 5500 when adding all four vmods together). When using four file systems, total time goes down to around 3min30s (IOPS maxing out at about 9500). I figured it is either NFS or a per-file-system data structure in the ZFS/ZIL interface. To rule out NFS I tried exporting two directories using default NFS shares (via /etc/dfs/dfstab entries). To my surprise this seems to bypass the ZIL altogether (dropping to 100 IOPS, which results from our RAIDZ2 configuration). So clearly ZFS sharenfs is more than a nice front end for NFS configuration :). But back to your suggestion: You clearly had a hypothesis behind your question. Care to elaborate? With kind regards, Jeroen
Re: [zfs-discuss] SSD sale on newegg
On 4/6/2010 3:41 PM, Erik Trimble wrote: On Tue, 2010-04-06 at 08:26 -0700, Anil wrote: Seems a nice sale on Newegg for SSD devices. Talk about choices. What's the latest recommendations for a log device? http://bit.ly/aL1dne The Vertex LE models should do well as ZIL (though not as well as an X25-E or a Zeus) for all non-enterprise users. The X25-M is still the best choice for a L2ARC device, but the Vertex Turbo or Corsair Nova are good if you're on a budget. If you really want an SSD as a boot drive, or just need something for L2ARC, the various Intel X25-V models are cheap, if not really great performers. I'd recommend one of these if you want an SSD for rpool, or if you need a large L2ARC for dedup (or similar) and can't afford anything in the X25-M price range. You should also be OK with a Corsair Reactor in this performance category. What about if you want to get one that you can use for both the rpool, and ZIL (for another data pool?) What if you want one for all 3 (rpool, ZIL, L2ARC)?? -Kyle
Re: [zfs-discuss] refreservation and ZFS Volume
On Wed, Apr 07, 2010 at 06:27:09AM +1000, Daniel Carosone wrote: You have reminded me.. I wrote some patches to the zfs manpage to help clarify this issue, while travelling, and never got around to posting them when I got back. I'll dig them up off my netbook later today. http://defect.opensolaris.org/bz/show_bug.cgi?id=15514 -- Dan.
Re: [zfs-discuss] SSD sale on newegg
On Tue, 2010-04-06 at 19:43 -0400, Kyle McDonald wrote: On 4/6/2010 3:41 PM, Erik Trimble wrote: On Tue, 2010-04-06 at 08:26 -0700, Anil wrote: Seems a nice sale on Newegg for SSD devices. Talk about choices. What's the latest recommendations for a log device? http://bit.ly/aL1dne The Vertex LE models should do well as ZIL (though not as well as an X25-E or a Zeus) for all non-enterprise users. The X25-M is still the best choice for a L2ARC device, but the Vertex Turbo or Corsair Nova are good if you're on a budget. If you really want an SSD as a boot drive, or just need something for L2ARC, the various Intel X25-V models are cheap, if not really great performers. I'd recommend one of these if you want an SSD for rpool, or if you need a large L2ARC for dedup (or similar) and can't afford anything in the X25-M price range. You should also be OK with a Corsair Reactor in this performance category. What about if you want to get one that you can use for both the rpool, and ZIL (for another data pool?) What if you want one for all 3 (rpool, ZIL, L2ARC)?? -Kyle It all boils down to performance and the tradeoffs you are willing to make. For good ZIL, you want something that has a very high IOPS rating (50,000+ if possible, 10,000+ minimum, particularly when writing small chunks). For L2ARC, you are more concerned with total size/capacity, and modest IOPS (3000-10000 IOPS, or the ability to write at least 100MB/s at 4-8k write sizes, plus as high as possible read I/O). For rpool use, you don't really care about performance so much, as it's almost exclusively read-only (one should generally not configure a swap device on an SSD-based rpool). You could probably live with an X25-M as something to use for all three, but of course you're making tradeoffs all over the place. 
-- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800)
Re: [zfs-discuss] SSD sale on newegg
On Apr 6, 2010, at 5:00 PM, Erik Trimble wrote: On Tue, 2010-04-06 at 19:43 -0400, Kyle McDonald wrote: On 4/6/2010 3:41 PM, Erik Trimble wrote: On Tue, 2010-04-06 at 08:26 -0700, Anil wrote: Seems a nice sale on Newegg for SSD devices. Talk about choices. What's the latest recommendations for a log device? http://bit.ly/aL1dne The Vertex LE models should do well as ZIL (though not as well as an X25-E or a Zeus) for all non-enterprise users. The X25-M is still the best choice for a L2ARC device, but the Vertex Turbo or Corsair Nova are good if you're on a budget. If you really want an SSD as a boot drive, or just need something for L2ARC, the various Intel X25-V models are cheap, if not really great performers. I'd recommend one of these if you want an SSD for rpool, or if you need a large L2ARC for dedup (or similar) and can't afford anything in the X25-M price range. You should also be OK with a Corsair Reactor in this performance category. What about if you want to get one that you can use for both the rpool, and ZIL (for another data pool?) What if you want one for all 3 (rpool, ZIL, L2ARC)?? -Kyle It all boils down to performance and the tradeoffs you are willing to make. For good ZIL, you want something that has a very high IOPS rating (50,000+ if possible, 10,000+ minimum, particularly when writing small chunks). High write IOPS :-) For L2ARC, you are more concerned with total size/capacity, and modest IOPS (3000-10000 IOPS, or the ability to write at least 100MB/s at 4-8k write sizes, plus as high as possible read I/O). The L2ARC fill rate is throttled to 16 MB/sec at boot and 8 MB/sec later. Many SSDs work well as L2ARC cache devices. For rpool use, you don't really care about performance so much, as it's almost exclusively read-only Yep (one should generally not configure a swap device on an SSD-based rpool). Disagree. Swap is a perfectly fine workload for SSDs. Under ZFS, even more so. 
I'd really like to squash this rumour and thought we were making progress on that front :-( Today, there are millions or thousands of systems with deployed SSDs as boot and swap on a wide variety of OSes. Go for it. You could probably live with an X25-M as something to use for all three, but of course you're making tradeoffs all over the place. That would be better than almost any HDD on the planet because the HDD tradeoffs result in much worse performance. -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
Re: [zfs-discuss] SSD sale on newegg
Erik Trimble wrote: On Tue, 2010-04-06 at 08:26 -0700, Anil wrote: Seems a nice sale on Newegg for SSD devices. Talk about choices. What's the latest recommendations for a log device? http://bit.ly/aL1dne The Vertex LE models should do well as ZIL (though not as well as an X25-E or a Zeus) for all non-enterprise users. I just found an 8 GB SATA Zeus (Z4S28I) for £83.35 (~US$127) shipped to California. That should be more than large enough for my ZIL @home, based on zilstat. The web site says EOL, limited to current stock. http://www.dpieshop.com/stec-zeus-z4s28i-8gb-25-sata-ssd-solid-state-drive-industrial-temp-p-410.html Of course this seems _way_ too good to be true, but I decided to take the risk. -- Carson
Re: [zfs-discuss] SSD sale on newegg
On Tue, 2010-04-06 at 17:17 -0700, Richard Elling wrote: On Apr 6, 2010, at 5:00 PM, Erik Trimble wrote: [snip] For L2ARC, you are more concerned with total size/capacity, and modest IOPS (3000-10000 IOPS, or the ability to write at least 100MB/s at 4-8k write sizes, plus as high as possible read I/O). The L2ARC fill rate is throttled to 16 MB/sec at boot and 8 MB/sec later. Many SSDs work well as L2ARC cache devices. Where is that limit set? That's completely new to me. :-( In any case, L2ARC devices should probably have at least reasonable write performance for small sizes, given the propensity to put things like the DDT and other table structures/metadata into it, all of which is small write chunks. I tried one of the old JMicron-based 1st-gen SSDs as an L2ARC, and it wasn't much of a success. Fast read speed is good for an L2ARC, but that's not generally a problem with even the cheap SSDs these days. (one should generally not configure a swap device on an SSD-based rpool). Disagree. Swap is a perfectly fine workload for SSDs. Under ZFS, even more so. I'd really like to squash this rumour and thought we were making progress on that front :-( Today, there are millions or thousands of systems with deployed SSDs as boot and swap on a wide variety of OSes. Go for it. Really? I'm generally not keen on running swap on lower-performing SSDs over here in Java-land, but that may have to do with my specific workload. I'll take your word for it (of course, I'm voting for swap not being necessary on many machines these days). You could probably live with an X25-M as something to use for all three, but of course you're making tradeoffs all over the place. That would be better than almost any HDD on the planet because the HDD tradeoffs result in much worse performance. -- richard True. Viva la SSD! 
-- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800)
Re: [zfs-discuss] SSD sale on newegg
On 04/06/10 17:17, Richard Elling wrote: You could probably live with an X25-M as something to use for all three, but of course you're making tradeoffs all over the place. That would be better than almost any HDD on the planet because the HDD tradeoffs result in much worse performance. Indeed. I've set up a couple of small systems (one a desktop workstation, and the other a home fileserver) with root pool plus the l2arc and slog for a data pool on an 80G X25-M and have been very happy with the result. The recipe I'm using is to slice the SSD, with the rpool in s0 with roughly half the space, 1GB in s3 for slog, and the rest of the space as L2ARC in s4. That may actually be overly generous for the root pool, but I run with copies=2 on rpool/ROOT and I tend to keep a bunch of BEs around. - Bill
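For concreteness, the split Bill describes works out roughly as follows on an 80GB device (this is my arithmetic from his recipe, not his actual slice table, and real slice boundaries fall on cylinder boundaries rather than round gigabytes):

```python
# Rough slice budget for an 80GB X25-M shared between rpool, slog, and L2ARC.
TOTAL_GB = 80
layout = {
    "s0 (rpool)": TOTAL_GB // 2,  # roughly half for the root pool
    "s3 (slog)": 1,               # 1GB is plenty for a home-server slog
}
layout["s4 (l2arc)"] = TOTAL_GB - sum(layout.values())  # remainder as cache
for slice_name, gb in layout.items():
    print(f"{slice_name}: {gb} GB")
```

The point of the split is that the slog needs almost no space (it only has to cover a few seconds of synchronous writes), so nearly everything left over after the root pool can go to L2ARC.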
Re: [zfs-discuss] To slice, or not to slice
I have reason to believe that both the drive and the OS are correct. I have a suspicion that the HBA simply handled the creation of this volume somehow differently than how it handled the original. Don't know the answer for sure yet. Ok, that's confirmed now. Apparently when the drives ship from the factory, they're pre-initialized for the HBA, so the HBA happily imports them and creates a simple volume (aka jbod) using the factory initialization. Unfortunately, the factory init includes HBA metadata at both the start and end of the drive ... so I lose 1MB. The fix to the problem is to initialize the disk again with the HBA, and then create a new simple volume.
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
We ran into something similar with these drives in an X4170 that turned out to be an issue with the preconfigured logical volumes on the drives. Once we made sure all of our Sun PCI HBAs were running the exact same firmware version and recreated the volumes on new drives arriving from Sun, we got back into sync on the X25-E device sizes. Can you elaborate? Just today, we got the replacement drive that has precisely the right version of firmware and everything. Still, when we plugged in that drive and created a simple volume in the StorageTek RAID utility, the new drive is 0.001 GB smaller than the old drive. I'm still hosed. Are you saying I might benefit from sticking the SSD into some laptop and zeroing the disk, and then attaching it to the Sun server? Are you saying I might benefit from finding some other way to make the drive available, instead of using the StorageTek RAID utility? Thanks for the suggestions... Sorry for the double post. Since the wrong-sized drive was discussed in two separate threads, I want to stick a link here to the other one, where the question was answered, just in case anyone comes across this discussion by search or whatever: http://mail.opensolaris.org/pipermail/zfs-discuss/2010-April/039669.html
Re: [zfs-discuss] SSD sale on newegg
On Apr 6, 2010, at 5:38 PM, Erik Trimble wrote: On Tue, 2010-04-06 at 17:17 -0700, Richard Elling wrote: On Apr 6, 2010, at 5:00 PM, Erik Trimble wrote: [snip] For L2ARC, you are more concerned with total size/capacity, and modest IOPS (on the order of a few thousand IOPS, or the ability to write at least 100MB/s at 4-8k write sizes, plus as high as possible read I/O). The L2ARC fill rate is throttled to 16 MB/sec at boot and 8 MB/sec later. Many SSDs work well as L2ARC cache devices. Where is that limit set? That's completely new to me. :-( L2ARC_WRITE_SIZE (8MB) is the default size of data to be written and L2ARC_FEED_SECS (1) is the interval. When arc_warm is FALSE, L2ARC_WRITE_SIZE is doubled (16MB). Look somewhere around http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#553 This change was made per CR 6709301, "An empty L2ARC cache device is slow to warm up": http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6709301 I'll agree the feed rate is somewhat arbitrary, but it probably suits many use cases. In any case, L2ARC devices should probably have at least reasonable write performance for small sizes, given the propensity to put things like the DDT and other table structures/metadata into them, all of which are small write chunks. I tried one of the old JMicron-based 1st-gen SSDs as an L2ARC, and it wasn't much of a success. I haven't done many L2ARC measurements, but I suspect the writes are large. Fast read speed is good for an L2ARC, but that's not generally a problem with even the cheap SSDs these days. yep. (One should generally not configure a swap device on an SSD-based rpool.) Disagree. Swap is a perfectly fine workload for SSDs. Under ZFS, even more so. I'd really like to squash this rumour and thought we were making progress on that front :-( Today, there are thousands (or millions) of systems deployed with SSDs as boot and swap on a wide variety of OSes. Go for it. Really?
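A back-of-the-envelope sketch of that throttle, using the constants as described above. The names mirror the ones in arc.c, but this is illustrative shell arithmetic, not the kernel code:

```shell
# L2ARC feed throttle per arc.c: 8 MB written per 1-second feed interval,
# doubled while arc_warm is FALSE (CR 6709301), i.e. 16 MB/s until warm.
L2ARC_WRITE_SIZE=$((8 * 1024 * 1024))   # bytes written per feed interval
L2ARC_FEED_SECS=1                       # seconds between feeds
ARC_WARM=0                              # 0 = FALSE: ARC not yet warmed up

WRITE=$L2ARC_WRITE_SIZE
[ "$ARC_WARM" -eq 0 ] && WRITE=$((WRITE * 2))   # cold cache fills faster
echo "fill rate: $((WRITE / 1024 / 1024 / L2ARC_FEED_SECS)) MB/s"
```

Flipping ARC_WARM to 1 drops the rate back to the steady-state 8 MB/s.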
I'm generally not good for running swap on lower-performing SSDs over here in Java-land, but that may have to do with my specific workload. I'll take your word for it (of course, I'm voting for swap not being necessary on many machines these days). If you have to swap, you have no performance. But people with SSDs (eg MacBook Air) seem happy to see fewer spinning beach balls :-) -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
Re: [zfs-discuss] SSD sale on newegg
On Tue, Apr 06, 2010 at 06:53:04PM -0700, Richard Elling wrote: Disagree. Swap is a perfectly fine workload for SSDs. Under ZFS, even more so. I'd really like to squash this rumour and thought we were making progress on that front :-( Today, there are thousands (or millions) of systems deployed with SSDs as boot and swap on a wide variety of OSes. Go for it. +1 Really? I'm generally not good for running swap on lower-performing SSDs over here in Java-land, but that may have to do with my specific workload. I'll take your word for it (of course, I'm voting for swap not being necessary on many machines these days). If you have to swap, you have no performance. Disagree. If you're thrashing heavily, yes. An SSD will make a difference in swap latency up until that point, but it won't help much once everything is starved for memory. However, a lot can happen before that point. Swapping out unused stuff (including idle services/processes and old tmpfs pages) can be very useful for performance, making room for the performance-sensitive working set. Some of your lower-priority processes can page in and out faster with an SSD, smoothing the curve from memory pressure to total gridlock. Finally, this middle ground is where SSD root also helps, because executable text is paged in from there. -- Dan.
Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?
On Tue, Apr 6 at 17:56, Markus Kovero wrote: Our Dell T610 is and has been working just fine for the last year and a half, without a single network problem. Do you know if they're using the same integrated part? --eric Hi, as I should have mentioned, the integrated NICs that cause issues use the Broadcom BCM5709 chipset, and these connectivity issues have been quite widespread amongst Linux people too. Redhat is trying to fix this: http://kbase.redhat.com/faq/docs/DOC-26837 but I believe it's messed up in firmware somehow, as our tests show the 4.6.8-series firmware seems to be more stable. As for workarounds, disabling MSI is bad if it creates latency for network/disk controllers, and disabling C-states on Nehalem processors is just stupid (no turbo, no power saving, etc). Definitely a no-go for storage, IMO. Seems like this issue only occurs when MSI-X interrupts are enabled for the BCM5709 chips, or am I reading it wrong? If I type 'echo ::interrupts | mdb -k' and isolate the network-related bits, I get the following output:
IRQ Vect IPL Bus Trg Type CPU Share APIC/INT# ISR(s)
36 0x60 6 PCI Lvl Fixed 3 1 0x1/0x4 bnx_intr_1lvl
48 0x61 6 PCI Lvl Fixed 2 1 0x1/0x10 bnx_intr_1lvl
Does this imply that my system is not in a vulnerable configuration? Supposedly I'm losing some performance without MSI-X, but I'm not sure in which environments or workloads we would notice, since the load on this server is relatively low and the L2ARC serves data at greater than 100MB/s (wire speed) without stressing much of anything. The BIOS settings in our T610 are exactly as they arrived from Dell when we bought it over a year ago. Thoughts? --eric -- Eric D. Mudama edmud...@mail.bounceswoosh.org
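To pull just the bnx vectors out of that output, something like the filter below works. This is a sketch: on a live system you'd pipe `echo ::interrupts | mdb -k` straight into awk (which needs root); here the sample rows quoted above stand in for the live output:

```shell
# Sample ::interrupts output from the post; on a live system replace the
# printf with:  echo ::interrupts | mdb -k
mdb_out='IRQ Vect IPL Bus Trg Type CPU Share APIC/INT# ISR(s)
36 0x60 6 PCI Lvl Fixed 3 1 0x1/0x4 bnx_intr_1lvl
48 0x61 6 PCI Lvl Fixed 2 1 0x1/0x10 bnx_intr_1lvl'

# Keep the header plus any bnx vectors; "Fixed" in the Type column means
# that bnx instance is not using MSI/MSI-X interrupts.
printf '%s\n' "$mdb_out" | awk 'NR==1 || /bnx/'
```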
Re: [zfs-discuss] ZFS getting slower over time
The dips are gone. I've run simple copy operations via CIFS for two days and the problem hasn't appeared anymore. I'll try to find out what caused it, though. Thanks for trying to help me.