Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?
I am currently trying to get two of these things running Illumian. I don't have any particular performance requirements, so I'm thinking of using some sort of supported hypervisor, (either RHEL and KVM or VMware ESXi) to get around the driver support issues, and passing the disks through to an Illumian guest. The H310 does indeed support pass-through (the non-raid mode), but one thing to keep in mind is that I was only able to configure a single boot disk. I configured the rear two drives into a hardware raid 1 and set the virtual disk as the boot disk so that I can still boot the system if an OS disk fails. Once Illumos is better supported on the R720 and the PERC H310, I plan to get rid of the hypervisor silliness and run Illumos on bare metal. -Greg Sent from my iPhone ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Group Quotas
> Also the linux NFSv4 client is bugged (as in hang-the-whole-machine bugged).
> I am deploying a new osol fileserver for home directories and I'm using NFSv3
> + automounter (because I am also using one dataset per user, and thus I have
> to mount each home dir separately).

We are also in the same boat here. I have about 125TB of ZFS storage in production currently, running OSOL, across 5 X4540s. We tried the NFSv4 route, and crawled back to NFSv3 and the Linux automounter because NFSv4 on Linux is *that* broken. As in hung-disk-io-that-wedges-the-whole-box broken. We know that NFSv3 was never meant for the scale we're using it at, but we have no choice in the matter.

On the topic of Linux clients, NFS, and ZFS: we've also found that Linux is bad at handling lots of mounts/umounts. We will occasionally find a client where the automounter requested a mount, but it never actually completed. It'll show as mounted in /proc/mounts, but won't *actually* be mounted. A umount -f for the affected filesystem fixes this. On ~250 clients in an HPC environment, we'll see such an error every week or so.

I'm hoping that recent versions of Linux (e.g. RHEL 6) are a bit better at NFSv4, but I'm not holding my breath.

-- Greg Mason HPC Administrator Michigan State University Institute for Cyber Enabled Research High Performance Computing Center web: www.icer.msu.edu email: gma...@msu.edu

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS flar image.
As an alternative, I've been taking a snapshot of rpool on the golden system, sending it to a file, and creating a boot environment from the archived snapshot on target systems. After fiddling with the snapshots a little, I then either appropriately anonymize the system or provide it with its identity. When it boots up, it's ready to go.

The only downside to my method is that I still have to run the full OpenSolaris installer, and I can't exclude anything from the archive. Essentially, it's a poor man's flash archive.

-Greg

cindy.swearin...@sun.com wrote: Hi RB, We have a draft of the ZFS/flar image support here: http://opensolaris.org/os/community/zfs/boot/flash/ Make sure you review the Solaris OS requirements. Thanks, Cindy

On 09/14/09 11:45, RB wrote: Is it possible to create a flar image of a ZFS root filesystem to install it to other machines?

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
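For anyone wanting to try the same approach, here's a rough sketch of the send-to-file workflow. All dataset, snapshot, and file names below are made-up examples (not from my actual setup), and the receive/activation details can vary by OpenSolaris release:

```shell
# On the golden system: snapshot the root dataset and archive the
# stream to a file (an NFS path here, as an example).
zfs snapshot rpool/ROOT/opensolaris@golden
zfs send rpool/ROOT/opensolaris@golden > /net/archive/golden.zfs

# On a freshly installed target: receive the stream as a new boot
# environment dataset, then activate it with beadm.
zfs receive rpool/ROOT/golden-be < /net/archive/golden.zfs
beadm activate golden-be
```

If the golden image spans descendant datasets, a recursive stream (zfs send -R of a recursive snapshot) may be the better fit.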
Re: [zfs-discuss] Ssd for zil on a dell 2950
How about the bug "removing slog not possible"? What if this slog fails? Is there a plan for such situation (pool becomes inaccessible in this case)? You can "zpool replace" a bad slog device now. -Greg ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
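For reference, a sketch of what that replacement looks like — the pool and device names here are examples, not from the thread:

```shell
# Swap a failing slog device for a new one.
zpool replace tank c1t5d0 c1t6d0

# Verify the log device resilvered and the pool reports ONLINE.
zpool status tank
```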
Re: [zfs-discuss] Ssd for zil on a dell 2950
Something our users do quite a bit of is untarring archives with a lot of small files. Many small, quick writes are another common workload for our users.

Real-world test: our old Linux-based NFS server allowed us to unpack a particular tar file (the source for boost 1.37) in around 2-4 minutes, depending on load. This machine wasn't special at all, but it had fancy SGI disk on the back end, and was using the Linux-specific async NFS option. We turned up our X4540s, and this same tar unpack took over 17 minutes! We disabled the ZIL for testing, and we dropped this to under 1 minute. With the X25-E as a slog, we were able to run this test in 2-4 minutes, same as the old storage.

That said, I strongly recommend using Richard Elling's zilstat. He's posted about it previously on this list. It will help you determine whether adding a slog device will help your workload or not. I didn't know about this script at the time of our testing, so it ended up being some trial and error, running various tests on different hardware setups (which means creating and destroying quite a few pools).

-Greg

Jorgen Lundman wrote: Does un-taring something count? It is what I used for our tests. I tested with ZIL disabled, ZIL cache on /tmp/zil, CF-card (300x) and a cheap SSD. Waiting for X25-E SSDs to arrive for testing those: http://mail.opensolaris.org/pipermail/zfs-discuss/2009-July/030183.html If you want a quick answer, disable the ZIL (you need to unmount/mount, export/import or reboot) on your ZFS volume and try it. That is the theoretical maximum. You can get close to this using various technologies, SSD and all that. I am no expert on this, I knew nothing about it 2 weeks ago. But for our provisioning engine to untar Movable-Types for customers, 5 mins to 45 secs is quite an improvement. I can get that to 11 seconds theoretically. (ZIL disable) Lund

Monish Shah wrote: Hello Greg, I'm curious how much performance benefit you gain from the ZIL accelerator. Have you measured that? 
If not, do you have a gut feel about how much it helped? Also, for what kind of applications does it help? (I know it helps with synchronous writes. I'm looking for real world answers like: "Our XYZ application was running like a dog and we added an SSD for ZIL and the response time improved by X%.") Of course, I would welcome a reply from anyone who has experience with this, not just Greg. Monish - Original Message - From: "Greg Mason" To: "HUGE | David Stahl" Cc: "zfs-discuss" Sent: Thursday, August 20, 2009 4:04 AM Subject: Re: [zfs-discuss] Ssd for zil on a dell 2950 Hi David, We are using them in our Sun X4540 filers. We are actually using 2 SSDs per pool, to improve throughput (since the logbias feature isn't in an official release of OpenSolaris yet). I kind of wish they made an 8G or 16G part, since the 32G capacity is kind of a waste. We had to go the NewEgg route though. We tried to buy some Sun-branded disks from Sun, but that's a different story. To summarize, we had to buy the NewEgg parts to ensure a project stayed on-schedule. Generally, we've been pretty pleased with them. Occasionally, we've had an SSD that wasn't behaving well. Looks like you can replace log devices now though... :) We use the 2.5" to 3.5" SATA adapter from IcyDock, in a Sun X4540 drive sled. If you can attach a standard sata disk to a Dell sled, this approach would most likely work for you as well. Only issue with using the third-party parts is that the involved support organizations for the software/hardware will make it very clear that such a configuration is quite unsupported. That said, we've had pretty good luck with them. -Greg ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Ssd for zil on a dell 2950
Hi David, We are using them in our Sun X4540 filers. We are actually using 2 SSDs per pool, to improve throughput (since the logbias feature isn't in an official release of OpenSolaris yet). I kind of wish they made an 8G or 16G part, since the 32G capacity is kind of a waste. We had to go the NewEgg route though. We tried to buy some Sun-branded disks from Sun, but that's a different story. To summarize, we had to buy the NewEgg parts to ensure a project stayed on-schedule. Generally, we've been pretty pleased with them. Occasionally, we've had an SSD that wasn't behaving well. Looks like you can replace log devices now though... :) We use the 2.5" to 3.5" SATA adapter from IcyDock, in a Sun X4540 drive sled. If you can attach a standard sata disk to a Dell sled, this approach would most likely work for you as well. Only issue with using the third-party parts is that the involved support organizations for the software/hardware will make it very clear that such a configuration is quite unsupported. That said, we've had pretty good luck with them. -Greg -- Greg Mason System Administrator High Performance Computing Center Michigan State University HUGE | David Stahl wrote: We have a setup with ZFS/ESX/NFS and I am looking to move our zil to a solid state drive. So far I am looking into this one http://www.newegg.com/Product/Product.aspx?Item=N82E16820167013 Does anyone have any experience with this drive as a poorman’s logzilla? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] unexpected behavior with "nbmand=on" set
It's not too often we see good news on the zfs-discuss list, so here's some: we at the High Performance Computing Center at MSU have finally worked out the root cause of a long-standing issue with our OpenSolaris NFS servers. It was a minor configuration issue, involving a ZFS file system property.

A little backstory: we chose to go with Sun X4540s, running OpenSolaris and ZFS, for our home directory space. We initially implemented 100TB of usable space. All was well for a while, but then some mostly annoying issues started popping up:

1. 0-byte files named '4913' were appearing in user directories. We discovered that vi was doing: open("4913") close("4913") remove("4913") The remove() operation would fail intermittently. With assistance from the helpful folks at SGI (because we originally thought this was a Linux NFSv4 client problem), testing revealed that this behavior is caused by the NFS server on Solaris occasionally returning NFS4ERR_FILE_OPEN, which is not handled by the client. According to a Linux NFS kernel developer, "the error is usually due to ordering issues with asynchronous RPC calls." http://www.linux-nfs.org/Linux-2.6.x/2.6.18/linux-2.6.18-068-handle_nfs4err_file_open.dif We applied a patch to the Linux NFSv4 client, which told the client to wait and retry when it received that error.

2. There was also an issue with gedit. When opening and then saving an already existing file, it did: open("file") rename("file","file~") rename() returned "Input/Output Error." After applying the fix for #1, rename() hung indefinitely. We also noticed a similar problem with gcc. Interestingly, running this test locally on the OpenSolaris server, on the same file system, resulted in a "permission denied" error. If we mounted this same file system over NFSv4 on another OpenSolaris system, we received the same "permission denied" error.

Yesterday, we discovered the property 'nbmand' was set on the ZFS file systems in question. 
This was a leftover from our initial testing with Solaris CIFS. It was set because the documentation at http://dlc.sun.com/osol/docs/content/SSMBAG/managingsmbsharestm.html and http://204.152.191.100/wiki/index.php/Getting_Started_With_the_Solaris_CIFS_Service instructed that nbmand should be turned on when using CIFS. What isn't mentioned, however, is that nbmand can adversely affect the behavior of NFSv4 and even local file systems. The ZFS admin guide also states that nbmand applies only to CIFS clients, when it actually applies to NFSv4 clients as well as local file system access.

I think nbmand is also a bit slow in releasing its locks, which explains the behavior of bug #1. The only tests we've run so far show that the "slow" locking behavior goes away when nbmand is turned off. Would filing a bug about this slow locking behavior be the correct thing to do at this point? If so, where is the proper place to file it? I've been told these reports go to the OpenSolaris Bugzilla, but I'm not sure whether this should be filed in bugs.opensolaris.org or not.

Disabling nbmand on a test file system resolved both bugs, as well as other known issues that our users have been running into. All the various known issues this caused can be found at the MSU HPCC wiki: https://wiki.hpcc.msu.edu/display/Issues/Known+Issues, under "Home Directory file system."

-Greg

-- Greg Mason System Administrator High Performance Computing Center Michigan State University

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
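If anyone else wants to check their file systems for the same leftover, a quick sketch (the dataset name is an example):

```shell
# See whether non-blocking mandatory locking is enabled on the dataset.
zfs get nbmand tank/home

# Turn it off. The change takes effect at mount time, so remount the
# file system (or reboot) afterwards.
zfs set nbmand=off tank/home
zfs unmount tank/home
zfs mount tank/home
```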
Re: [zfs-discuss] Shrinking a zpool?
What is the downtime for doing a send/receive? What is the downtime for zpool export, reconfigure LUN, zpool import?

We have a similar situation. Our home directory storage is based on many X4540s. Currently, we use rsync to migrate volumes between systems, but our process could very easily be switched over to zfs send/receive (and very well may be in the near future).

What this looks like, using zfs send/receive, is: we perform an initial send (to get the bulk of the data over), and then at a planned downtime, do an incremental send to "catch up" the destination. This "catch up" phase is usually a very small fraction of the overall size of the volume. The only downtime required is from just before the final snapshot you send (the last incremental) until the send finishes and you turn up whatever service(s) run on the destination system. If the filesystem has a lot of write activity, you can run multiple incrementals to decrease the size of that last snapshot.

As far as backing out goes, you can simply destroy the destination filesystem and continue running on the original system if all hell breaks loose (of course that never happens, right? :) When everything checks out (which you can safely assume when the recv finishes, thanks to how ZFS send/recv works), you then just have to destroy the original filesystem.

It is correct that this doesn't shrink the pool, but it's at least a workaround to be able to swing filesystems around to different systems. If you had only one filesystem in the pool, you could then safely destroy the original pool. This does mean you'd need 2x the size of the LUN during the transfer, though.

For replication of ZFS filesystems, we use a similar process, with just a lot of incremental sends.

Greg Mason System Administrator High Performance Computing Center Michigan State University

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
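A sketch of the migration sequence described above — the hostnames, pool, and snapshot names are invented for the example:

```shell
# Bulk transfer while the source is still serving users.
zfs snapshot tank/home@mig-1
zfs send tank/home@mig-1 | ssh dest-host zfs receive tank/home

# At the planned downtime: stop writes, take a final snapshot, and send
# just the (small) incremental difference before cutting services over.
zfs snapshot tank/home@mig-2
zfs send -i tank/home@mig-1 tank/home@mig-2 | ssh dest-host zfs receive tank/home
```

If the filesystem is write-heavy, repeat the incremental step a few times so the final catch-up during downtime is as small as possible.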
Re: [zfs-discuss] SSD's and ZFS...
> >> I think it is a great idea, assuming the SSD has good write performance. > > This one claims up to 230MB/s read and 180MB/s write and it's only $196. > > > > http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393 > > > > Compared to this one (250MB/s read and 170MB/s write) which is $699. > > > Oops. Forgot the link: > > http://www.newegg.com/Product/Product.aspx?Item=N82E16820167014 > > Are those claims really trustworthy? They sound too good to be true! > > > > -Kyle Kyle- The less expensive SSD is an MLC device. The Intel SSD is an SLC device. That right there accounts for the cost difference. The SLC device (Intel X25-E) will last quite a bit longer than the MLC device. -Greg -- Greg Mason System Administrator Michigan State University High Performance Computing Center ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Question about user/group quotas
Thanks for the link Richard, I guess the next question is, how safe would it be to run snv_114 in production? Running something that would be technically "unsupported" makes a few folks here understandably nervous... -Greg On Thu, 2009-07-09 at 10:13 -0700, Richard Elling wrote: > Greg Mason wrote: > > I'm trying to find documentation on how to set and work with user and > > group quotas on ZFS. I know it's quite new, but googling around I'm just > > finding references to a ZFS quota and refquota, which are > > filesystem-wide settings, not per user/group. > > > > Cindy does an excellent job of keeping the ZFS Admin Guide up to date. > http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf > See the section titled "Setting User or Group Quotas on a ZFS File System" > -- richard > > Also, after reviewing a few bugs, I'm a bit confused about which build > > has user quota support. I recall that snv_111 has user quota support, > > but not in rquotad. According to bug 6501037, ZFS user quota support is > > in snv_114. > > > > We're preparing to roll out OpenSolaris 2009.06 (snv_111b), and we're > > also curious about being able to utilize ZFS user quotas, as we're > > having problems with NFSv4 on our clients (SLES 10 SP2). We'd like to be > > able to use NFSv3 for now (one large ZFS filesystem, with user quotas > > set), until the flaws with our Linux NFS clients can be addressed. > > > > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Question about user/group quotas
I'm trying to find documentation on how to set and work with user and group quotas on ZFS. I know it's quite new, but googling around I'm just finding references to a ZFS quota and refquota, which are filesystem-wide settings, not per user/group. Also, after reviewing a few bugs, I'm a bit confused about which build has user quota support. I recall that snv_111 has user quota support, but not in rquotad. According to bug 6501037, ZFS user quota support is in snv_114. We're preparing to roll out OpenSolaris 2009.06 (snv_111b), and we're also curious about being able to utilize ZFS user quotas, as we're having problems with NFSv4 on our clients (SLES 10 SP2). We'd like to be able to use NFSv3 for now (one large ZFS filesystem, with user quotas set), until the flaws with our Linux NFS clients can be addressed. -- Greg Mason System Administrator Michigan State University High Performance Computing Center ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
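For builds that do have the feature (snv_114 or later, per the bug noted above), the per-user and per-group properties look something like the following — the dataset, user, and group names are examples:

```shell
# Limit how much data a single user or group may consume on a dataset.
zfs set userquota@alice=10G tank/home
zfs set groupquota@staff=500G tank/home

# Report current usage against the per-user / per-group quotas.
zfs userspace tank/home
zfs groupspace tank/home
```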
Re: [zfs-discuss] importing pool with missing slog followup
In my testing, I've seen that trying to duplicate zpool disks with dd often results in a disk that's unreadable. I believe it has something to do with the block sizes of dd. In order to make my own slog backups, I just used cat instead. I plugged the slog SSD into another system (not a necessary step, but easier in my case), catted the disk to a file, then put the slog SSD back. I imagine this needs to be done with the zpool in a cleanly-exported state; I haven't tested it otherwise.

I've also tested replacing an SSD with my method: just cat the file back to the disk. With this method of replacing a slog, the zpool is imported on boot, like nothing happened, even though the physical hardware has changed.

A question I have is: does "zpool replace" now work for slog devices as of snv_111b?

-Greg

On Fri, 2009-06-05 at 20:57 -0700, Paul B. Henson wrote:
> My research into recovering from a pool whose slog goes MIA while the pool is off-line resulted in two possible methods, one requiring prior preparation and the other a copy of the zpool.cache including data for the failed pool.
> The first method is to simply dump a copy of the slog device right after you make it (just dd if=/dev/dsk/ of=slog.dump>). If the device ever failed, theoretically you could restore the image onto a replacement (dd if=slog.dump of=/dev/dsk/) and import the pool.
> My initial testing of that method was promising, however that testing was performed by intentionally corrupting the slog device, and restoring the copy back onto the original device. However, when I tried restoring the slog dump onto a different device, that didn't work out so well. zpool import recognized the different device as a log device for the pool, but still complained there were unknown missing devices and refused to import the pool. 
It looks like the device serial number is stored as part of the > zfs label, resulting in confusion when that label is restored onto a > different device. As such, this method is only usable if the underlying > fault is simply corruption, and the original device is available to restore > onto. > > The second method is described at: > > http://opensolaris.org/jive/thread.jspa?messageID=377018 > > Unfortunately, the included binary does not run under S10U6, and after half > an hour or so of trying to get the source code to compile under S10U6 I > gave up (I found some of the missing header files in the S10U6 grub source > code package which presumably match the actual data structures in use under > S10, but there was additional stuff missing which as I started copying it > out of opensolaris code just started getting messier and messier). Unless > someone with more zfs-fu than me creates a binary for S10, this approach is > not going to be viable. > > Unofficially I was told that there is expected to be a fix for this issue > putback into Nevada around July, but whether or not that might be available > in U8 wasn't said. So, barring any official release of a fix or unofficial > availability of a workaround for S10, in the (admittedly unlikely) failure > mode of a slog device failure on an inactive pool, have good backups :). > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
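For the archives, a sketch of the cat-based imaging described earlier in the thread — the device path and file name are examples, and the pool should be cleanly exported first:

```shell
# Image the slog SSD to a file while the pool is exported.
cat /dev/rdsk/c2t0d0s0 > /backup/slog.img

# Later, restore the image onto the same physical device. As noted
# above, restoring onto a *different* device runs into the serial
# number stored in the label.
cat /backup/slog.img > /dev/rdsk/c2t0d0s0
```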
Re: [zfs-discuss] [storage-discuss] Supermicro SAS/SATA controllers?
And it looks like the Intel fragmentation issue is fixed as well: http://techreport.com/discussions.x/16739 FYI, Intel recently had a new firmware release. IMHO, odds are that this will be as common as HDD firmware releases, at least for the next few years. http://news.cnet.com/8301-13924_3-10218245-64.html?tag=mncol It should also be noted that the Intel X25-M != the Intel X25-E. The X25-E hasn't had any of the performance and fragmentation issues. The X25-E is an SLC SSD, the X25-M is an MLC SSD, hence the more complex firmware. -Greg ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Data size grew.. with compression on
Harry,

ZFS will only compress data if it is able to gain more than 12% of space by compressing it (I may be wrong on the exact percentage). If ZFS can't get at least that 12% compression, it doesn't bother and will just store the block uncompressed. Also, the default ZFS compression algorithm isn't gzip, so you aren't going to get the greatest compression possible, but it is quite fast. Depending on the type of data, it may not compress well at all, leading ZFS to store that data completely uncompressed.

-Greg

All good info thanks. Still one thing doesn't quite work in your line of reasoning. The data on the gentoo linux end is uncompressed. Whereas it is compressed on the zfs side. A number of the files are themselves compressed formats such as jpg mpg avi pdf maybe a few more, which aren't going to compress further to speak of, but thousands of the files are text files (html). So compression should show some downsize. Your calculation appears to be based on both ends being uncompressed.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs as a cache server
Francois,

Your best bet is probably a stripe of mirrors, i.e. a zpool made of many mirrors. This way you have redundancy, and fast reads as well. You'll also enjoy pretty quick resilvering in the event of a disk failure. For even faster reads, you can add dedicated L2ARC cache devices (folks typically use SSDs or very fast (15k RPM) SAS drives for this).

-Greg

Francois wrote: Hello list, What would be the best zpool configuration for a cache/proxy server (probably based on squid)? In other words, with which zpool configuration could I expect the best read performance? (There'll be some writes too, but much less.) Thanks. -- Francois

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
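A sketch of that layout — the pool and device names are examples:

```shell
# Stripe across three mirrored pairs: redundancy plus fast reads,
# and quick resilvers since only one pair rebuilds after a failure.
zpool create proxypool \
  mirror c1t0d0 c1t1d0 \
  mirror c1t2d0 c1t3d0 \
  mirror c1t4d0 c1t5d0

# Add a fast device (e.g. an SSD) as a dedicated L2ARC cache.
zpool add proxypool cache c2t0d0
```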
Re: [zfs-discuss] GSoC 09 zfs ideas?
Just my $0.02, but would pool shrinking be the same as vdev evacuation? I'm quite interested in vdev evacuation as an upgrade path for multi-disk pools. This would be yet another reason for folks to use ZFS at home (you only have to buy cheap disks), but it would also be good to have that ability from an enterprise perspective, as I'm sure we've all engineered ourselves into a corner one time or another...

It's a much cleaner, safer, and possibly much faster alternative to systematically pulling drives and letting zfs resilver onto a larger disk, in order to upgrade a pool in-place, and in production. Basically, what I'm thinking is:

zpool remove mypool 

Allow time for ZFS to vacate the vdev(s), and then light up the "OK to remove" light on each evacuated disk.

-Greg

Blake Irvin wrote: Shrinking pools would also solve the right-sizing dilemma. Sent from my iPhone

On Feb 28, 2009, at 3:37 AM, Joe Esposito wrote: I'm using opensolaris and zfs at my house for my photography storage as well as for an offsite backup location for my employer and several side web projects. I have an 80g drive as my root drive. I recently took possession of 2 74g 10k drives which I'd love to add as a mirror to replace the 80g drive. From what I gather it is only possible if I zfs export my storage array and reinstall solaris on the new disks. So I guess I'm hoping zfs shrink and grow commands show up sooner or later. Just a data point. Joe Esposito www.j-espo.com

On 2/28/09, "C. Bergström" wrote: Blake wrote: Gnome GUI for desktop ZFS administration On Fri, Feb 27, 2009 at 9:13 PM, Blake wrote: zfs send is great for moving a filesystem with lots of tiny files, since it just handles the blocks :) I'd like to see: pool-shrinking (and an option to shrink disk A when i want disk B to become a mirror, but A is a few blocks bigger) This may be interesting... I'm not sure how often you need to shrink a pool though? Could this be classified more as a Home or SME level feature? 
install to mirror from the liveCD gui I'm not working on OpenSolaris at all, but for when my projects installer is more ready /we/ can certainly do this.. zfs recovery tools (sometimes bad things happen) Agreed.. part of what I think keeps zfs so stable though is the complete lack of dependence on any recovery tools.. It forces customers to bring up the issue instead of dirty hack and nobody knows. automated installgrub when mirroring an rpool This goes back to an installer option? ./C

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Write caches on X4540
Well, since the cache flush command is disabled, I would like this to happen as early as practically possible in the boot process, as ZFS will not be issuing cache flush commands to the disks.

I'm not really sure what happens in the case where the cache flush command is disabled, something makes its way into the write cache, and then the cache is disabled. Does this mean the write cache is flushed to disk when the cache is disabled? If so, then I guess it's less critical when it happens in the boot process, or whether it's permanent...

-Greg

A Darren Dunham wrote: On Thu, Feb 12, 2009 at 10:33:40AM -0500, Greg Mason wrote: What I'm looking for is a faster way to do this than format -e -d -f
Re: [zfs-discuss] Write caches on X4540
Are you sure thar write cache is back on after restart? Yes, I've checked with format -e, on each drive. When disabling the write cache with format, it also gives a warning stating this is the case. What I'm looking for is a faster way to do this than format -e -d -f
Re: [zfs-discuss] Write caches on X4540
We use several X4540s over here as well; what type of workload do you have, and how much of a performance increase did you see by disabling the write caches?

We see the difference between our tests completing in around 2.5 minutes (with write caches) to around a minute and a half without them, in one instance. I'm trying to optimize our machines for a write-heavy environment, as our users will undoubtedly hit this limitation of the machines.

-Greg

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Write caches on X4540
We're using some X4540s, with OpenSolaris 2008.11. According to my testing, to optimize our systems for our specific workload, I've determined that we get the best performance with the write cache disabled on every disk, and with zfs:zfs_nocacheflush=1 set in /etc/system.

The only issue is setting the write cache permanently, or at least quickly. Right now, I've scripted format to run on boot, disabling the write cache of all disks. This takes around two minutes. I'd like to avoid needing to take this time on every bootup (which is more often than you'd think; we've got quite a bit of construction happening, which necessitates bringing everything down periodically). This would also be painful in the event of unplanned downtime for one of our Thors.

So, basically, my question is: is there a way to quickly or permanently disable the write cache on every disk in an X4540?

Thanks,

-Greg

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
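For the archives, here is roughly what a boot-time script like the one described might look like. Treat this as a sketch: the exact format(1M) expert-mode subcommands can vary by release, and the disk-matching pattern is just an example.

```shell
#!/bin/sh
# Disable the volatile write cache on every disk format can see.
# (zfs:zfs_nocacheflush=1 is set separately in /etc/system.)
for disk in `format < /dev/null 2>/dev/null | awk '/c[0-9]+t[0-9]+d[0-9]+/ {print $2}'`; do
  # Drive format's interactive cache menu non-interactively.
  format -e -d "$disk" <<EOF
cache
write_cache
disable
quit
quit
quit
EOF
done
```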
Re: [zfs-discuss] Send & Receive (and why does 'ls' modify a snapshot?)
Tony, I believe you want to use "zfs recv -F" to force a rollback on the receiving side. I'm wondering if your ls is updating the atime somewhere, which would indeed be a change... -Greg ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Add SSD drive as L2ARC(?) cache to existing ZFS raid?
Orvar,

In my testing, I've seen a 5x improvement in small file creation when working specifically with NFS. This is after I added an SSD for the ZIL. I recommend Richard Elling's zilstat (he posted links earlier). It'll let you see if a dedicated device for the ZIL will help your specific workload. My understanding is that you'll get more bang for the buck using an SSD for the ZIL rather than the L2ARC.

Performing some of your own benchmarks is really the only way to see what will help improve performance for your specific workload. I recommend reading up on the ZFS ARC and L2ARC, to help determine if testing a dedicated L2ARC device is even worthwhile for your uses. I know it wasn't really helpful for me, as our read performance is already great.

As for a specific SSD, I've tested the Intel X25-E. It's around $600 or so. It's got about half the performance of the snazzy, pricey STEC Zeus drives. With the specific workload I was trying to accelerate, I wasn't hitting any of the limits of the Intel SSDs (but I was definitely WAY past the performance limits of a standard hard disk). Again, all of this was for accelerating the ZIL, not for use on the L2ARC, so YMMV. Fishworks does this: they use an SSD both for the read cache and for the ZIL.

-Greg

Orvar Korvar wrote:
> So are there no guidelines how to add a SSD disk as a home user? Which is the best SSD disk to add? What percentage improvements are typical? Or, will a home user not benefit from adding a SSD drive? It is only enterprise SSD drives that work, together with some esoteric software from Fishworks? It requires Enterprise hardware to get a boost from SSD? Not possible? Or?
> No one has done this yet? What does the Fishworks team say?

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] write cache and cache flush
I'll give this script a shot a little bit later today. For ZIL sizing, I'm using either 1 or 2 32G Intel X25-E SSDs in my tests, which, according to what I've read, is 2-4 times larger than the maximum that ZFS can possibly use. We've got 32G of system memory in these Thors, and (if I'm not mistaken) the maximum amount of in-play data can be 16G, 1/2 the system memory. Also, because I know people will be asking, has anybody ever tried to recover from something like a system crash with a ZFS pool that has the ZIL disabled? What kind of nightmares would I be facing in such a situation? Would I simply risk losing that in-play data, or could more serious things happen? I know disabling the ZIL is an Extremely Bad Idea, but I need to tell people exactly why... -Greg Jim Mauro wrote: > You have SSD's for the ZIL (logzilla) enabled, and ZIL IO > is what is hurting your performance...Hmmm > > I'll ask the stupid question (just to get it out of the way) - is > it possible that the logzilla is undersized? > > Did you gather data using Richard Elling's zilstat (included below)? > > Thanks, > /jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] write cache and cache flush
> If there was a latency issue, we would see such a problem with our > existing file server as well, which we do not. We'd also have much > greater problems than just file server performance. > > So, like I've said, we've ruled out the network as an issue. I should also add that I've tested these Thors with the ZIL disabled, and they scream! With the cache flush disabled, they also do quite well. The specific issue I'm trying to solve is the ZIL being slow when using NFS. I really don't want to have to do something drastic like disabling the ZIL to get the performance I need... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] write cache and cache flush
Jim Mauro wrote: > >> This problem only manifests itself when dealing with many small files >> over NFS. There is no throughput problem with the network. > But there could be a _latency_ issue with the network. If there was a latency issue, we would see such a problem with our existing file server as well, which we do not. We'd also have much greater problems than just file server performance. So, like I've said, we've ruled out the network as an issue. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] write cache and cache flush
7200 RPM SATA disks. Tim wrote: > > > On Fri, Jan 30, 2009 at 8:24 AM, Greg Mason <mailto:gma...@msu.edu>> wrote: > > A Linux NFS file server, with a few terabytes of fibre-attached disk, > using XFS. > > I'm trying to get these Thors to perform at least as well as the current > setup. A performance hit is very hard to explain to our users. > > > What type of spindles were in the FC attached disk? > > --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] write cache and cache flush
I should also add that this "creating many small files" issue is the ONLY case where the Thors are performing poorly, which is why I'm focusing on it. Greg Mason wrote: > A Linux NFS file server, with a few terabytes of fibre-attached disk, > using XFS. > > I'm trying to get these Thors to perform at least as well as the current > setup. A performance hit is very hard to explain to our users. > >> Perhaps I missed something, but what was your previous setup? >> I.e. what did you upgrade from? >> Neil. >> >> > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] write cache and cache flush
A Linux NFS file server, with a few terabytes of fibre-attached disk, using XFS. I'm trying to get these Thors to perform at least as well as the current setup. A performance hit is very hard to explain to our users. > Perhaps I missed something, but what was your previous setup? > I.e. what did you upgrade from? > Neil. > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] write cache and cache flush
This problem only manifests itself when dealing with many small files over NFS. There is no throughput problem with the network. I've run tests with the write cache disabled on all disks, and the cache flush disabled. I'm using two Intel SSDs for ZIL devices. This setup is faster than using the two Intel SSDs with write caches enabled on all disks, and with the cache flush enabled. My test would run around 3.5 to 4 minutes, now it is completing in about 2.5 minutes. I still think this is a bit slow, but I still have quite a bit of testing to perform. I'll keep the list updated with my findings. I've already established both via this list and through other research that ZFS has performance issues over NFS when dealing with many small files. This may be an issue with NFS itself, where NVRAM-backed storage is needed for decent performance with small files. Typically such an NVRAM cache is supplied by a hardware raid controller in a disk shelf. I find it very hard to explain to a user why an "upgrade" is a step down in performance. For the users these Thors are going to serve, such a drastic performance hit is a deal breaker... I've done my homework on this issue, I've ruled out the network as an issue, as well as the NFS clients. I've narrowed my particular performance issue down to the ZIL, and how well ZFS plays with NFS. -Greg Jim Mauro wrote: > Multiple Thors (more than 2?), with performance problems. > Maybe it's the common demnominator - the network. > > Can you run local ZFS IO loads and determine if performance > is expected when NFS and the network are out of the picture? > > Thanks, > /jim > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] write cache and cache flush
The funny thing is that I'm showing a performance improvement over write caches + cache flushes. The only way these pools are being accessed is over NFS. Well, at least the only way I care about when it comes to high performance. I'm pretty sure it would give a performance hit locally, but I don't care about local disk performance, I only care about the performance over NFS. Anton B. Rang wrote: > If all write caches are truly disabled, then disabling the cache flush won't > affect the safety of your data. > > It will change your performance characteristics, almost certainly for the > worse. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] write cache and cache flush
So, I'm still beating my head against the wall, trying to find our performance bottleneck with NFS on our Thors. We've got a couple of Intel SSDs in each, used as dedicated ZIL devices. Cache flushing is still enabled, as are the write caches on all 48 disk devices. What I'm thinking of doing is disabling all write caches, and disabling the cache flushing. What would this mean for the safety of data in the pool? And, would this even do anything to address the performance issue? -Greg ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
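For what it's worth, the cache-flush knob I'm talking about is an /etc/system tunable on this OpenSolaris build. A sketch of the fragment, which you should double-check against the Evil Tuning Guide for your build before setting:

```
* /etc/system fragment (reboot required). Tells ZFS not to issue
* cache flush commands to the devices:
set zfs:zfs_nocacheflush = 1
*
* The per-disk write caches are not controlled here; those get
* toggled with format(1M) -e (cache -> write_cache -> disable).
```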
Re: [zfs-discuss] Add SSD drive as L2ARC(?) cache to existing ZFS raid?
How were you running this test? Were you running it locally on the machine, or were you running it over something like NFS? What is the rest of your storage like? Just direct-attached (SAS or SATA, for example) disks, or are you using a higher-end RAID controller? -Greg kristof wrote: > Kebabber, > > You can't expose zfs filesystems over iSCSI. > > You only can expose ZFS volumes (raw volumes) over iscsi. > > PS: 2 weeks ago I did a few tests, using filebench. > > I saw little to no improvement using a 32GB Intel X25E SSD. > > Maybe this is because filebench is flushing the cache in between tests. > > I also compared iscsi boot time (using gpxe as boot loader) , > > We are using raidz storagepool (4disks). here again, adding the X25E as cache > device did not speedup the boot proccess. So I did not see real improvement. > > PS: We have 2 master volumes (xp and vista) which we clone to provision > additional guests. > > I'm now waiting for new SSD disks (STEC Zeus 18GB en STEC Mach 100GB.), since > those are used in SUN 7000 product. I hope they perform better. > > Kristof ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD drives in Sun Fire X4540 or X4500 for dedicated ZIL device
If I'm not mistaken (and somebody please correct me if I'm wrong), the Sun 7000 series storage appliances (the Fishworks boxes) use enterprise SSDs, with DRAM caching. One such product is made by STEC. My understanding is that the Sun appliances use one SSD for the ZIL, and one as a read cache. For the 7210 (which is basically a Sun Fire X4540), that gives you 46 disks and 2 SSDs. -Greg Bob Friesenhahn wrote: > On Thu, 22 Jan 2009, Ross wrote: > >> However, now I've written that, Sun use SATA (SAS?) SSD's in their >> high end fishworks storage, so I guess it definately works for some >> use cases. > > But the "fishworks" (Fishworks is a development team, not a product) > write cache device is not based on FLASH. It is based on DRAM. The > difference is like night and day. Apparently there can also be a read > cache which is based on FLASH. > > Bob > == > Bob Friesenhahn > bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ > GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] SSD drives in Sun Fire X4540 or X4500 for dedicated ZIL device
We're evaluating the possibility of speeding up NFS operations on our X4540s with dedicated log devices. What we are specifically evaluating is replacing one or two of our spare SATA disks with SATA SSDs. Has anybody tried using SSD device(s) as dedicated ZIL devices in an X4540? Are there any known technical issues with using an SSD in an X4540? -Greg ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
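If we go with two SSDs, the plan would be to mirror the log so a single SSD failure doesn't take the ZIL with it. Roughly (pool and device names below are hypothetical):

```shell
# Add a mirrored pair of SSDs as the dedicated log:
zpool add tank log mirror c4t3d0 c5t3d0

# Verify the layout afterwards:
zpool status tank
```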
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
> > Good idea. Thor has a CF slot, too, if you can find a high speed > CF card. > -- richard We're already using the CF slot for the OS. We haven't really found any CF cards that would be fast enough anyways :) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
So, what we're looking for is a way to improve performance, without disabling the ZIL, as it's my understanding that disabling the ZIL isn't exactly a safe thing to do. We're looking for the best way to improve performance, without sacrificing too much of the safety of the data. The current solution we are considering is disabling the cache flushing (as per a previous response in this thread), and adding one or two SSD log devices, as this is similar to the Sun storage appliances based on the Thor. Thoughts? -Greg On Jan 19, 2009, at 6:24 PM, Richard Elling wrote: >> >> We took a rough stab in the dark, and started to examine whether or >> not it was the ZIL. > > It is. I've recently added some clarification to this section in the > Evil Tuning Guide which might help you to arrive at a better solution. > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29 > Feedback is welcome. > -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS over NFS, poor performance with many small files
We're running into a performance problem with ZFS over NFS. When working with many small files (i.e. unpacking a tar file with source code), a Thor (over NFS) is about 4 times slower than our aging existing storage solution, which isn't exactly speedy to begin with (17 minutes versus 3 minutes). We took a rough stab in the dark, and started to examine whether or not it was the ZIL. Performing IO tests locally on the Thor shows no real IO problems, but running IO tests over NFS, specifically, with many smaller files we see a significant performance hit. Just to rule in or out the ZIL as a factor, we disabled it, and ran the test again. It completed in just under a minute, around 3 times faster than our existing storage. This was more like it! Are there any tunables for the ZIL to try to speed things up? Or would it be best to look into using a high-speed SSD for the log device? And, yes, I already know that turning off the ZIL is a Really Bad Idea. We do, however, need to provide our users with a certain level of performance, and what we've got with the ZIL on the pool is completely unacceptable. Thanks for any pointers you may have... -- Greg Mason Systems Administrator Michigan State University High Performance Computing Center ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Using ZFS for replication
zfs-auto-snapshot (SUNWzfs-auto-snapshot) is what I'm using. The only trick is that on the other end, we have to manage our own retention of the snapshots we send to our offsite/backup boxes. zfs-auto-snapshot can handle the sending of snapshots as well. We're running this in OpenSolaris 2008.11 (snv_100). Another use I've seen is using zfs-auto-snapshot to take and manage snapshots on both ends, using rsync to replicate the data, but that's less than ideal for most folks... -Greg Ian Mather wrote: > Fairly new to ZFS. I am looking to replicate data between two thumper boxes. > Found quite a few articles about using zfs incremental snapshot send/receive. > Just a cheeky question to see if anyone has anything working in a live > environment and are happy to share the scripts, save me reinventing the > wheel. thanks in advance. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
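Since I mentioned managing our own retention on the receiving side: the pruning amounts to something like the sketch below. The dataset name and keep-count are made up, and it deliberately only echoes the destroy commands until you've eyeballed the list:

```shell
#!/bin/sh
# Keep only the newest $KEEP received snapshots under backup/home
# (hypothetical dataset). -s creation lists oldest first.
KEEP=30
zfs list -H -t snapshot -o name -s creation -r backup/home |
    awk -v keep="$KEEP" '{ s[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print s[i] }' |
    while read snap; do
        # Echo first; switch to the real destroy once satisfied.
        echo zfs destroy "$snap"
    done
```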