[zfs-discuss] zfs fragmentation
1. Due to the COW nature of ZFS, files on ZFS are more prone to fragmentation compared to traditional file systems. Is this statement correct?
2. If so, the common understanding is that fragmentation causes performance degradation. Will ZFS performance be affected by fragmentation, and to what extent?
3. Being a relatively new file system, has it seen much adoption in large implementations?
4. Googling "zfs fragmentation" doesn't return a lot of results. That could be because either there isn't much major adoption of ZFS, or fragmentation isn't really a problem for ZFS.
Any information is appreciated.
Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On 7 août 09, at 02:03, Stephen Green wrote: I used a 2GB RAM disk (the machine has 12GB of RAM) and this jumped the backup up to somewhere between 18-40MB/s, which means that I'm only a couple of hours away from finishing my backup. This is, as far as I can tell, magic (since I started this message nearly 10GB of data have been transferred, when it took from 6am this morning to get to 20GB.) The transfer speed drops like crazy when the write to disk happens, but it jumps right back up afterwards.

If you want to perhaps reuse the slog later (RAM disks are not preserved over a reboot), write the slog volume out to disk and dump it back in after restarting: dd if=/dev/ramdisk/slog of=/root/slog.dd

Now my only question is: what do I do when it's done? If I reboot and the RAM disk disappears, will my tank be dead? Or will it just continue without the slog? I realize that I'm probably totally boned if the system crashes, so I'm copying off the stuff that I really care about to another pool (the Mac's already been backed up to a USB drive.) Have I meddled in the affairs of wizards? Is ZFS subtle and quick to anger?

You have a number of options to preserve the current state of affairs and be able to reboot the OpenSolaris server if required. The absolute safest bet would be the following, but the resilvering will take a while before you'll be able to shut down:
create a file the same size as the ramdisk on the rpool volume
replace the ramdisk slog with the 2G file (zpool replace poolname /dev/ramdisk/slog /root/slogtemp)
wait for the resilver/replacement operation to run its course
reboot
create a new ramdisk (same size, as always)
replace the file slog with the newly created ramdisk

If your machine reboots unexpectedly things are a little dicier, but you should still be able to get things back online. If you did a dump of the ramdisk via dd to a file, it should contain the correct signature and be recognized by ZFS. There are no guarantees about the state of the data, since if there was anything actively in use on the ramdisk when it stopped you'll lose data, and I'm not sure how the pool will deal with this. But in a pinch, you should be able to either replace the missing ramdisk device with the dd file copy of the ramdisk (make a copy first, just in case), or mount a new ramdisk, dd the contents of the file back to the device and then import the pool.

Cheers, Erik
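P.S. For the archives, the whole dance would look roughly like this as commands (the pool, file and ramdisk names are only examples, adjust to your setup):

# dd if=/dev/ramdisk/slog of=/root/slog.dd          (optional safety copy of the slog)
# mkfile 2g /root/slogtemp                          (file the same size as the ramdisk)
# zpool replace poolname /dev/ramdisk/slog /root/slogtemp
# zpool status poolname                             (wait for the replace/resilver to finish)
(reboot)
# ramdiskadm -a slog 2g
# zpool replace poolname /root/slogtemp /dev/ramdisk/slog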
Re: [zfs-discuss] limiting the ARC cache during early boot, without /etc/system
Besides /etc/system, you could also export all the pools, use mdb to set the same variable that /etc/system sets, and then import the pools again. I don't know of any other mechanism to limit ZFS's memory footprint. If you don't use ZFS boot, manually import the pools after the application starts, so you get your pages first. Sounds good... except this is the OpenSolaris distro we're talking about, so I have ZFS root with no other options. It'll always have at least the rpool. Good thought though! - Matt
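P.S. For the archives, I believe the mdb route mentioned above would be something like the following (untested on my setup; the value is only an example, 1 GB here, and the variable name should be double-checked against your build):

# echo "zfs_arc_max/Z 0x40000000" | mdb -kw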
Re: [zfs-discuss] Can I set 'zil_disable' to increase ZFS/iSCSI performance?
Yes, but to see if a separate ZIL will make a difference, the OP should try his iSCSI workload first with the ZIL, then temporarily disable the ZIL and re-try his workload. Or you may use the zilstat utility.
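For reference, my understanding is that the quick-and-dirty way to toggle the ZIL for such a test is via mdb (benchmarking only, never leave it off in production, and the filesystem has to be remounted for the change to take effect), and that zilstat is simply run with an interval and a count, roughly:

# echo "zil_disable/W0t1" | mdb -kw        (disable, remount, run the test)
# echo "zil_disable/W0t0" | mdb -kw        (re-enable afterwards)
# ./zilstat 1 10                           (1-second samples, 10 of them)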
[zfs-discuss] changing SATA ports
I've a new MB (the same as before, but this one works..) and I want to change the way my SATA drives are connected. I had a ZFS boot mirror connected to SATA3 and 4 and I want those drives to be on SATA1 and 2 now. Question: will ZFS see this and boot the system OK, or will I have to take some precautions beforehand?
Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
erik.ableson wrote: On 7 août 09, at 02:03, Stephen Green wrote:

Man, that looks so nice I think I'll change my mail client to do dates in French :-)

Now my only question is: what do I do when it's done? If I reboot and the RAM disk disappears, will my tank be dead? Or will it just continue without the slog? I realize that I'm probably totally boned if the system crashes, so I'm copying off the stuff that I really care about to another pool (the Mac's already been backed up to a USB drive.)

You have a number of options to preserve the current state of affairs and be able to reboot the OpenSolaris server if required. The absolute safest bet would be the following, but the resilvering will take a while before you'll be able to shut down: create a file the same size as the ramdisk on the rpool volume, replace the ramdisk slog with the 2G file (zpool replace poolname /dev/ramdisk/slog /root/slogtemp), wait for the resilver/replacement operation to run its course, reboot, create a new ramdisk (same size, as always), replace the file slog with the newly created ramdisk.

Would having a slog as a file on a different pool provide anywhere near the same improvement that I saw by adding a RAM disk? Would it affect the typical performance (i.e., reading and writing files in my editor) adversely? That is, could I move the slog to a file and then just leave it there so that I don't have trouble across reboots? I could then just use the ramdisk when big things happen on the MacBook.

If your machine reboots unexpectedly things are a little dicier, but you should still be able to get things back online. If you did a dump of the ramdisk via dd to a file, it should contain the correct signature and be recognized by ZFS. There are no guarantees about the state of the data, since if there was anything actively in use on the ramdisk when it stopped you'll lose data, and I'm not sure how the pool will deal with this. But in a pinch, you should be able to either replace the missing ramdisk device with the dd file copy of the ramdisk (make a copy first, just in case), or mount a new ramdisk, dd the contents of the file back to the device and then import the pool.

So, I take it if I just do a shutdown, the slog will be emptied appropriately to the pool, but then at startup the slog device will be missing and the system won't be able to import that pool. If I dd the ramdisk to a file, I suppose that I should use a file on my rpool, right?

Thanks for the advice. I think it might be time to convince the wife that I need to buy an SSD. Anyone have recommendations for a reasonably priced SSD for a home box?

Steve
[zfs-discuss] MISTAKE in Evil_Tuning_Guide - FLUSH
Who do we contact to fix misinformation in the Evil Tuning Guide? At: http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#How_to_Tune_Cache_Sync_Handling_Per_Storage_Device Item 2 indicates SPARC uses the file name ssd.conf and x64 uses sd.conf to insert an sd-config-list line. After doing this for Hitachi SAN LUNs, we still had performance issues. Sun support incident 71249590 indicated that for ssd.conf, the token must be ssd-config-list, not sd-config-list (beginning with ssd rather than sd). This solved our problem and we are no longer sending cache flushes to the Hitachi SAN. Anyone know who to contact to get the documentation fixed?
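For anyone hitting the same thing, the working entry in /kernel/drv/ssd.conf ended up looking roughly like the line below. The vendor/product inquiry string here is only a placeholder (use the string your array actually reports, with the vendor field padded to 8 characters), and the exact property syntax should be checked against the guide for your Solaris release:

ssd-config-list = "HITACHI OPEN-V", "cache-nonvolatile:true";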
Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Stephen Green wrote: Thanks for the advice, I think it might be time to convince the wife that I need to buy an SSD. Anyone have recommendations for a reasonably priced SSD for a home box? For example, does anyone know if something like: http://www.newegg.com/Product/Product.aspx?Item=N82E16820227436 (manufacturer's homepage: http://www.ocztechnology.com/products/solid_state_drives/ocz_minipci_express_ssd-sata_) would work in OpenSolaris? It (apparently) just looks like a SATA disk on the PCIe bus, and the package that they ship it in doesn't look big enough to have a driver disk in it (and the manufacturer doesn't provide drivers on their web site.) Compatibility aside, would a 16GB SSD on a SATA port be a good solution to my problem? My box is a bit shy on SATA ports, but I've got lots of PCI ports. Should I get two? It's only $60, so not such a troublesome sell to my wife. Steve
Re: [zfs-discuss] changing SATA ports
On Fri, Aug 7, 2009 at 8:49 AM, Dick Hoogendijk d...@nagual.nl wrote: I've a new MB (the same as before, but this one works..) and I want to change the way my SATA drives are connected. I had a ZFS boot mirror connected to SATA3 and 4 and I want those drives to be on SATA1 and 2 now. Question: will ZFS see this and boot the system OK, or will I have to take some precautions beforehand? You need to update grub if you're going to change the ports the boot drives are plugged into.
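If memory serves, that amounts to re-running installgrub against the boot slices of the mirror and checking the BIOS boot-device order afterwards; something like this (the device names are only examples, use your actual boot disks):

# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0

ZFS itself should find the mirror members by device ID on the new ports, so the pool should come up fine.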
Re: [zfs-discuss] zfs fragmentation
On Thu, 6 Aug 2009, Hua wrote: 1. Due to the COW nature of ZFS, files on ZFS are more prone to fragmentation compared to traditional file systems. Is this statement correct?

Yes and no. Fragmentation is a complex issue.

ZFS uses 128K data blocks by default, whereas other filesystems typically use 4K or 8K blocks. This naturally reduces the potential for fragmentation by 32X over 4K blocks.

ZFS storage pools are typically comprised of multiple vdevs, and writes are distributed over these vdevs. This means that the first 128K of a file may go to the first vdev and the second 128K may go to the second vdev. It could be argued that this is a type of fragmentation, but since all of the vdevs can be read at once (if ZFS prefetch chooses to do so) the seek time for single-user contiguous access is essentially zero, since the seeks occur while the application is already busy processing other data. When mirror vdevs are used, any device in the mirror may be used to read the data.

ZFS uses a slab allocator and allocates large contiguous chunks from the vdev storage, and then carves the 128K blocks from those large chunks. This dramatically increases the probability that related data will be very close on the same disk.

ZFS delays ordinary writes to the very last minute according to these rules (my understanding): 7/8ths of total memory consumed, 5 seconds of 100% write I/O is collected, or 30 seconds has elapsed. Since quite a lot of data is written at once, ZFS is able to write that data in the best possible order.

ZFS uses a copy-on-write model. Copy-on-write tends to cause fragmentation if portions of existing files are updated. If a large portion of a file is overwritten in a short period of time, the result should be reasonably fragment-free, but if parts of the file are updated over a long period of time (like a database) then the file is certain to be fragmented. This is not as big a problem as it appears to be, since such files are typically accessed randomly anyway.

ZFS absolutely observes synchronous write requests (e.g. by NFS or a database). The synchronous write requests do not benefit from the long write aggregation delay, so the result may not be written as ideally as ordinary write requests. Recently ZFS has added support for using an SSD as a synchronous write log, and this allows ZFS to turn synchronous writes into more ordinary writes which can be written more intelligently while returning to the user with minimal latency.

Perhaps the most significant fragmentation concern for ZFS is if the pool is allowed to become close to 100% full. Similar to other filesystems, the quality of the storage allocations goes downhill fast when the pool is almost 100% full, so even files written contiguously may be written in fragments.

3. Being a relatively new file system, has it seen much adoption in large implementations?

There are indeed some sites which heavily use ZFS. One very large site using ZFS is archive.org.

Bob
-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
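P.S. The 128K block size mentioned above is the per-filesystem 'recordsize' property; if you want to inspect it, or match it to a database's I/O size, it looks roughly like this (the dataset name and value are only examples):

# zfs get recordsize tank/db
# zfs set recordsize=16K tank/db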
Re: [zfs-discuss] Pool Layout Advice Needed
On 6-Aug-09, at 11:32, Thomas Burgess wrote: I've seen some people use USB sticks, and in practice it works on SOME machines. The biggest difference is that the BIOS has to allow for USB booting. Most of today's computers DO. Personally I like compact flash because it is fairly easy to use as a cheap alternative to a hard drive. I mirror the CF drives exactly like they are hard drives, so if one fails I just replace it. USB is a little harder to do that with because they are just not as consistent as compact flash. But honestly it should work and many people do this. I've ended up purchasing two 8GB CF cards and the required CF-SATA adapters. How, once I install OpenSolaris on the system using the two CF cards as a mirrored ZFS root pool, can I leverage any of the free space for some kind of ZFS-specific performance improvement? slog? etc? Thanks for everyone's input! A. -- Adam Sherman CTO, Versature Corp. Tel: +1.877.498.3772 x113
Re: [zfs-discuss] MISTAKE in Evil_Tuning_Guide - FLUSH
Sweet! Thanks! You rock!
Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Note - this has a mini PCIe interface, not PCIe. I had the 64GB version in a Dell Mini 9. While it was great for its small size, low power and low heat characteristics (no fan on the Mini 9!), it was only faster than the striped SATA drives in my Mac Pro when it came to random reads. Everything else was slower, sometimes by a lot, as measured by XBench. Unfortunately I no longer have the numbers to share. I see the sustained writes listed as up to 25 MB/s, and bursts up to 51 MB/s. That said, I have read of people having good luck with fast CF cards (no ref, sorry). So maybe this will be just fine :) -Scott
Re: [zfs-discuss] zfs fragmentation
ZFS absolutely observes synchronous write requests (e.g. by NFS or a database). The synchronous write requests do not benefit from the long write aggregation delay, so the result may not be written as ideally as ordinary write requests. Recently ZFS has added support for using an SSD as a synchronous write log, and this allows ZFS to turn synchronous writes into more ordinary writes which can be written more intelligently while returning to the user with minimal latency.

Bob, since the ZIL is always used, whether on a separate device or not, won't writes to a system without a separate ZIL also be written as intelligently as with a separate ZIL?

Thanks, Scott
[zfs-discuss] ZFS log root pool?
Hi, Is the ability to add a log device to a root pool on the roadmap for ZFS? Thanks, Gregg gregg dot ferguson at sun dot com
Re: [zfs-discuss] zfs fragmentation
On 08/07/09 10:54, Scott Meilicke wrote: ZFS absolutely observes synchronous write requests (e.g. by NFS or a database). The synchronous write requests do not benefit from the long write aggregation delay, so the result may not be written as ideally as ordinary write requests. Recently ZFS has added support for using an SSD as a synchronous write log, and this allows ZFS to turn synchronous writes into more ordinary writes which can be written more intelligently while returning to the user with minimal latency. Bob, since the ZIL is always used, whether on a separate device or not, won't writes to a system without a separate ZIL also be written as intelligently as with a separate ZIL?

Yes. ZFS uses the same code path (intelligence?) to write out the data from NFS, regardless of whether there's a separate log (slog) or not.

Thanks, Scott
Re: [zfs-discuss] zfs fragmentation
On Fri, 7 Aug 2009, Scott Meilicke wrote: Bob, since the ZIL is always used, whether on a separate device or not, won't writes to a system without a separate ZIL also be written as intelligently as with a separate ZIL?

I don't know the answer to that. Perhaps there is no current advantage. The longer the final writes can be deferred, the more opportunity there is to write the data with a better layout, or to avoid writing some data at all.

One thing I forgot to mention in my summary is that ZFS is commonly used in multi-user environments where there may be many simultaneous writers. Simultaneous writers tend to naturally fragment a filesystem unless the filesystem is willing to spread the data out in advance and take a seek hit (from one file to another) for each file write. ZFS's deferment of writes allows the data to be written more intelligently in these multi-user environments.

Bob
-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
[zfs-discuss] add-view for the zfs snapshot
I first create a LUN with stmfadm create-lu, and add-view, so the initiator can see the created LUN. Now I use zfs snapshot to create a snapshot of the created LUN. What can I do to make the snapshot accessible to the initiator? Thanks.
Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Stephen Green wrote: Oh, and for those following along at home, the re-silvering of the slog to a file is proceeding well. 72% done in 25 minutes. And, for the purposes of the archives, the re-silver finished in 34 minutes and I successfully removed the RAM disk. Thanks, Erik, for the eminently followable instructions. Also, I got my wife to agree to a new SSD, so I presume that I can simply do the re-silver with the new drive when it arrives. Can I replace a log with a larger one? Can I partition the SSD (looks like I'll be getting a 32GB one) and use half for cache and half for log? Even if I can, should I? Steve
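P.S. My guess at the recipe, once the SSD is sliced up with format, would be something along these lines (the pool, file and slice names are invented for the example); corrections welcome:

# zpool replace tank /root/slogtemp c2t0d0s0      (move the log from the file onto one slice)
# zpool add tank cache c2t0d0s1                   (use the other slice as an L2ARC cache device)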
Re: [zfs-discuss] add-view for the zfs snapshot
I first create a LUN with stmfadm create-lu, and add-view, so the initiator can see the created LUN. Now I use zfs snapshot to create a snapshot of the created LUN. What can I do to make the snapshot accessible to the initiator? Thanks.

Hi, This is a good question and something that I have not tried. Please see Chapter 7 of the ZFS manual linked below. http://dlc.sun.com/pdf/819-5461/819-5461.pdf Cross-posting with zfs-discuss. Regards, Chuck
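My untested reading is that, since a snapshot itself is read-only, you would clone the snapshot and register the clone's zvol as a new logical unit, roughly like this (the pool and volume names are only examples):

# zfs snapshot tank/vol1@snap1
# zfs clone tank/vol1@snap1 tank/vol1clone
# stmfadm create-lu /dev/zvol/rdsk/tank/vol1clone
# stmfadm add-view <LU-name-returned-by-create-lu>

Someone who has actually done this should confirm.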
Re: [zfs-discuss] Supported Motherboard SATA controller chipsets?
Hello Kyle! Sorry for the late answer. Be careful with nVidia if you want to use Samsung SATA disks. There is a problem with the disk freezing up. This bit me with our X2100M2 and X2200M2 systems. I don't know if it's related to your issue, but I have also seen comments around about the nv-sata Windows drivers hanging up when formatting drives larger than 1024GB. But that's been fixed in the latest nVidia Windows drivers. Does that sound related, or like something different? Something different. The problem with the X2100M2 and X2200M2 will only occur with specific Samsung disk models, in my case the HD103UJ 1 TB disk. The system will work fine, until suddenly the disk freezes up. The disk is then no longer recognized at all. It will not respond to any command whatsoever. After a power cycle, the disk is fine, until the next freeze. I think the same happened to some people on the 'net with the 750GB variant of the same disk, but I have only seen it with the 1 TB type. Regards -- Volker -- Volker A. Brandt Consulting and Support for Sun Solaris Brandt & Brandt Computer GmbH WWW: http://www.bb-c.de/ Am Wiesenpfad 6, 53340 Meckenheim Email: v...@bb-c.de Commercial register: Amtsgericht Bonn, HRB 10513 Shoe size: 45 Managing directors: Rainer J. H. Brandt and Volker A. Brandt
Re: [zfs-discuss] Shrinking a zpool?
Hey Richard, I believe 6844090 would be a candidate for an s10 backport. The behavior of 6844090 worked nicely when I replaced a disk of the same physical size even though the disks were not identical.

Another flexible storage feature is George's autoexpand property (Nevada build 117), where you can attach or replace a disk in a pool with a LUN that is larger than the existing size of the pool, but keep the LUN size constrained with autoexpand set to off. Then, if you decide that you want to use the expanded LUN, you can set autoexpand to on, or you can just detach it to use in another pool where you need the expanded size. (The autoexpand feature description is in the ZFS Admin Guide on the opensolaris/...zfs/docs site.)

Contrasting the autoexpand behavior with current Solaris 10 releases, I noticed recently that you can use zpool attach/detach to attach a larger disk for eventual replacement purposes and the pool size is expanded automatically, even on a live root pool, without the autoexpand feature and with no import/export/reboot needed. (Well, I always reboot to see if the new disk will boot before detaching the existing disk.) I did this recently to expand a 16-GB root pool to a 68-GB root pool. See the example below.

Cindy

# zpool list
NAME    SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
rpool  16.8G  5.61G  11.1G   33%  ONLINE  -
# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:
        NAME         STATE   READ WRITE CKSUM
        rpool        ONLINE     0     0     0
          c1t18d0s0  ONLINE     0     0     0
errors: No known data errors
# zpool attach rpool c1t18d0s0 c1t1d0s0
# zpool status rpool
  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h3m, 51.35% done, 0h3m to go
config:
        NAME           STATE   READ WRITE CKSUM
        rpool          ONLINE     0     0     0
          mirror       ONLINE     0     0     0
            c1t18d0s0  ONLINE     0     0     0
            c1t1d0s0   ONLINE     0     0     0
# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0
(boot from the new disk to make sure the replacement disk boots)
# init 0
# zpool list
NAME    SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
rpool  16.8G  5.62G  11.1G   33%  ONLINE  -
# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:
        NAME           STATE   READ WRITE CKSUM
        rpool          ONLINE     0     0     0
          mirror       ONLINE     0     0     0
            c1t18d0s0  ONLINE     0     0     0
            c1t1d0s0   ONLINE     0     0     0
errors: No known data errors
# zpool detach rpool c1t18d0s0
# zpool list
NAME    SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
rpool  68.2G  5.62G  62.6G    8%  ONLINE  -
# cat /etc/release
Solaris 10 5/09 s10s_u7wos_08 SPARC
Copyright 2009 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 30 March 2009

On 08/05/09 17:20, Richard Elling wrote: On Aug 5, 2009, at 4:06 PM, cindy.swearin...@sun.com wrote: Brian, CR 4852783 was updated again this week, so you might add yourself or your customer to continue to be updated. In the meantime, a reminder is that a mirrored ZFS configuration is flexible in that devices can be detached (as long as the redundancy is not compromised) or replaced as long as the replacement disk is of an equivalent size or larger. So, you can move storage around if you need to in a mirrored ZFS config until 4852783 integrates.

Thanks Cindy, This is another way to skin the cat. It works for simple volumes, too. But there are some restrictions, which could impact the operation when a large change in vdev size is needed. Is this planned to be backported to Solaris 10? CR 6844090 has more details.
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6844090 -- richard
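P.S. For anyone who wants to try the autoexpand property described above (Nevada build 117 or later), the usage is roughly as follows; the pool name is only an example:

# zpool get autoexpand tank
# zpool set autoexpand=on tank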
Re: [zfs-discuss] zfs fragmentation
Let me give a real-life example of what I believe is a fragmented ZFS pool. Currently the pool is 2 terabytes in size (55% used) and is made of 4 SAN LUNs (512GB each). The pool has never gotten close to being full. We increase the size of the pool by adding 2 512GB LUNs about once a year or so. The pool has been divided into 7 filesystems. The pool is used for IMAP email data. The email system (Cyrus) has approximately 80,000 accounts, all located within the pool and evenly distributed between the filesystems. Each account has a directory associated with it. This directory is the user's inbox. Additional mail folders are subdirectories. Mail is stored as individual files. We receive mail at a rate of 0-20MB/second, every minute of every hour of every day of every week, etc. Users receive mail constantly over time. They read it and then either delete it or store it in a subdirectory/folder. I imagine that my mail (located in a single subdirectory structure) is spread over the entire pool because it has been received over time. I believe the data is highly fragmented (from a file and directory perspective). The result of this is that backup throughput of a single filesystem in this pool is about 8GB/hour. We use EMC Networker for backups. This is a problem. There are no utilities available to evaluate this type of fragmentation. There are no utilities to fix it. ZFS, from the mail system perspective, works great. Writes and random reads perform well. Backup is a problem, not just because of small files, but because of small files scattered over the entire pool. Adding another pool and copying all/some data over to it would only be a short-term solution. I believe ZFS needs a feature that operates in the background and defrags the pool to optimize sequential reads of the file and directory structure. Ed
Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Stephen Green wrote: Also, I got my wife to agree to a new SSD, so I presume that I can simply do the re-silver with the new drive when it arrives. And the last thing for today: I ended up getting http://www.newegg.com/Product/Product.aspx?Item=N82E16820609330 which is 16GB and should be sufficient for my needs. I'll let you know how it works out. Suggestions for pre/post-installation I/O tests welcome. Steve
Re: [zfs-discuss] zfs fragmentation
On Aug 7, 2009, at 2:29 PM, Ed Spencer wrote: Let me give a real-life example of what I believe is a fragmented ZFS pool. Currently the pool is 2 terabytes in size (55% used) and is made of 4 SAN LUNs (512GB each). The pool has never gotten close to being full. We increase the size of the pool by adding 2 512GB LUNs about once a year or so. The pool has been divided into 7 filesystems. The pool is used for IMAP email data. The email system (Cyrus) has approximately 80,000 accounts, all located within the pool and evenly distributed between the filesystems. Each account has a directory associated with it. This directory is the user's inbox. Additional mail folders are subdirectories. Mail is stored as individual files. We receive mail at a rate of 0-20MB/second, every minute of every hour of every day of every week, etc. Users receive mail constantly over time. They read it and then either delete it or store it in a subdirectory/folder. I imagine that my mail (located in a single subdirectory structure) is spread over the entire pool because it has been received over time. I believe the data is highly fragmented (from a file and directory perspective). The result of this is that backup throughput of a single filesystem in this pool is about 8GB/hour. We use EMC Networker for backups.

This is very unlikely to be a fragmentation problem. It is a scalability problem, and there may be something you can do about it in the short term. However, though I usually hate to tease, in this case I need to tease: I recently completed a white paper on this exact workload and how we designed it to scale. I hope to publish that paper RSN. When the paper hits the web, I'll start a new thread on using ZFS for large-scale email systems.

This is a problem. There are no utilities available to evaluate this type of fragmentation. There are no utilities to fix it. ZFS, from the mail system perspective, works great. Writes and random reads perform well. Backup is a problem, not just because of small files, but because of small files scattered over the entire pool. Adding another pool and copying all/some data over to it would only be a short-term solution.

I'll have to disagree.

I believe ZFS needs a feature that operates in the background and defrags the pool to optimize sequential reads of the file and directory structure.

This will not solve your problem, but there are other methods that can. -- richard

Ed
Re: [zfs-discuss] Pool Layout Advice Needed
Adam Sherman wrote: On 6-Aug-09, at 15:16, Ian Collins wrote: This ended up being a costly mistake; the environment I ended up with didn't play well with Live Upgrade. So I suggest, whatever you do, make sure you can create a new BE and boot into it before committing. I assume this was old-style LU and the new-style ZFS-based boot environments? No, the original BE was build 101, ZFS boot. An lucreate from that BE took a day (!) and the new BE wasn't bootable. -- Ian.