Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows
Sure, but that will put me back into the original situation. -Scott
Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows
That is likely it. I created the volume using 2009.06, then later upgraded to 124. I just now created a new zvol, connected it to my Windows server, formatted it, and added some data. Then I snapped the zvol, cloned the snap, and used 'pfexec sbdadm create-lu'. When presented to the Windows server, it behaved as expected: I could see the data I created prior to the snapshot. Thank you very much Dave (and everyone else). Now,
Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows
I plan on filing a support request with Sun, and will try to post back with any results. Scott
Re: [zfs-discuss] SSD and ZFS
I don't think adding an SSD mirror to an existing pool will do much for performance. Some of your data will surely go to those SSDs, but I don't think Solaris will know they are SSDs and move blocks in and out according to usage patterns to give you an all-around boost. They will just be used to store data, nothing more.

It would probably be more useful to add the SSDs as either an L2ARC or a SLOG for the ZIL, but that will depend upon your workload. If you do NFS or iSCSI access, then putting the ZIL onto the SSD drive(s) will speed up writes. Adding them to the L2ARC will speed up reads.

Here is the ZFS best practices guide, which should help with this decision: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Read that, then come back with more questions. Best, Scott
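For reference, adding SSDs as cache or log devices is a one-line operation each. A minimal sketch, assuming a pool named tank and SSDs at c2t0d0 and c2t1d0 (hypothetical names; substitute your own):

  # add an SSD as an L2ARC (read cache) device
  zpool add tank cache c2t0d0
  # add an SSD as a separate log device (SLOG) for the ZIL
  zpool add tank log c2t1d0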
Re: [zfs-discuss] raidz2 array FAULTED with only 1 drive down
You might have to force the import with -f. Scott
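A minimal sketch, assuming the pool is named tank (substitute your pool name):

  # -f overrides the "pool may be in use on another system" warning
  zpool import -f tank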
Re: [zfs-discuss] [osol-discuss] Moving Storage to opensolaris+zfs. What a
>To be clear, you can do what you want with the following items (besides your server):
>(1) OpenSolaris LiveCD
>(1) 8GB USB Flash drive
>As many tapes as you need to store your data pools on.
>Make sure the USB drive has a saved stream from your rpool. It should also have a downloaded copy of whichever main backup software you use.
>That's it. You backup data using Amanda/Bacula/et al onto tape. You backup your boot/root filesystem using 'zfs send' onto the USB key.

Erik, great! I never thought of the USB key to store an rpool copy. I will give it a go on my test box. Scott
Re: [zfs-discuss] backup zpool to tape
Greg, I am using NetBackup 6.5.3.1 (7.x is out) with fine results. Nice and fast. -Scott
Re: [zfs-discuss] Can we get some documentation on iSCSI sharing after comstar took over?
This is what I used: http://wikis.sun.com/display/OpenSolarisInfo200906/How+to+Configure+iSCSI+Target+Ports

I distilled that to: disable the old, enable the new (COMSTAR):

* sudo svcadm disable iscsitgt
* sudo svcadm enable stmf

Then four steps (using my zfs/zpool info - substitute for yours):

* sudo zfs create -s -V 5t data01/san/gallardo/g (the -s makes it thin, -V specifies a block volume)
* sbdadm create-lu /dev/zvol/rdsk/data01/san/gallardo/g
* sudo itadm create-target
* sudo stmfadm add-view 600144F0E24785004A80910A0001

This should allow any initiator to connect to your volume, no security. Not quite a one liner. After you create the target once (step 3), you do not have to do that again for the next volume. So three lines. -Scott
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
It is hard, as you note, to recommend a box without knowing the load. How many Linux boxes are you talking about?

I think having a lot of space for your L2ARC is a great idea. Will you mirror your SLOGs, or load balance them? I ask because perhaps one will be enough, IO wise. My box has one SLOG (X25-E) and can support about 2600 IOPS using an iometer profile that closely approximates my workload. My ~100 VMs on 8 ESX boxes average around 1000 IOPS, but can peak at 2-3x that during backups.

Don't discount NFS. I absolutely love NFS for management and thin provisioning reasons. It is much easier (to me) than managing iSCSI, and performance is similar. I highly recommend load testing both iSCSI and NFS before you go live. Crash-consistent backups of your VMs are possible using NFS, and recovering a VM from a snapshot is a little easier using NFS, I find.

Why not larger capacity disks? Hopefully your switches support NIC aggregation?

The only issue I have had on 2009.06 using iSCSI (I had a Windows VM directly attaching to a 4T iSCSI volume) was solved and back-ported to 2009.06 (bug 6794994). -Scott
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
>I was planning to mirror them - mainly in the hope that I could hot swap a new
>one in the event that an existing one started to degrade. I suppose I could
>start with one of each and convert to a mirror later although the prospect of
>losing either disk fills me with dread.

You do not need to mirror the L2ARC devices, as the system will just hit disk as necessary. Mirroring sounds like a good idea for the SLOG, but this has been much discussed on the forums.

>> Why not larger capacity disks?
>We will run out of iops before we run out of space.

Interesting. I find IOPS is more proportional to the number of VMs than to disk space.

User: I need a VM that will consume up to 80G in two years, so give me an 80G disk.
Me: OK, but recall we can expand disks and filesystems on the fly, without downtime.
User: Well, that is cool, but 80G to start with please.

I also believe the SLOG and L2ARC will make high-RPM disks less necessary. But, from what I have read, higher RPM disks will greatly help with scrubs and resilvers. Maybe two pools - one with fast mirrored SAS, another with big SATA. Or all SATA, but one pool with mirrors, another with raidz2. Many options. But measure to see what works for you. iometer is great for that, I find.

>Any opinions on the use of battery backed SAS adapters?

Surely these will help with performance in write-back mode, but I have not done any hard measurements. Anecdotally, my PERC 5/i in a Dell 2950 seemed to greatly help with IOPS on a five disk raidz. There are pros and cons. Search the forums, but off the top of my head: 1) SLOGs are much larger than controller caches; 2) only synced write activity is cached in a ZIL, whereas a controller cache will cache everything, needed or not, thus running out of space sooner; 3) SLOGs and L2ARC devices are specialized caches for read and write loads, vs. the all-in-one cache of a controller; 4) a controller *may* be faster, since it uses RAM for the cache.

One of the benefits of a SLOG on the SAS/SATA bus is for a cluster. If one node goes down, the other can bring up the pool, check the ZIL for any necessary transactions, and apply them. To do this with battery-backed cache, you would need fancy interconnects between the nodes, cache mirroring, etc. - all of those things that SAN array products do.

Sounds like you have a fun project.
Re: [zfs-discuss] ZFS/OSOL/Firewire...
>Apple users have different expectations regarding data loss than Solaris and
>Linux users do.

Come on, no Apple user bashing. Not true, not fair. Scott
Re: [zfs-discuss] Rethinking my zpool
You will get much better random IO with mirrors, and better reliability when a disk fails with raidz2. Six sets of mirrors are fine for a pool. From what I have read, a hot spare can be shared across pools. I think the correct term would be "load balanced mirrors", vs RAID 10. What kind of performance do you need? Maybe raidz2 will give you the performance you need. Maybe not. Measure the performance of each configuration and decide for yourself. I am a big fan of iometer for this type of work. -Scott
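For concreteness, a hedged sketch of the layout being discussed - six mirrored pairs plus a spare - with hypothetical device names:

  zpool create tank \
    mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 mirror c1t4d0 c1t5d0 \
    mirror c1t6d0 c1t7d0 mirror c1t8d0 c1t9d0 mirror c1t10d0 c1t11d0 \
    spare c1t12d0
  # the same device can also be designated a spare in a second pool
  zpool add otherpool spare c1t12d0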
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
> One of the reasons I am investigating solaris for
> this is sparse volumes and dedupe could really help
> here. Currently we use direct attached storage on
> the dom0s and allocate an LVM to the domU on
> creation. Just like your example above, we have lots
> of those "80G to start with please" volumes with 10's
> of GB unused. I also think this data set would
> dedupe quite well since there are a great many
> identical OS files across the domUs. Is that
> assumption correct?

This is one reason I like NFS - thin by default, and no wasted space within a zvol. zvols can be thin as well, but OpenSolaris will not know the internal format of the zvol, and you may still have a lot of wasted space after a while as files inside the zvol come and go. In theory dedupe should work well, but I would be careful about a possible speed hit.

> I've not seen an example of that before. Do you mean
> having two 'head units' connected to an external JBOD
> enclosure or a proper HA cluster type configuration
> where the entire thing, disks and all, are
> duplicated?

I have not done any cluster work myself, but from what I have read on Sun's site, yes, you could connect the same JBOD to two head units, active/passive, in an HA cluster, but without duplicate disks/JBOD. When the active node goes down, the passive node detects this and takes over the pool by doing an import. During the import, any outstanding transactions in the ZIL are replayed, whether they are on a SLOG or not. I believe this is how Sun does it on their open storage boxes (7000 series).

Note - two JBODs could be used, one for each head unit, making an active/active setup. Each JBOD is active on one node, passive on the other.
Re: [zfs-discuss] ZFS for ISCSI ntfs backing store.
I have used build 124 in this capacity, although I did zero tuning. I had about 4T of data on a single 5T iSCSI volume over gigabit. The Windows server was a VM, and the OpenSolaris box is on a Dell 2950, 16G of RAM, an X25-E for the ZIL, no L2ARC cache device. I used COMSTAR. It was being used as a target for DoubleTake, so it only saw write IO, with very little read. My load testing using iometer was very positive, and I would not have hesitated to use it as the primary node serving about 1000 users, maybe 200-300 active at a time. Scott
Re: [zfs-discuss] ZFS for ISCSI ntfs backing store.
At the time we had it set up as 3 x 5 disk raidz, plus a hot spare. These 16 disks were in a SAS cabinet, and the SLOG was on the server itself. We are now running 2 x 7 disk raidz2 plus a hot spare and SLOG, all inside the cabinet. Since the disks are 1.5T, I was concerned about resilver times for a failed disk. About the only thing I would consider at this point is getting an SSD for the L2ARC for dedupe performance.
Re: [zfs-discuss] Benchmarking Methodologies
My use case for OpenSolaris is as a storage server for a VM environment (we also use EqualLogic, and soon an EMC CX4-120). To that end, I use iometer within a VM, simulating my VM IO activity, with some balance given to easy benchmarking. We have about 110 VMs across eight ESX hosts. Here is what I do:

* Attach a 100G vmdk to one Windows 2003 R2 VM
* Create a 32G test file (my OpenSolaris box has 16G of RAM)
* export/import the pool on the Solaris box, and reboot my guest to clear caches all around
* Run a disk queue depth of 32 outstanding IOs
* 60% read, 65% random, 8k block size
* Run for five minutes to spool up, then run the test for five minutes

My actual workload is closer to 50% read, 16k block size, so I adjust my interpretation of the results accordingly. Probably I should run a lot more iometer daemons. Performance will increase as the benchmark runs due to the L2ARC filling up, so I found that starting the benchmark at 5 minutes into the workload was a happy medium. Things will get a bit faster the longer the benchmark runs, but this is good as far as benchmarking goes. Only occasionally do I get wacko results, which I happily toss out the window. Scott
Re: [zfs-discuss] iSCSI confusion
VMware will properly handle sharing a single iSCSI volume across multiple ESX hosts. We have six ESX hosts sharing the same iSCSI volumes - no problems. -Scott
Re: [zfs-discuss] iScsi slow
iSCSI writes require a sync to disk for every write. SMB writes get cached in memory, and therefore are much faster. I am not sure why it is so slow for reads. Have you tried COMSTAR iSCSI? I have read in these forums that it is faster. -Scott
Re: [zfs-discuss] combining series of snapshots
You might bring over all of your old data and snaps, then clone that into a new volume. Bring your recent stuff into the clone. Since the clone only updates blocks that are different than the underlying snap, you may see a significant storage savings. Two clones could even be made - one for your live data, another to access the historical data. -Scott
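A hedged sketch of that approach, with hypothetical pool, dataset, and snapshot names:

  # bring the old dataset and all of its snapshots onto the new pool
  zfs send -R oldpool/data@last | zfs recv newpool/history
  # clone the newest snapshot; the clone only consumes space for changed blocks
  zfs clone newpool/history@last newpool/live
  # copy the recent data into newpool/live; a second clone could serve as a
  # browseable view of the historical data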
Re: [zfs-discuss] OCZ Devena line of enterprise SSD
Price? I cannot find it. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] COMSTAR iSCSI and two Windows computers
Look again at how XenServer does storage. I think you will find it already has a solution, both for iSCSI and NFS. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] raid-z - not even iops distribution
Reaching into the dusty regions of my brain, I seem to recall that because RAIDZ does not work like traditional RAID 5, particularly because of its variably sized stripes, the data may not hit all of the disks, but it will always be redundant. I apologize for not having a reference for this assertion, so I may be completely wrong. I assume your hardware is recent, the controllers are on PCIe x4 buses, etc. -Scott
Re: [zfs-discuss] Deleting large amounts of files
If these files are deduped, and there is not a lot of RAM on the machine, it can take a long, long time to work through the dedupe portion. I don't know enough to know if that is what you are experiencing, but it could be the problem. How much RAM do you have? Scott
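If you want to gauge how large the dedup table (DDT) is relative to your RAM, zdb can report it. A hedged sketch, with a hypothetical pool name:

  # print DDT statistics, including entry counts and on-disk/in-core sizes
  zdb -DD tank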
Re: [zfs-discuss] slog/L2ARC on a hard drive and not SSD?
Another data point - I used three 15K disks striped on my RAID controller as a SLOG for the ZIL, and performance went down. I had three raidz SATA vdevs holding the data, and my load was VMs, i.e. a fair amount of small, random IO (60% random, 50% write, ~16k in size). Scott
Re: [zfs-discuss] snapshot space - miscalculation?
Are there other file systems underneath daten/backups that have snapshots?
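A quick way to check, as a sketch (the property list is just a suggestion):

  # list every dataset and snapshot below daten/backups with space accounting
  zfs list -r -t all -o name,used,usedbysnapshots,referenced daten/backups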
Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
This has been a very enlightening thread for me, and explains a lot of the performance data I have collected on both 2008.11 and 2009.06 which mirrors the experiences here. Thanks to you all. NFS perf tuning, here I come... -Scott
Re: [zfs-discuss] Can I setting 'zil_disable' to increase ZFS/iscsi performance ?
You can use a separate SSD ZIL.
Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Note - this has a mini PCIe interface, not PCIe. I had the 64GB version in a Dell Mini 9. While it was great for its small size, low power, and low heat characteristics (no fan on the Mini 9!), it was only faster than the striped SATA drives in my Mac Pro when it came to random reads. Everything else was slower, sometimes by a lot, as measured by XBench. Unfortunately I no longer have the numbers to share. I see the sustained writes listed as up to 25 MB/s, and bursts up to 51 MB/s. That said, I have read of people having good luck with fast CF cards (no ref, sorry). So maybe this will be just fine :) -Scott
Re: [zfs-discuss] zfs fragmentation
> ZFS absolutely observes synchronous write requests (e.g. by NFS or a
> database). The synchronous write requests do not benefit from the
> long write aggregation delay so the result may not be written as
> ideally as ordinary write requests. Recently zfs has added support
> for using a SSD as a synchronous write log, and this allows zfs to
> turn synchronous writes into more ordinary writes which can be written
> more intelligently while returning to the user with minimal latency.

Bob, since the ZIL is used always, whether a separate device or not, won't writes to a system without a separate ZIL also be written as intelligently as with a separate ZIL? Thanks, Scott
Re: [zfs-discuss] NFS load balancing / was: ZFS, ESX , and NFS. oh my!
Yes! That would be icing on the cake.
Re: [zfs-discuss] Live resize/grow of iscsi shared ZVOL
My EqualLogic arrays do not disconnect when resizing volumes.

When I need to resize, on the Windows side I open the iSCSI control panel, and get ready to click the 'logon' button. I then resize the volume on the OpenSolaris box, and immediately after that is complete, on the Windows side, re-login to the target. Since the Windows initiator can tolerate brief disconnects, IO is not stopped or adversely affected, just paused for those few seconds. It works fine.

Multi-path is a little more complicated as you would have to re-logon to all of your paths, but if you have at least one path active, you should be fine. -Scott
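For reference, the resize on the OpenSolaris side is a single property change. A sketch using the volume name from my setup (adjust to yours):

  # grow the zvol from 5T to 6T
  zfs set volsize=6t data01/san/gallardo/g
  # if the LU was created with sbdadm, I believe you may also need to grow the LU
  # itself (e.g. with 'sbdadm modify-lu -s') before the initiator sees the new size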
Re: [zfs-discuss] How to find poor performing disks
You can try:

zpool iostat -v pool_name 1

This will show you IO on each vdev at one second intervals. Perhaps you will see different IO behavior on any suspect drive. -Scott
Re: [zfs-discuss] Connect couple of SATA JBODs to one storage server
Roman, are you saying you want to install OpenSolaris on your old servers, or make the servers look like an external JBOD array, that another server will then connect to?
Re: [zfs-discuss] zfs performance cliff when over 80% util, still occurring when pool in 6
As I understand it, when you expand a pool, the data do not automatically migrate to the other disks. You will have to rewrite the data somehow, usually a backup/restore. -Scott
Re: [zfs-discuss] ZFS iSCSI Clustered for VMware Host use
You are completely off your rocker :) No, just kidding.

Assuming the virtual front-end servers are running on different hosts, and you are doing some sort of raid, you should be fine. Performance may be poor due to the inexpensive targets on the back end, but you probably know that. A while back I thought of doing similar stuff using local storage on my ESX hosts, and abstracting that with an OpenSolaris VM and iSCSI/NFS.

Perhaps consider inexpensive but decent NAS/SAN devices from Synology. They are not expensive, offer NFS and iSCSI, and you can also replicate/backup between two of them using rsync. Yes, you would be 'wasting' the storage space by having two, but like I said, they are inexpensive. Then you would not have the two layer architecture.

I just tested a two disk model, using ESXi 3.5u4 and a Windows VM. I used iometer, realworld test, and IOs were about what you would expect from mirrored 7200 SATA drives - 138 IOPS, about 1.1 MB/s. The internal CPU was around 20%, RAM usage was 128MB out of the 512MB on board, so it was disk limited.

The Dell 2950 that I have 2009.06 installed on (16GB of RAM and an LSI HBA with an external SAS enclosure) with a single mirror using two 7200 drives gave me about 200 IOPS using the same test, presumably because of the large amounts of RAM for the L2ARC cache. -Scott
Re: [zfs-discuss] Pulsing write performance
Roch Bourbonnais wrote:

> "100% random writes produce around 200 IOPS with a 4-6 second pause around every 10 seconds."
> This indicates that the bandwidth you're able to transfer through the protocol is about 50% greater than the bandwidth the pool can offer to ZFS. Since this is not sustainable, you see here ZFS trying to balance the 2 numbers.

When I have tested using 50% reads, 60% random using iometer over NFS, I can see the data going straight to disk due to the sync nature of NFS. But I also see writes coming to a standstill every 10 seconds or so, which I have attributed to the ZIL dumping to disk. Therefore I conclude that it is the process of dumping the ZIL to disk that (mostly?) blocks writes during the dumping.

I do agree with Bob and others who suggest that making the size of the dump smaller will mask this behavior, and that seems like a good idea, although I have not yet tried and tested it myself. -Scott
Re: [zfs-discuss] Pulsing write performance
I am still not buying it :) I need to research this to satisfy myself. I can understand that the writes come from memory to disk during a txg write for async, and that is the behavior I see in testing. But for sync, data must be committed, and an SSD/ZIL makes that faster because you are writing to the SSD/ZIL, and not to spinning disk. Eventually that data on the SSD must get to spinning disk. To the books I go! -Scott
Re: [zfs-discuss] Understanding when (and how) ZFS will use spare disks
This sounds like the same behavior as OpenSolaris 2009.06. I had several disks recently go UNAVAIL, and the spares did not take over. But as soon as I physically removed a disk, the spare started replacing the removed disk. It seems UNAVAIL is not the same as the disk not being there. I wish the spare *would* take over in these cases, since the pool is degraded. -Scott
Re: [zfs-discuss] Pulsing write performance
So what happens during the txg commit? For example, if the ZIL is a separate device, SSD for this example, does it not work like:

1. A sync operation commits the data to the SSD
2. A txg commit happens, and the data from the SSD are written to the spinning disk

So this is two writes, correct? -Scott
Re: [zfs-discuss] Pulsing write performance
Doh! I knew that, but then forgot... So, for the case of no separate device for the ZIL, the ZIL lives on the disk pool. In which case, the data are written to the pool twice during a sync:

1. To the ZIL (on disk)
2. From RAM to disk during the txg

If this is correct (and my history in this thread is not so good, so...), would that then explain some sort of pulsing write behavior for sync write operations?
Re: [zfs-discuss] Pulsing write performance
So, I just re-read the thread, and you can forget my last post. I had thought the argument was that the data were not being written to disk twice (assuming no separate device for the ZIL), but the point being explained to me was that the data are not read from the ZIL to disk, but rather go from memory to disk. I need more coffee...
Re: [zfs-discuss] Pulsing write performance
Yes, I was getting confused. Thanks to you (and everyone else) for clarifying. Sync or async, I see the txg flushing to disk starve read IO. Scott
Re: [zfs-discuss] Pulsing write performance
I only see the blocking while load testing, not during regular usage, so I am not so worried. I will try the kernel settings to see if that helps if/when I see the issue in production.

For what it is worth, here is the pattern I see when load testing NFS (iometer, 60% random, 65% read, 8k chunks, 32 outstanding I/Os):

data01      59.6G  20.4T     46     24   757K  3.09M
data01      59.6G  20.4T     39     24   593K  3.09M
data01      59.6G  20.4T     45     25   687K  3.22M
data01      59.6G  20.4T     45     23   683K  2.97M
data01      59.6G  20.4T     33     23   492K  2.97M
data01      59.6G  20.4T     16     41   214K  1.71M
data01      59.6G  20.4T      3  2.36K  53.4K  30.4M
data01      59.6G  20.4T      1  2.23K  20.3K  29.2M
data01      59.6G  20.4T      0  2.24K  30.2K  28.9M
data01      59.6G  20.4T      0  1.93K  30.2K  25.1M
data01      59.6G  20.4T      0  2.22K      0  28.4M
data01      59.7G  20.4T     21    295   317K  4.48M
data01      59.7G  20.4T     32     12   495K  1.61M
data01      59.7G  20.4T     35     25   515K  3.22M
data01      59.7G  20.4T     36     11   522K  1.49M
data01      59.7G  20.4T     33     24   508K  3.09M

LSI SAS HBA, 3 x 5 disk raidz, Dell 2950, 16GB RAM. -Scott
Re: [zfs-discuss] Pulsing write performance
True, this setup is not designed for high random I/O, but rather lots of storage with fair performance. This box is for our dev/test backend storage. Our production VI runs in the 500-700 IOPS range (80+ VMs, production plus dev/test) on average, so for our development VI we are expecting half of that at most, on average. Testing with parameters that match the observed behavior of the production VI gets us about 750 IOPS with compression (NFS, 2009.06), so I am happy with the performance and very happy with the amount of available space.

Striped mirrors are much faster, ~2200 IOPS with 16 disks (but alas, tested with iSCSI on 2008.11, compression on; we got about 1,000 IOPS with the 3x5 raidz setup with compression to compare iSCSI and 2008.11 vs NFS and 2009.06), but again we are shooting for available space, with performance being a secondary goal. And yes, we would likely get much better performance using SSDs for the ZIL and L2ARC.

This has been an interesting thread! Sorry for the bit of hijacking...
Re: [zfs-discuss] RAIDZ versus mirrroed
I think in theory the ZIL/L2ARC should make things nice and fast if your workload includes sync requests (database, iSCSI, NFS, etc.), regardless of the backend disks. But the only sure way to know is to test with your workload. -Scott
[zfs-discuss] How to verify if the ZIL is disabled
How can I verify if the ZIL has been disabled or not? I am trying to see how much benefit I might get by using an SSD as a ZIL. I disabled the ZIL via the ZFS Evil Tuning Guide:

echo zil_disable/W0t1 | mdb -kw

and then rebooted. However, I do not see any benefits for my NFS workload. Thanks, Scott
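For reference, a quick way to read the current value of the tunable from the running kernel (a sketch; 1 means the ZIL is disabled, 0 means enabled):

  echo zil_disable/D | mdb -k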
Re: [zfs-discuss] How to verify if the ZIL is disabled
Thank you both, much appreciated. I ended up having to put the flag into /etc/system. When I disabled the ZIL and umount/mounted without a reboot, my ESX host would not see the NFS export, nor could I create a new NFS connection from my ESX host. I could get into the file system from the host itself of course. -Scott
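For the archives, the /etc/system form being referred to is, I believe, the one from the Evil Tuning Guide (takes effect at the next boot; for testing only):

  * disable the ZIL globally - do not run production NFS this way
  set zfs:zil_disable = 1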
Re: [zfs-discuss] How to verify if the ZIL is disabled
> zfs share -a

Ah-ha! Thanks. FYI, I got between 2.5x and 10x improvement in performance, depending on the test. So tempting :) -Scott
Re: [zfs-discuss] Incremental snapshot size
It costs more, but a WAN accelerator (Cisco WAAS, Riverbed, etc.) would be a big help. Scott
Re: [zfs-discuss] poor man's Drobo on FreeNAS
Requires a login...
[zfs-discuss] Setting up an SSD ZIL - Need A Reality Check
I have an Intel X25-E 32G in the mail (actually the Kingston version), and wanted to get a sanity check before I start.

System: Dell 2950, 16G RAM, 16 1.5T SATA disks in a SAS chassis hanging off of an LSI 3801e, no extra drive slots, a single zpool. snv_124, but with my zpool still running at the 2009.06 version (14). I will likely get another chassis and 16 disks for another pool in the 3-18 month time frame.

My plan is to put the SSD into an open disk slot on the 2950, but I will have to configure it as a RAID 0, since the onboard PERC5 controller does not have a JBOD mode.

Options I am considering:

A. Use all 32G for the ZIL
B. Use 8G for the ZIL, 24G for an L2ARC. Any issues with slicing up an SSD like this?
C. Use 8G for the ZIL, 16G for an L2ARC, and reserve 8G to be used as a ZIL for the future zpool.

Since my future zpool would just be used as a backup-to-disk target, I am leaning towards option C. Any gotchas I should be aware of? Thanks, Scott
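For what it's worth, a hedged sketch of option C, assuming the SSD shows up as c7t1d0 and has been sliced with format(1M) into s0 (8G), s1 (16G), and s3 (8G held back); the slice layout is an assumption:

  # 8G slice as a separate log device for the ZIL
  zpool add data01 log c7t1d0s0
  # 16G slice as an L2ARC cache device
  zpool add data01 cache c7t1d0s1
  # s3 stays unused, reserved as a log device for the future pool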
Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check
Thanks Frédéric, that is a very interesting read. So my options as I see them now:

1. Keep the X25-E, and disable its cache. Performance should still be improved, but not by a *whole* lot, right? I will google for an expectation, but if anyone knows off the top of their head, I would be appreciative.
2. Buy a ZEUS or similar SSD with a cap-backed cache. Pricing is a little hard to come by, based on my quick google, but I am seeing $2-3k for an 8G model. Is that right? Yowch.
3. Wait for the X25-E G2, which is rumored to have a cap-backed cache, and may or may not work well (but probably will).
4. Put the X25-E with its cache disabled behind my PERC with the PERC cache enabled.

My budget is tight. I want better performance now. #4 sounds good. Thoughts?

Regarding mirrored SSDs for the ZIL, it was my understanding that if the SSD-backed ZIL failed, ZFS would fall back to using the regular pool for the ZIL, correct? Assuming this is correct, a mirror would be to preserve performance during a failure?

Thanks everyone, this has been really helpful. -Scott
Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check
Ed, your comment:

> If solaris is able to install at all, I would have to acknowledge, I
> have to shutdown anytime I need to change the Perc configuration, including
> replacing failed disks.

Replacing failed disks is easy when the PERC is doing the RAID. Just remove the failed drive and replace it with a good one, and the PERC will rebuild automatically. But are you talking about OpenSolaris-managed RAID? I am pretty sure, though I have not tested it, that in pseudo-JBOD mode (each disk a RAID 0 or 1), the PERC would still present a replaced disk to the OS without reconfiguring the PERC BIOS. Scott
Re: [zfs-discuss] File level cloning
I don't think so. But, you can clone at the ZFS level, and then just use the vmdk(s) that you need. As long as you don't muck about with the other stuff in the clone, the space usage should be the same. -Scott
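A hedged sketch of that, with hypothetical dataset and snapshot names:

  # snapshot the filesystem holding the VMs, then clone it
  zfs snapshot data01/vms@file-clone
  zfs clone data01/vms@file-clone data01/vms-copy
  # register only the vmdk(s) you need from the clone; as long as the rest of the
  # clone is left untouched, it consumes almost no additional space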
Re: [zfs-discuss] zpool getting in a stuck state?
Hi Jeremy, I had a loosely similar problem with my 2009.06 box. In my case (which may not be yours), working with support we found a bug that was causing my pool to hang. I also got erroneous errors when I did a scrub (3 x 5 disk raidz). I am using the same LSI controller. A sure-fire way to kill the box was to set up a file system as an iSCSI target and write a lot of data to it, around 1-2MB/s. It would usually die inside of a few hours. NFS writing was not as bad, but within a day it would panic there too.

The solution for me was to upgrade to 124. Since the upgrade three weeks ago, I have had no problems. Again, I don't know if this would fix your problem, but it may be worth a try. Just don't upgrade your ZFS version, and you will be able to roll back to 2009.06 at any time. -Scott
[zfs-discuss] Difficulty testing an SSD as a ZIL
Hi all, I received my SSD, and wanted to test it out using fake zpools with files as backing stores before attaching it to my production pool. However, when I exported the test pool and imported, I get an error. Here is what I did:

I created a file to use as a backing store for my new pool:
mkfile 1g /data01/test2/1gtest

Created a new pool:
zpool create ziltest2 /data01/test2/1gtest

Added the SSD as a log:
zpool add -f ziltest2 log c7t1d0
(c7t1d0 is my SSD. I used the -f option since I had done this before with a pool called 'ziltest', same results)

A 'zpool status' returned no errors.

Exported:
zpool export ziltest2

Imported:
zpool import -d /data01/test2 ziltest2
cannot import 'ziltest2': one or more devices is currently unavailable

This happened twice with two different test pools using file-based backing stores. I am nervous about adding the SSD to my production pool. Any ideas why I am getting the import error? Thanks, Scott
Re: [zfs-discuss] Difficulty testing an SSD as a ZIL
Excellent! That worked just fine. Thank you Victor. -Scott
Re: [zfs-discuss] ZFS ZIL/log on SSD weirdness
I am sorry that I don't have any links, but here is what I observe on my system. dd does not do sync writes, so the ZIL is not used. iSCSI traffic does sync writes (as of 2009.06, but not 2008.05), so if you repeat your test using an iSCSI target from your system, you should see log activity. Same for NFS. I see no ZIL activity using rsync, for an example of a network file transfer that does not require sync. Scott
Re: [zfs-discuss] ZFS ZIL/log on SSD weirdness
I second the use of zilstat - very useful, especially if you don't want to mess around with adding a log device and then having to destroy the pool if you don't want the log device any longer.

On Nov 18, 2009, at 2:20 AM, Dushyanth wrote:
> Just to clarify : Does iSCSI traffic from a Solaris iSCSI initiator
> to a third party target go through ZIL ?

It depends on whether the application requires a sync or not. dd does not, but databases (in general) do. As Richard said, ZFS treats the iSCSI volume just like any other vdev (pool of disks), so the fact that it is an iSCSI volume has nothing to do with ZFS' ZIL usage. -Scott
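For anyone who has not used it: zilstat is Richard Elling's DTrace-based script, not part of the base OS, so it has to be downloaded separately. A hedged usage sketch (argument handling may differ between versions of the script):

  # report ZIL activity once per second
  ./zilstat.ksh 1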
Re: [zfs-discuss] mirroring ZIL device
#1. It may help to use 15k disks as the ZIL. When I tested using three 15k disks striped as my ZIL, it made my workload go slower, even though it seems like it should have been faster. My suggestion is to test it out, and see if it helps.

#3. You may get good performance with an inexpensive SSD because the SSD should have fast random writes, but probably not fast sequential writes. But I would test it first against your anticipated workload. :) An Intel 32G X25-E runs just shy of $400, and they are pretty speedy. I don't know if that would fit your budget. There is also some concern about losing power and having the X25's RAM cache disappear during a write. -Scott
Re: [zfs-discuss] X45xx storage vs 7xxx Unified storage
If the 7310s can meet your performance expectations, they sound much better than a pair of X4540s. Auto-failover, SSD performance (although SSDs can be added to the X4540s), ease of management, and a great front end. I haven't seen whether you can use your backup software with the 7310s, but from what I have read in this thread, that may be the only downside (a big one). Everything else points to the 7310s.
Re: [zfs-discuss] Using iSCSI on ZFS with non-native FS - How to backup.
It does 'just work'; however, you may have some file and/or file system corruption if the snapshot was taken at the moment your Mac was updating some files. So use the Time Slider function and take a lot of snaps. :)
Re: [zfs-discuss] raidz data loss stories?
Yes, a coworker lost a second disk during a rebuild of a RAID 5 and lost all data. I have not had a failure, however when migrating EqualLogic arrays in and out of pools, I lost a disk on an array. No data loss, but it concerns me because during the moves you are essentially reading and writing all of the data on the disk. Did I have a latent problem on that particular disk that only exposed itself when doing such a large read/write? What if another disk had failed, and during the rebuild this latent problem was exposed? Trouble, trouble. They say security is an onion. So is data protection. Scott
Re: [zfs-discuss] ZIL to disk
I think Y is such a variable and complex number that it would be difficult to give a rule of thumb, other than 'test with your workload'. My server, having three five-disk raidz vdevs (striped) and an Intel X25-E as a ZIL, can fill my two GbE pipes over NFS (~200 MBps) during mostly sequential writes. That same server can only consume about 22 MBps using an artificial load designed to simulate my VM activity (using iometer). So it varies greatly depending upon Y. -Scott
Re: [zfs-discuss] ZFS configuration suggestion with 24 drives
It looks like there is not a free slot for a hot spare? If that is the case, then it is one more factor to push towards raidz2, as you will need time to remove the failed disk and insert a new one. During that time you don't want to be left unprotected.
Re: [zfs-discuss] ZFS configuration suggestion with 24 drives
Link aggregation can use different algorithms to load balance. Using L4 hashing (IP plus originating port, I think), a single client computer and the same protocol (NFS) but different originating ports has allowed me to saturate both NICs in my LAG. So yes, you just need more than one 'conversation', but the LAG setup will determine how a conversation is defined. Scott
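A hedged sketch of setting that policy on an aggregation (the aggregation key and the exact dladm syntax vary between builds, so treat this as illustrative):

  # switch aggregation 1 to L4 (IP + port) load balancing, then check it
  dladm modify-aggr -P L4 1
  dladm show-aggr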
[zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows
I have a single zfs volume, shared out using COMSTAR and connected to a Windows VM. I am taking snapshots of the volume regularly. I now want to mount a previous snapshot, but when I go through the process, Windows sees the new volume, but thinks it is blank and wants to initialize it. Any ideas how to get Windows to see that it has data on it?

Steps I took after the snap:

zfs clone data01/san/gallardo/g-recovery
sbdadm create-lu /dev/zvol/rdsk/data01/san/gallardo/g-recovery
stmfadm add-view -h HG-Gallardo -t TG-Gallardo -n 1 600144F0EAE40A004B6B59090003

At this point, my server Gallardo can see the LUN, but like I said, it looks blank to the OS. I suspect the 'sbdadm create-lu' phase. Any help to get Windows to see it as a LUN with NTFS data would be appreciated. Thanks, Scott
Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows
Thanks Dan. When I try the clone then import:

pfexec zfs clone data01/san/gallardo/g...@zfs-auto-snap:monthly-2009-12-01-00:00 data01/san/gallardo/g-testandlab
pfexec sbdadm import-lu /dev/zvol/rdsk/data01/san/gallardo/g-testandlab

the sbdadm import-lu gives me:

sbdadm: guid in use

which makes sense, now that I see it. The man pages make it look like I cannot give it another GUID during the import. Any other thoughts? I *could* delete the current LU, import, get my data off, and reverse the process, but that would take the current volume offline, which is not what I want to do. Thanks, Scott
Re: [zfs-discuss] Asymmetric mirroring
The SATA drive will be your bottleneck, and you will lose any speed advantages of the SAS drives, especially using 3 vdevs on a single SATA disk. I am with Richard, figure out what performance you need, and build accordingly.
Re: [zfs-discuss] ZFS, ESX ,and NFS. oh my!
My testing with 2008.11, iSCSI vs NFS, was that iSCSI was about 2x faster. I used a stripe of three 5-disk raidz vdevs (15 1.5TB SATA disks). I just used the default ZIL, no SSD or similar to make NFS faster.

I think (don't quote me) that ESX can only mount 64 iSCSI targets, so you aren't much better off. But COMSTAR (2009.06) exports a single iSCSI target with multiple LUNs, so that gets around the limitation. I could be all wet on this one, however, so look into it before taking my word.

Obviously iSCSI and NFS are quite different at the storage level, and I actually like NFS for the flexibility over iSCSI (quotas, reservations, etc.). -Scott
Re: [zfs-discuss] 7110 questions
Both iSCSI and NFS are slow? I would expect NFS to be slow, but in my iSCSI testing with OpenSolaris 2008.11, performance was reasonable, about 2x NFS. Setup: Dell 2950 with a SAS HBA and a SATA 3x5 raidz (15 disks, no separate ZIL), iSCSI using the VMware ESXi 3.5 software initiator. Scott
Re: [zfs-discuss] ZFS, ESX ,and NFS. oh my!
So how are folks getting around the NFS speed hit? Using SSD or battery-backed RAM ZILs?

Regarding limited NFS mounts, underneath a single NFS mount, would it work to (rough shell sketch below):

* Create a new VM
* Remove the VM from inventory
* Create a new ZFS file system underneath the original
* Copy the VM to that file system
* Add to inventory

At this point the VM is running underneath its own file system. I don't know if ESX would see this? To create another VM:

* Snap the original VM
* Create a clone underneath the original NFS FS, alongside the original VM ZFS.

Laborious to be sure.
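A rough shell sketch of those steps, with hypothetical dataset, path, and VM names (the inventory add/remove still happens on the ESX side):

  # new filesystem underneath the exported one; sharenfs is inherited, though as
  # noted above it is unclear whether ESX will see it under the original mount
  zfs create data01/nfs/vm01
  cp -rp /data01/nfs/original/vm01/* /data01/nfs/vm01/
  # later, to create another VM from a snapshot of the first
  zfs snapshot data01/nfs/vm01@gold
  zfs clone data01/nfs/vm01@gold data01/nfs/vm02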
Re: [zfs-discuss] Is the PROPERTY compression will increase the ZFS I/O th
Generally, yes. Test it with your workload and see how it works out for you. -Scott
Re: [zfs-discuss] SAN server
Oh boy, there are a lot of things here :)

How many people in your office will be using these services? If it is just 50 people or so, you would probably be fine with just about any configuration. 500 or 5000 would be a different story, and you would have to be much more careful.

If possible, you should measure what your requirements are based on the existing load. For example, figure out what kind of I/O demands are being generated by each application. It would also be nice to understand the I/O patterns, i.e. IO size, read/write ratio, randomness. Then you can use a tool like iometer to simulate your load on the ZFS box from a VM on the ESX box, and figure out what raid sets are required to support your load.

Another rough rule of thumb you can use is to sum the read and write IOPS at the 95th percentile (take a good guess), and use these equations to determine how many IOs you will need to support:

RAID 10: 1 * measured read IOPS + 2 * measured write IOPS
RAID 5:  1 * read IOPS + 4 * write IOPS

Each SATA disk can support in the 100 IOPS range (considering a lot of the IOs are random), and a 15k SAS disk around 200 IOPS. So if your total app load was 500 read and 500 write IOPS, then:

RAID 10: 500 + 2*500 = 1500 IOPS, so ~10-15 SATA disks or ~6-8 15k SAS disks
RAID 5:  500 + 4*500 = 2500 IOPS, so ~20-25 SATA disks or ~12-14 SAS disks

These are *really* rough numbers, and conservative. Honestly you could spend ages on an IO study like this.

There is a ZFS best practices guide out there which makes good reading, and talks to the pros/cons of the different raid types offered by ZFS, how many disks in a single raidz, etc. I think most people using ZFS go with the software raid sets to take advantage of checksums and self healing. Performance on modern hardware is fine.

Regarding cards, if you are not going to do HW raid, just get a SAS HBA. It makes life so much simpler. LSI makes a good selection of them. No RAID functionality, just good, fast IO. Attach that to a SAS JBOD, and you can mix and match SATA and SAS drives to fit your application. If you want to go HW raid, try to get a card that supports JBOD mode so you can use software raid if you change your mind. -Scott
Re: [zfs-discuss] SAN server
For ~100 people, I like Bob's answer. RAID 10 will get you lots of speed. Perhaps RAID 50 would be just fine for you as well and give you more space, but without measuring, you won't be sure. Don't forget a hot spare (or two)! Your MySQL database - will that generate a lot of IO? Also, to ensure you can recover from failures, consider separate pools for your database files and log files, both for MySQL and Exchange. Good luck! -Scott
Re: [zfs-discuss] ZFS for iSCSI based SAN
See this thread for information on load testing for VMware: http://communities.vmware.com/thread/73745?tstart=0&start=0

Within the thread there are instructions for using iometer to load test your storage. You should test out your solution before going live, and compare what you get with what you need. Just because striping 3 mirrors *will* give you more performance than raidz2 doesn't always mean that is the best solution. Choose the best solution for your use case.

You should have at least two NICs per connection to storage and LAN (4 total in this simple example), for redundancy if nothing else. Performance wise, vSphere can now have multiple SW iSCSI connections to a single LUN.

My testing showed compression increased iSCSI performance by 1.7x, so I like compression. But again, these are my tests in my situation. Your results may differ from mine.

Regarding ZIL usage, from what I have read you will only see benefits if you are using NFS backed storage, but there it can be significant. Remove the ZIL for testing to see the max benefit you could get. Don't do this in production! -Scott
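For reference, compression is a per-dataset property; a minimal sketch with a hypothetical dataset name:

  # default (lzjb) compression; in my testing this was a net win for iSCSI throughput
  zfs set compression=on data01/vmstore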
Re: [zfs-discuss] ZFS for iSCSI based SAN
> if those servers are on physical boxes right now i'd do some perfmon
> caps and add up the iops.

Using perfmon to get a sense of what is required is a good idea. Use the 95th percentile to be conservative. The counters I have used are in the Physical Disk object. Don't ignore the latency counters either. In my book, anything consistently over 20ms or so is excessive.

I run 30+ VMs on an EqualLogic array with 14 SATA disks, broken up as two striped 6-disk RAID 5 sets (RAID 50) with 2 hot spares. That array is, on average, about 25% loaded from an IO standpoint. Obviously my VMs are pretty light. And the EQL gear is *fast*, which makes me feel better about spending all of that money :).

>> Regarding ZIL usage, from what I have read you will only see
>> benefits if you are using NFS backed storage, but that it can be
>> significant.
>
> link?

From the ZFS Evil Tuning Guide (http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide):

"ZIL stands for ZFS Intent Log. It is used during synchronous writes operations."

further down:

"If you've noticed terrible NFS or database performance on SAN storage array, the problem is not with ZFS, but with the way the disk drivers interact with the storage devices. ZFS is designed to work with storage devices that manage a disk-level cache. ZFS commonly asks the storage device to ensure that data is safely placed on stable storage by requesting a cache flush. For JBOD storage, this works as designed and without problems. For many NVRAM-based storage arrays, a problem might come up if the array takes the cache flush request and actually does something rather than ignoring it. Some storage will flush their caches despite the fact that the NVRAM protection makes those caches as good as stable storage. ZFS issues infrequent flushes (every 5 second or so) after the uberblock updates. The problem here is fairly inconsequential. No tuning is warranted here. ZFS also issues a flush every time an application requests a synchronous write (O_DSYNC, fsync, NFS commit, and so on). The completion of this type of flush is waited upon by the application and impacts performance. Greatly so, in fact. From a performance standpoint, this neutralizes the benefits of having an NVRAM-based storage."

When I was testing iSCSI vs. NFS, it was clear iSCSI was not doing sync, NFS was. Here are some zpool iostat numbers.

iSCSI testing, using iometer with the RealLife workload (65% read, 60% random, 8k transfers - see the link in my previous post) - it is clear that writes are being cached in RAM and then spun off to disk:
# zpool iostat data01 1

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data01      55.5G  20.4T    691      0  4.21M      0
data01      55.5G  20.4T    632      0  3.80M      0
data01      55.5G  20.4T    657      0  3.93M      0
data01      55.5G  20.4T    669      0  4.12M      0
data01      55.5G  20.4T    689      0  4.09M      0
data01      55.5G  20.4T    488  1.77K  2.94M  9.56M
data01      55.5G  20.4T     29  4.28K   176K  23.5M
data01      55.5G  20.4T     25  4.26K   165K  23.7M
data01      55.5G  20.4T     20  3.97K   133K  22.0M
data01      55.6G  20.4T    170  2.26K  1.01M  11.8M
data01      55.6G  20.4T    678      0  4.05M      0
data01      55.6G  20.4T    625      0  3.74M      0
data01      55.6G  20.4T    685      0  4.17M      0
data01      55.6G  20.4T    690      0  4.04M      0
data01      55.6G  20.4T    679      0  4.02M      0
data01      55.6G  20.4T    664      0  4.03M      0
data01      55.6G  20.4T    699      0  4.27M      0
data01      55.6G  20.4T    423  1.73K  2.66M  9.32M
data01      55.6G  20.4T     26  3.97K   151K  21.8M
data01      55.6G  20.4T     34  4.23K   223K  23.2M
data01      55.6G  20.4T     13  4.37K  87.1K  23.9M
data01      55.6G  20.4T     21  3.33K   136K  18.6M
data01      55.6G  20.4T    468    496  2.89M  1.82M
data01      55.6G  20.4T    687      0  4.13M      0

Testing against NFS shows writes going to disk continuously.

NFS testing:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data01      59.6G  20.4T     57    216   352K  1.74M
data01      59.6G  20.4T     41     21   660K  2.74M
data01      59.6G  20.4T     44     24   655K  3.09M
data01      59.6G  20.4T     41     23   598K  2.97M
data01      59.6G  20.4T     34     33   552K  4.21M
data01      59.6G  20.4T     46     24   757K  3.09M
data01      59.6G  20.4T     39     24   593K  3.09M
data01      59.6G  20.4T     45     25   687K  3.22M
data01      59.6G  20.4T     45     23   683K  2.97M
data01      59.6G  20.4T     33     23   492K  2.97M
data01      59.6G  20.4T     16     41   214K  1.71M
data01      59.6G  20.4T      3  2.36K  53.4K  30.4M
data01      59.6G  20.4T      1  2.23K  20.3K  29.2M
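For the perfmon capture mentioned at the top of this message, something like the following is what I had in mind (counter names come from the Windows PhysicalDisk object; the _Total instance, the 15-second interval, and the sample count are just illustrative):

   typeperf "\PhysicalDisk(_Total)\Disk Transfers/sec" "\PhysicalDisk(_Total)\Avg. Disk sec/Transfer" -si 15 -sc 240 -o disk_baseline.csv

Take the 95th percentile of the transfers/sec column for each server, add them up, and that is roughly the IOPS budget the new storage has to meet.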
Re: [zfs-discuss] ZFS for iSCSI based SAN
> Isn't that section of the evil tuning guide you're quoting actually about > checking if the NVRAM/driver connection is working right or not? Miles, yes, you are correct. I just thought it was interesting reading about how syncs and such work within ZFS. Regarding my NFS test, you remind me that my test was flawed: my iSCSI numbers were taken through the ESXi software iSCSI initiator, while the NFS tests were run from inside the guest VM rather than through ESX. I'll redo the test with ESX as the NFS client (vmdks on NFS) and get back to you. Thanks! Scott -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS for iSCSI based SAN
I ran the RealLife iometer profile on NFS-based storage (vs. SW iSCSI), and got nearly identical results to having the disks on iSCSI:

iSCSI
  IOPS: 1003.8
  MB/s: 7.8
  Avg latency (ms): 27.9

NFS
  IOPS: 1005.9
  MB/s: 7.9
  Avg latency (ms): 29.7

Interesting! Here is how the pool was behaving during the testing. Again, this is NFS-backed storage:

data01       122G  20.3T    166     63  2.80M  4.49M
data01       122G  20.3T    145     59  2.28M  3.35M
data01       122G  20.3T    168     58  2.89M  4.38M
data01       122G  20.3T    169     59  2.79M  3.69M
data01       122G  20.3T     54    935   856K  18.1M
data01       122G  20.3T      9  7.96K   183K   134M
data01       122G  20.3T     49  3.82K   900K  61.8M
data01       122G  20.3T    160     61  2.73M  4.23M
data01       122G  20.3T    166     63  2.62M  4.01M
data01       122G  20.3T    162     64  2.55M  4.24M
data01       122G  20.3T    163     61  2.63M  4.14M
data01       122G  20.3T    145     54  2.37M  3.89M
data01       122G  20.3T    163     63  2.69M  4.35M
data01       122G  20.3T    171     64  2.80M  3.97M
data01       122G  20.3T    153     67  2.68M  4.65M
data01       122G  20.3T    164     66  2.63M  4.10M
data01       122G  20.3T    171     66  2.75M  4.51M
data01       122G  20.3T    175     53  3.02M  3.83M
data01       122G  20.3T    157     59  2.64M  3.80M
data01       122G  20.3T    172     59  2.85M  4.11M
data01       122G  20.3T    173     68  2.99M  4.11M
data01       122G  20.3T     97     35  1.66M  2.61M
data01       122G  20.3T    170     58  2.87M  3.62M
data01       122G  20.3T    160     64  2.72M  4.17M
data01       122G  20.3T    163     63  2.68M  3.77M
data01       122G  20.3T    160     60  2.67M  4.29M
data01       122G  20.3T    165     65  2.66M  4.05M
data01       122G  20.3T    191     59  3.25M  3.97M
data01       122G  20.3T    159     65  2.76M  4.18M
data01       122G  20.3T    154     52  2.64M  3.50M
data01       122G  20.3T    164     61  2.76M  4.38M
data01       122G  20.3T    154     62  2.66M  4.08M
data01       122G  20.3T    160     58  2.71M  3.95M
data01       122G  20.3T     84     34  1.48M  2.37M
data01       122G  20.3T      9  7.27K   156K   125M
data01       122G  20.3T     25  5.20K   422K  84.3M
data01       122G  20.3T    170     60  2.77M  3.64M
data01       122G  20.3T    170     63  2.85M  3.85M

So it appears NFS is doing syncs, while iSCSI is not (see my earlier zpool iostat data for iSCSI). Isn't this what we would expect, since NFS requests synchronous commits while iSCSI (presumably) does not? -Scott -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
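A quick sanity check on reading those latencies as milliseconds (my own back-of-the-envelope arithmetic, not iometer output): with 32 outstanding I/Os and roughly 1000 IOPS, Little's law puts the average response time right around 32 ms, which matches the numbers above.

   # avg latency = outstanding I/Os / IOPS
   $ echo 'scale=4; 32 / 1004' | bc
   .0318          # about 32 ms per I/O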
Re: [zfs-discuss] slow ls or slow zfs
Hi, When you have a lot of random reads/writes, raidz/raidz2 can be fairly slow: http://blogs.sun.com/roch/entry/when_to_and_not_to The recommendation is to break the disks into smaller raidz/raidz2 stripes, thereby improving IO (see the sketch after this message). From the ZFS Best Practices Guide: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#RAID-Z_Configuration_Requirements_and_Recommendations "The recommended number of disks per group is between 3 and 9. If you have more disks, use multiple groups." -Scott -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
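As an illustration of that advice (pool and device names below are invented), the same twelve disks arranged as two 6-disk raidz vdevs will give roughly twice the random IOPS of one wide 12-disk raidz, because ZFS stripes across vdevs and each raidz vdev delivers roughly one disk's worth of random IOPS:

   # one wide 12-disk raidz: best capacity, worst random IO
   zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0

   # same disks as two 6-disk raidz vdevs: one disk of capacity given up, about double the random IOPS
   zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 raidz c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0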
Re: [zfs-discuss] ZFS write I/O stalls
For what it is worth, I too have seen this behavior when load testing our zfs box. I used iometer and the RealLife profile (1 worker, 1 target, 65% reads, 60% random, 8k, 32 IOs in the queue). When writes are being dumped, reads drop close to zero, from 600-700 read IOPS to 15-30 read IOPS.

zpool iostat data01 1   (where data01 is my pool name)

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data01      55.5G  20.4T    691      0  4.21M      0
data01      55.5G  20.4T    632      0  3.80M      0
data01      55.5G  20.4T    657      0  3.93M      0
data01      55.5G  20.4T    669      0  4.12M      0
data01      55.5G  20.4T    689      0  4.09M      0
data01      55.5G  20.4T    488  1.77K  2.94M  9.56M
data01      55.5G  20.4T     29  4.28K   176K  23.5M
data01      55.5G  20.4T     25  4.26K   165K  23.7M
data01      55.5G  20.4T     20  3.97K   133K  22.0M
data01      55.6G  20.4T    170  2.26K  1.01M  11.8M
data01      55.6G  20.4T    678      0  4.05M      0
data01      55.6G  20.4T    625      0  3.74M      0
data01      55.6G  20.4T    685      0  4.17M      0
data01      55.6G  20.4T    690      0  4.04M      0
data01      55.6G  20.4T    679      0  4.02M      0
data01      55.6G  20.4T    664      0  4.03M      0
data01      55.6G  20.4T    699      0  4.27M      0
data01      55.6G  20.4T    423  1.73K  2.66M  9.32M
data01      55.6G  20.4T     26  3.97K   151K  21.8M
data01      55.6G  20.4T     34  4.23K   223K  23.2M
data01      55.6G  20.4T     13  4.37K  87.1K  23.9M
data01      55.6G  20.4T     21  3.33K   136K  18.6M
data01      55.6G  20.4T    468    496  2.89M  1.82M
data01      55.6G  20.4T    687      0  4.13M      0

-Scott -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS write I/O stalls
> On Tue, 30 Jun 2009, Bob Friesenhahn wrote: > Note that this issue does not apply at all to NFS service, database service, or any other usage which does synchronous writes. I see read starvation with NFS. I was using iometer on a Windows VM, connecting to an NFS mount on a 2008.11 physical box. iometer params: 65% read, 60% random, 8k blocks, 32 outstanding IO requests, 1 worker, 1 target.

NFS testing:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data01      59.6G  20.4T     46     24   757K  3.09M
data01      59.6G  20.4T     39     24   593K  3.09M
data01      59.6G  20.4T     45     25   687K  3.22M
data01      59.6G  20.4T     45     23   683K  2.97M
data01      59.6G  20.4T     33     23   492K  2.97M
data01      59.6G  20.4T     16     41   214K  1.71M
data01      59.6G  20.4T      3  2.36K  53.4K  30.4M
data01      59.6G  20.4T      1  2.23K  20.3K  29.2M
data01      59.6G  20.4T      0  2.24K  30.2K  28.9M
data01      59.6G  20.4T      0  1.93K  30.2K  25.1M
data01      59.6G  20.4T      0  2.22K      0  28.4M
data01      59.7G  20.4T     21    295   317K  4.48M
data01      59.7G  20.4T     32     12   495K  1.61M
data01      59.7G  20.4T     35     25   515K  3.22M
data01      59.7G  20.4T     36     11   522K  1.49M
data01      59.7G  20.4T     33     24   508K  3.09M
data01      59.7G  20.4T     35     23   536K  2.97M
data01      59.7G  20.4T     32     23   483K  2.97M
data01      59.7G  20.4T     37     37   538K  4.70M

While writes are being committed to the ZIL all the time, periodic dumping to the pool still occurs, and during those times reads are starved. Maybe this doesn't happen in the 'real world'? -Scott -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] triple-parity: RAID-Z3
> which gap? > > 'RAID-Z should mind the gap on writes' ? > > Message was edited by: thometal I believe this is in reference to the RAID 5 write hole, described here: http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5_performance RAID-Z should avoid this via its copy-on-write model: http://en.wikipedia.org/wiki/Zfs#Copy-on-write_transactional_model So I'm not sure what the 'RAID-Z should mind the gap on writes' comment is getting at either. Clarification? -Scott -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS development moving behind closed doors
"I had already begun the process of migrating my 134 boxes over to Nexenta before Oracle's cunning plans became known. This just reaffirms my decision. " Us too. :) -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Configuration questions for Home File Server (CPU cores, dedup, checksum)?
Craig, 3. I do not think you will get much dedupe on video, music, and photos. I would not bother. If you really wanted to know at some later stage, you could create a new file system, enable dedupe, and copy your data (or a subset) into it just to see (a sketch follows below). In my experience there is a significant CPU penalty as well. My four-core (1.86 GHz Xeons, 4 years old) box nearly maxes out when putting a lot of data into a deduped file system. -Scott -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
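A minimal sketch of that experiment (the pool and dataset names here are made up):

   # throwaway dataset with dedupe on; copy a representative sample into it
   zfs create -o dedup=on tank/dedupe-test
   cp -r /tank/media/photos/2010 /tank/dedupe-test/

   # the DEDUP column shows the pool-wide ratio; close to 1.00x means it is not worth the RAM
   zpool list tank

Destroy the test dataset afterwards; the dedup table entries it created go away with it.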
Re: [zfs-discuss] Dedup relationship between pool and filesystem
Hi Peter, dedupe is pool-wide, and file systems can opt in or out of it (see the example below). So if multiple file systems are set to dedupe, they all draw from the same pool of deduped blocks. In this way, if two files share some of the same blocks, they will dedupe even if they are in different file systems. I am not sure why reporting is not done at the file system level. It may be an accounting issue, i.e. deciding which file system owns the deduped blocks, but it seems some fair estimate could be made. Maybe the overhead of keeping each file system updated with these stats is too high? -Scott -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
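To illustrate the opt-in model (dataset names are invented, and zdb output varies by build, so treat this as a sketch):

   # dedup is set per dataset, but the dedup table (DDT) is shared pool-wide
   zfs set dedup=on tank/vmware
   zfs set dedup=on tank/backups
   zfs set dedup=off tank/media

   # reporting, on the other hand, is pool-wide only
   zpool get dedupratio tank
   zdb -DD tank     # DDT histogram: unique vs. referenced blocks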
Re: [zfs-discuss] Data transfer taking a longer time than expected (Possibly dedup related)
"Can I disable dedup on the dataset while the transfer is going on?" Yes. Only the blocks copied after disabling dedupe will not be deduped. The stuff you have already copied will be deduped. "Can I simply Ctrl-C the procress to stop it?" Yes, you can do that to a mv process. Maybe stop the process, delete the deduped file system (your copy target), and create a new file system without dedupe to see if that is any better? Scott -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedup relationship between pool and filesystem
When I do the calculations, assuming 300 bytes per block to be conservative and 128K blocks, I get 2.34 GB of cache (RAM, L2ARC) per terabyte of deduped data (the arithmetic is below). But block size is dynamic, so in practice you will need more than this. Scott -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
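The arithmetic behind that figure (the 300-bytes-per-DDT-entry number is a commonly quoted rule of thumb, not an exact on-disk size):

   # 1 TB of data at 128K per block, 300 bytes of dedup table per block
   $ echo 'scale=2; (2^40 / 2^17) * 300 / 2^30' | bc
   2.34     # GiB of DDT per TB of unique data

Smaller average block sizes mean more blocks per terabyte, and the requirement grows proportionally.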
Re: [zfs-discuss] Kernel panic on ZFS import - how do I recover?
I just realized that the email I sent to David and the list did not make the list (at least as Jive can see it), so here is what I sent on the 23rd: Brilliant. I set those parameters via /etc/system (the two lines are shown below), rebooted, and the pool imported with just the -f switch. I had seen this as an option earlier, although not in that thread, but was not sure it applied to my case. Scrub is running now. Thank you very much! -Scott Update: The scrub finished with zero errors. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
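For anyone who finds this thread later, the parameters in question were the recovery tunables discussed upthread (aok and zfs_recover). Roughly this, and treat it as a last-resort recovery setting to be removed once the pool is healthy again:

   * /etc/system -- temporary, remove after the pool imports and scrubs cleanly
   set aok = 1
   set zfs:zfs_recover = 1

Then reboot and retry 'zpool import -f <pool>'.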
[zfs-discuss] My filesystem turned from a directory into a special character device
I am running nexenta CE 3.0.3. I have a file system that at some point in the last week went from a directory per 'ls -l' to a special character device. This results in not being able to get into the file system. Here is my file system, scott2, along with a new file system I just created, as seen by ls -l:

drwxr-xr-x  4 root root    4 Sep 27 09:14 scott
crwxr-xr-x  9 root root 0, 0 Sep 20 11:51 scott2

Notice the 'c' vs. 'd' at the beginning of the permissions list. I had been fiddling with permissions last week, then had problems with a kernel panic. Perhaps this is related? Any ideas how to get access to my file system? Thanks, -Scott -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] My filesystem turned from a directory into a special character device
On 9/27/10 9:56 AM, "Victor Latushkin" wrote: > > On Sep 27, 2010, at 8:30 PM, Scott Meilicke wrote: > >> I am running nexenta CE 3.0.3. >> >> I have a file system that at some point in the last week went from a >> directory per 'ls -l' to a special character device. This results in not >> being able to get into the file system. Here is my file system, scott2, along >> with a new file system I just created, as seen by ls -l: >> >> drwxr-xr-x 4 root root4 Sep 27 09:14 scott >> crwxr-xr-x 9 root root 0, 0 Sep 20 11:51 scott2 >> >> Notice the 'c' vs. 'd' at the beginning of the permissions list. I had been >> fiddling with permissions last week, then had problems with a kernel panic. > > Are you still running with aok/zfs_recover being set? Have you seen this issue > before panic? Yes. Well, I have removed those entries in /etc/system, but have not yet rebooted the box. > >> Perhaps this is related? > > May be. > >> Any ideas how to get access to my file system? > > This can be fixed, but it is a bit more complicated and error prone that > setting couple of variables. OK. Sounds like restoring from my backup would be best? What causes this? I saw this exact same behavior on my home box, and had to restore about two weeks ago. Not very encouraging. :( Is there anything I can provide to help people who know more than me solve this problem? > > Regards > Victor Thanks Victor. -Scott ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] When Zpool has no space left and no snapshots
Preemptively use quotas? (Something like the sketch below.) On 9/22/10 7:25 PM, "Aleksandr Levchuk" wrote: > Dear ZFS Discussion, > > I ran out of space, consequently could not rm or truncate files. (It > make sense because it's a copy-on-write and any transaction needs to > be written to disk. It worked out really well - all I had to do is > destroy some snapshots.) > > If there are no snapshots to destroy, how to prepare for a situation > when a ZFS pool looses it's last free byte? > > Alex > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Scott Meilicke | Enterprise Systems Administrator | Crane Aerospace & Electronics | +1 425-743-8153 | M: +1 206-406-2670 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
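A sketch of what I mean (pool, dataset names, and sizes are invented): cap the busy file systems below the pool's capacity, or park a little reserved space you can give back in an emergency, so the pool never reaches its literal last byte.

   # keep the data file systems from consuming the whole pool
   zfs set quota=900G tank/data

   # or hold back some slack space you can release later
   zfs create tank/slack
   zfs set reservation=5G tank/slack

   # emergency: release the slack so deletes and truncates have room to commit
   zfs set reservation=none tank/slack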
Re: [zfs-discuss] Is there any way to stop a resilver?
Has it been running long? Initially the numbers are way off. After a while it settles down into something reasonable. How many disks, and what size, are in your raidz2? -Scott On 9/29/10 8:36 AM, "LIC mesh" wrote: > Is there any way to stop a resilver? > > We gotta stop this thing - at minimum, completion time is 300,000 hours, and > maximum is in the millions. > > Raidz2 array, so it has the redundancy, we just need to get data off. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is there any way to stop a resilver?
What version of OS? Are snapshots running? (Turn them off.) So are there eight disks? On 9/29/10 8:46 AM, "LIC mesh" wrote: > It's always running less than an hour. > > It usually starts at around 300,000h estimate (at 1m in), goes up to an > estimate in the millions (about 30 mins in) and restarts. > > Never gets past 0.00% completion, and K resilvered on any LUN. > > 64 LUNs, 32x5.44T, 32x10.88T in 8 vdevs. > > On Wed, Sep 29, 2010 at 11:40 AM, Scott Meilicke wrote: >> Has it been running long? Initially the numbers are way off. After a while it >> settles down into something reasonable. >> >> How many disks, and what size, are in your raidz2? >> >> -Scott >> >> On 9/29/10 8:36 AM, "LIC mesh" wrote: >> >>> Is there any way to stop a resilver? >>> >>> We gotta stop this thing - at minimum, completion time is 300,000 hours, and >>> maximum is in the millions. >>> >>> Raidz2 array, so it has the redundancy, we just need to get data off. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Fwd: Is there any way to stop a resilver?
(I left the list off last time, sorry.) No, the resilver should only be happening if there was a spare available. Is the whole thing scrubbing? It looks like it. Can you stop it with 'zpool scrub -s <poolname>'? So... word of warning, I am no expert at this stuff. Think about what I am suggesting before you do it :). Although stopping a scrub is pretty innocuous. -Scott On 9/29/10 9:22 AM, "LIC mesh" wrote: > You almost have it - each iSCSI target is made up of 4 of the raidz vdevs - 4 > * 6 = 24 disks. > > 16 targets total. > > We have one LUN with status of "UNAVAIL" but didn't know if removing it > outright would help - it's actually available and well as far as the target is > concerned, so we thought it went UNAVAIL as a result of iSCSI timeouts - we've > since fixed the switches buffers, etc. > > See: > http://pastebin.com/pan9DBBS > > On Wed, Sep 29, 2010 at 12:17 PM, Scott Meilicke wrote: >> OK, let me see if I have this right: >> >> 8 shelves, 1T disks, 24 disks per shelf = 192 disks >> 8 shelves, 2T disks, 24 disks per shelf = 192 disks >> Each raidz is six disks. >> 64 raidz vdevs >> Each iSCSI target is made up of 8 of these raidz vdevs (8 x 6 disks = 48 >> disks) >> Then the head takes these eight targets, and makes a raidz2. So the raidz2 >> depends upon all 384 disks. So when a failure occurs, the resilver is >> accessing all 384 disks. >> >> If I have this right, which I am in serious doubt :), then that will either >> take an enormous amount of time to complete, or never. It looks like never. >> >> Recovery: >> >> From the head, can you see which vdev has failed? If so, can you remove it to >> stop the resilver? >> >> On 9/29/10 8:57 AM, "LIC mesh" wrote: >> >>> This is an iSCSI/COMSTAR array. >>> >>> The head was running 2009.06 stable with version 14 ZFS, but we updated that >>> to build 134 (kept the old OS drives) - did not, however, update the zpool - >>> it's still version 14. >>> >>> The targets are all running 2009.06 stable, exporting 4 raidz1 LUNs each of >>> 6 drives - 8 shelves have 1TB drives, the other 8 have 2TB drives. >>> >>> The head sees the filesystem as comprised of 8 vdevs of 8 iSCSI LUNs each, >>> with SSD ZIL and SSD L2ARC. >>> >>> On Wed, Sep 29, 2010 at 11:49 AM, Scott Meilicke wrote: >>>> What version of OS? >>>> Are snapshots running? (Turn them off.) >>>> >>>> So are there eight disks? >>>> >>>> On 9/29/10 8:46 AM, "LIC mesh" wrote: >>>> >>>>> It's always running less than an hour. >>>>> >>>>> It usually starts at around 300,000h estimate (at 1m in), goes up to an >>>>> estimate in the millions (about 30 mins in) and restarts. >>>>> >>>>> Never gets past 0.00% completion, and K resilvered on any LUN. >>>>> >>>>> 64 LUNs, 32x5.44T, 32x10.88T in 8 vdevs. >>>>> >>>>> On Wed, Sep 29, 2010 at 11:40 AM, Scott Meilicke wrote: >>>>>> Has it been running long? Initially the numbers are way off. After a >>>>>> while it settles down into something reasonable. >>>>>> >>>>>> How many disks, and what size, are in your raidz2? >>>>>> >>>>>> -Scott >>>>>> >>>>>> On 9/29/10 8:36 AM, "LIC mesh" wrote: >>>>>> >>>>>>> Is there any way to stop a resilver?
>>>>>>> We gotta stop this thing - at minimum, completion time is 300,000 hours, and maximum is in the millions. >>>>>>> >>>>>>> Raidz2 array,
[zfs-discuss] Resliver making the system unresponsive
This must be resilver day :) I just had a drive failure. The hot spare kicked in, and access to the pool over NFS was effectively zero for about 45 minutes. Currently the pool is still resilvering, but for some reason I can access the file system now. Resilver speed has been beaten to death I know, but is there a way to avoid this? For example, is more enterprisy hardware less susceptible to this during resilvers? This box is used for development VMs, but there is no way I would consider this for production with this kind of performance hit during a resilver.

My hardware:
Dell 2950
16G RAM
16 disk SAS chassis
LSI 3801 (I think) SAS card (1068e chip)
Intel X25-E SLOG off of the internal PERC 5/i RAID controller
Seagate 750G disks (7200.11)

I am running Nexenta CE 3.0.3 (SunOS rawhide 5.11 NexentaOS_134f i86pc i386 i86pc Solaris)

  pool: data01
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Sep 29 14:03:52 2010
        1.12T scanned out of 5.00T at 311M/s, 3h37m to go
        82.0G resilvered, 22.42% done
config:

        NAME           STATE     READ WRITE CKSUM
        data01         DEGRADED     0     0     0
          raidz2-0     ONLINE       0     0     0
            c1t8d0     ONLINE       0     0     0
            c1t9d0     ONLINE       0     0     0
            c1t10d0    ONLINE       0     0     0
            c1t11d0    ONLINE       0     0     0
            c1t12d0    ONLINE       0     0     0
            c1t13d0    ONLINE       0     0     0
            c1t14d0    ONLINE       0     0     0
          raidz2-1     DEGRADED     0     0     0
            c1t22d0    ONLINE       0     0     0
            c1t15d0    ONLINE       0     0     0
            c1t16d0    ONLINE       0     0     0
            c1t17d0    ONLINE       0     0     0
            c1t23d0    ONLINE       0     0     0
            spare-5    REMOVED      0     0     0
              c1t20d0  REMOVED      0     0     0
              c8t18d0  ONLINE       0     0     0  (resilvering)
            c1t21d0    ONLINE       0     0     0
        logs
          c0t1d0       ONLINE       0     0     0
        spares
          c8t18d0      INUSE     currently in use

errors: No known data errors

Thanks for any insights. -Scott -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Resliver making the system unresponsive
I should add I have 477 snapshots across all file systems. Most of them are hourly snaps (225 of them anyway). On Sep 29, 2010, at 3:16 PM, Scott Meilicke wrote: > This must be resilver day :) > > I just had a drive failure. The hot spare kicked in, and access to the pool > over NFS was effectively zero for about 45 minutes. Currently the pool is > still resilvering, but for some reason I can access the file system now. > [...] Scott Meilicke ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] When is it okay to turn off the "verify" option.
Why do you want to turn verify off? If performance is the reason, is the difference significant with verify on vs. off? (Comparing the two settings below would tell you.) On Oct 4, 2010, at 2:28 PM, Edward Ned Harvey wrote: >> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- >> boun...@opensolaris.org] On Behalf Of Peter Taps >> >> As I understand, the hash generated by sha256 is "almost" guaranteed >> not to collide. I am thinking it is okay to turn off "verify" property >> on the zpool. However, if there is indeed a collision, we lose data. >> "Scrub" cannot recover such lost data. >> >> I am wondering in real life when is it okay to turn off "verify" >> option? I guess for storing business critical data (HR, finance, etc.), >> you cannot afford to turn this option off. > > Right on all points. It's a calculated risk. If you have a hash collision, > you will lose data undetected, and backups won't save you unless *you* are > the backup. That is, if the good data, before it got corrupted by your > system, happens to be saved somewhere else before it reached your system. > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Scott Meilicke ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
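If you do want to measure it, the comparison is roughly this (the dataset name is invented; valid values for the dedup property include on, verify, sha256, and sha256,verify):

   # hash-only dedupe: trust the sha256 match, no byte-for-byte comparison
   zfs set dedup=sha256 tank/data

   # dedupe with verification: read and compare the existing block before sharing it
   zfs set dedup=sha256,verify tank/data

Run the same write workload against each setting; verify costs an extra read per duplicate block written, so the penalty depends on how much of your data actually dedupes.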
Re: [zfs-discuss] Finding corrupted files
Scrub? On Oct 6, 2010, at 6:48 AM, Stephan Budach wrote: > No - not a trick question., but maybe I didn't make myself clear. > Is there a way to discover such bad files other than trying to actually read > from them one by one, say using cp or by sending a snapshot elsewhere? > > I am well aware that the file shown in zpool status -v is damaged and I have > already restored it, but I wanted to know, if there're more of them. > > Regards, > budy > -- > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Scott Meilicke ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
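To spell that out (the pool name is illustrative): a scrub reads and checksums every allocated block in the pool, so it touches every file without you having to cp them one at a time, and anything unrecoverable then shows up in the status output.

   # read and verify everything in the pool
   zpool scrub tank

   # once it completes, any files with permanent errors are listed by path
   zpool status -v tank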
Re: [zfs-discuss] [RFC] Backup solution
Those must be pretty busy drives. I had a recent failure of a 1.5T disk in a 7-disk raidz2 vdev that took about 16 hours to resilver. There was very little IO on the array, and it had maybe 3.5T of data to resilver. On Oct 7, 2010, at 3:17 PM, Ian Collins wrote: > I would seriously consider raidz3, given I typically see 80-100 hour resilver > times for 500G drives in raidz2 vdevs. If you haven't already, read Adam > Leventhal's paper: > > http://queue.acm.org/detail.cfm?id=1670144 > > -- > Ian. > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Scott Meilicke ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss