Re: [zfs-discuss] Cores vs. Speed?
> I am leaning towards AMD because of ECC support

Well, let's look at Intel's offerings... RAM is faster than AMD's at 1333MHz DDR3, and one gets ECC and a thermal sensor for $10 over non-ECC: http://www.newegg.com/Product/Product.aspx?Item=N82E16820139040

This MB has two Intel ethernets, and for an extra $30 an Ethernet KVM (LOM): http://www.newegg.com/Product/Product.aspx?Item=N82E16813182212

One needs a Xeon 34xx for ECC; the 45W version isn't on Newegg, and ignoring the one without Hyper-Threading leaves us: http://www.newegg.com/Product/Product.aspx?Item=N82E16819117225

Yes, at 95W it isn't exactly low power, but 4 cores @ 2533MHz plus another 4 Hyper-Thread cores is nice. If you only need one core, the marketing paperwork claims it will push to 2.93GHz too. But the RAM bandwidth is the big win for Intel. Avoid the temptation, but at 2.8GHz without ECC, this one is close in price: http://www.newegg.com/Product/Product.aspx?Item=N82E16819115214

Now, this gets one to 8GB ECC easily... AMD's unfair advantage is all those RAM slots on their multi-die MBs... A slow AMD CPU with 64GB RAM might be better depending on your working set / dedup requirements.

Rob
Re: [zfs-discuss] Cores vs. Speed?
On 02/05/2010 03:21 AM, Edward Ned Harvey wrote:
> FWIW ... 5 disks in raidz2 will have capacity of 3 disks. But if you bought
> 6 disks in mirrored configuration, you have a small extra cost, and much
> better performance.

But the raidz2 can survive the loss of ANY two disks, while the 6-disk mirror configuration will be destroyed if the two disks lost are from the SAME pair.

-- Jesus Cea Avion, j...@jcea.es - http://www.jcea.es/ - jabber / xmpp:j...@jabber.org
Re: [zfs-discuss] Cores vs. Speed?
Brian wrote:
> Interesting comments.. But I am confused. Performance for my backups (compression/deduplication) would most likely not be #1 priority. I want my VMs to run fast - so is it deduplication that really slows things down?

Dedup requires a fair amount of CPU, but it really wants a big L2ARC and RAM. I'd seriously consider no less than 8GB of RAM, and look at getting a smaller-sized (~40GB) SSD, something on the order of an Intel X25-M. Also, iSCSI-served VMs tend to do mostly random I/O, which is better handled by a striped mirror than RaidZ.

> Are you saying raidz2 would overwhelm current I/O controllers to where I could not saturate a 1 Gb network link?

No.

> Is the CPU I am looking at not capable of doing dedup and compression? Or are no CPUs capable of doing that currently? If I only enable it for the backup filesystem will all my filesystems suffer performance-wise?

All the CPUs you indicate can handle the job; it's a matter of getting enough data to them.

> Where are the bottlenecks in a raidz2 system that I will only access over a single gigabit link? Are they insurmountable?

RaidZ is good for streaming writes of large size, where you should get performance roughly equal to the number of data drives. Likewise for streaming reads. Small writes generally limit performance to the level of about 1 disk, regardless of the number of data drives in the RaidZ. Small reads are in-between in terms of performance.

Personally, I'd look into having 2 different zpools - a striped mirror for your iSCSI-shared VMs, and a raidz2 for your main storage. In any case, for dedup, you really should have an SSD for L2ARC, if at all possible. Being able to store all the metadata for the entire zpool in the L2ARC really, really helps speed up dedup.

Also, about your CPU choices, look here for a good summary of the current AMD processor features: http://en.wikipedia.org/wiki/List_of_AMD_Phenom_microprocessors (this covers the Phenom, Phenom II, and Athlon II families). The main difference between the various models comes down to the amount of L3 cache and the HT speed. I'd be interested in doing some benchmarking to see exactly how the variations make a difference.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
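For reference, a minimal sketch of the two-pool layout plus an SSD cache device described above might look like the following. The device names and dataset names are purely hypothetical placeholders, not a tuned recommendation:

  # striped mirror (two 2-way mirrors) for the iSCSI-served VMs
  zpool create vmpool mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0

  # raidz2 across five disks for the bulk/backup data
  zpool create tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0

  # add the SSD as an L2ARC cache device to the pool that will be deduplicated
  zpool add tank cache c3t0d0

  # enable dedup/compression only on the backup filesystem
  zfs create -o dedup=on -o compression=on tank/backups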
Re: [zfs-discuss] Cores vs. Speed?
> I want my VMs to run fast - so is it deduplication that really slows > things down? > > Are you saying raidz2 would overwhelm current I/O controllers to where > I could not saturate 1 GB network link? > > Is the CPU I am looking at not capable of doing dedup and compression? > Or are no CPUs capable of doing that currently? If I only enable it > for the backup filesystem will all my filesystems suffer performance > wise? > > Where are the bottlenecks in a raidz2 system that I will only access > over a single gigabit link? Are the insurmountable? I'm not sure if anybody can answer your questions. I will suggest you just try things out, and see for yourself. Everybody would have different techniques to tweak performance... If you want to use fast compression and dedup, lots of cpu and ram. (You said 4G, but I don't think that's a lot. I never buy a laptop with less than 4G nowadays. I think a lot of ram is 16G and higher.) As for raidz2, and Ethernet ... I don't know. If you've got 5 disks in a raidz2 configuration ... Assuming each disk can sustain 500Mbits, then theoretically these disks might be able to achieve 1.5Gbit or 2.5Gbit with perfect efficiency ... So maybe they can max out your Ethernet. I don't know. But I do know, if you had a stripe of 3 mirrors, they would have absolutely no trouble maxing out the Ethernet. Even a single mirror could just barely do that. For 2 or more mirrors, it's cake. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cores vs. Speed?
> Data in raidz2 is striped so that it is split across multiple disks. Partial truth. Yes, the data is on more than one disk, but it's a parity hash, requiring computation overhead and a write operation on each and every disk. It's not simply striped. Whenever you read or write, you need to access all the disks (or a bunch of 'em) and use compute cycles to generate the actual data stream. I don't know enough about the underlying methods of calculating and distributing everything to say intelligently *why*, but I know this: > In this (sequential) sense it is faster than a single disk. Whenever I benchmark raid5 versus a mirror, the mirror is always faster. Noticeably and measurably faster, as in 50% to 4x faster. (50% for a single disk mirror versus a 6-disk raid5, and 4x faster for a stripe of mirrors, 6 disks with the capacity of 3, versus a 6-disk raid5.) Granted, I'm talking about raid5 and not raidz. There is possibly a difference there, but I don't think so. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Thu, Feb 4, 2010 at 10:35 PM, Bob Friesenhahn < bfrie...@simple.dallas.tx.us> wrote: > On Thu, 4 Feb 2010, Marc Nicholas wrote: > >> >> The write IOPS between the X25-M and the X25-E are different since with >> the X25-M, much >> more of your data gets completely lost. Most of us prefer not to lose our >> data. >> >> Would you like to qualify your statement further? >> > > Google is your friend. And check earlier on this list/forum as well. > > While I understand the difference between MLC and SLC parts, I'm pretty >> sure Intel didn't >> design the M version to make "data get completely lost". ;) >> > > It loses the most recently written data, even after a cache sync request. > A number of people have verified this for themselves and posted results. > Even the X25-E has been shown to lose some transactions. > > The devices have some DRAM (16MB) that is used for write amplification levelling. The sudden loss of power means that this DRAM doesn't get flushed to Flash. This is the very reason the STEC devices have a supercap. -marc ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Thu, 4 Feb 2010, Marc Nicholas wrote: The write IOPS between the X25-M and the X25-E are different since with the X25-M, much more of your data gets completely lost. Most of us prefer not to lose our data. Would you like to qualify your statement further? Google is your friend. And check earlier on this list/forum as well. While I understand the difference between MLC and SLC parts, I'm pretty sure Intel didn't design the M version to make "data get completely lost". ;) It loses the most recently written data, even after a cache sync request. A number of people have verified this for themselves and posted results. Even the X25-E has been shown to lose some transactions. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Thu, Feb 4, 2010 at 10:18 PM, Bob Friesenhahn < bfrie...@simple.dallas.tx.us> wrote: > On Thu, 4 Feb 2010, Marc Nicholas wrote: > > Very interesting stats -- thanks for taking the time and trouble to share >> them! >> >> One thing I found interesting is that the Gen 2 X25-M has higher write >> IOPS than the >> X25-E according to Intel's documentation (6,600 IOPS for 4K writes versus >> 3,300 IOPS for >> 4K writes on the "E"). I wonder if it'd perform better as a ZIL? (The >> write latency on >> both drives is the same). >> > > The write IOPS between the X25-M and the X25-E are different since with the > X25-M, much more of your data gets completely lost. Most of us prefer not > to lose our data. > > Would you like to qualify your statement further? While I understand the difference between MLC and SLC parts, I'm pretty sure Intel didn't design the M version to make "data get completely lost". ;) -marc ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Thu, 4 Feb 2010, Marc Nicholas wrote: Very interesting stats -- thanks for taking the time and trouble to share them! One thing I found interesting is that the Gen 2 X25-M has higher write IOPS than the X25-E according to Intel's documentation (6,600 IOPS for 4K writes versus 3,300 IOPS for 4K writes on the "E"). I wonder if it'd perform better as a ZIL? (The write latency on both drives is the same). The write IOPS between the X25-M and the X25-E are different since with the X25-M, much more of your data gets completely lost. Most of us prefer not to lose our data. The X25-M is about as valuable as a paper weight for use as a zfs slog. Toilet paper would be a step up. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cores vs. Speed?
On Thu, 4 Feb 2010, Brian wrote: Was my raidz2 performance comment above correct? That the write speed is that of the slowest disk? That is what I believe I have read. Data in raidz2 is striped so that it is split across multiple disks. In this (sequential) sense it is faster than a single disk. For random access, the stripe performance can not be faster than the slowest disk though. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cores vs. Speed?
Interesting comments.. But I am confused. Performance for my backups (compression/deduplication) would most likely not be the #1 priority.

I want my VMs to run fast - so is it deduplication that really slows things down?

Are you saying raidz2 would overwhelm current I/O controllers to the point where I could not saturate a 1 Gb network link?

Is the CPU I am looking at not capable of doing dedup and compression? Or are no CPUs capable of doing that currently? If I only enable it for the backup filesystem, will all my filesystems suffer performance-wise?

Where are the bottlenecks in a raidz2 system that I will only access over a single gigabit link? Are they insurmountable?

> > I plan to start with 5 1.5 TB drives in a raidz2 configuration and 2
> > mirrored boot drives.
>
> You want to use compression and deduplication and raidz2. I hope you didn't
> want to get any performance out of this system, because all of those are
> compute or IO intensive.
>
> FWIW ... 5 disks in raidz2 will have capacity of 3 disks. But if you bought
> 6 disks in mirrored configuration, you have a small extra cost, and much
> better performance.
Re: [zfs-discuss] Cores vs. Speed?
> I plan to start with 5 1.5 TB drives in a raidz2 configuration and 2 > mirrored boot drives. You want to use compression and deduplication and raidz2. I hope you didn't want to get any performance out of this system, because all of those are compute or IO intensive. FWIW ... 5 disks in raidz2 will have capacity of 3 disks. But if you bought 6 disks in mirrored configuration, you have a small extra cost, and much better performance. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cores vs. Speed?
On Thu, Feb 4, 2010 at 7:54 PM, Brian wrote:
> It sounds like the consensus is more cores over clock speed. Surprising to me since the difference in clock speed was over 1Ghz. So, I will go with a quad core.

Four cores @ 1.8Ghz = 7.2Ghz of threaded performance ([Open]Solaris is relatively decent in terms of threading). Two cores @ 3.1Ghz = 6.2Ghz :) Although you may find single-threaded operations slower, as someone pointed out, but even those might wash out as sometimes it's I/O that's the problem.

> I was leaning towards 4GB of ram - which hopefully should be enough for dedup as I am only planning on dedupping my smaller file systems (backups and VMs)

4GB is a good start.

> Was my raidz2 performance comment above correct? That the write speed is that of the slowest disk? That is what I believe I have read.

You are sort-of-correct that it's the write speed of the slowest disk. Mirrored drives will be faster, especially for random I/O. But you sacrifice storage for that performance boost. That said, I have a similar setup as far as number of spindles and can push 200MB/sec+ through it and saturate GigE for iSCSI, so maybe I'm being harsh on raidz2 :)

> Now on to the hard part of picking a motherboard that is supported and has enough SATA ports!

I used an ASUS board (M4A785-M) which has six (6) SATA2 ports onboard and pretty decent HyperTransport throughput.

Hope that helps.

-marc
Re: [zfs-discuss] Cores vs. Speed?
It sounds like the consensus is more cores over clock speed. Surprising to me since the difference in clock speed was over 1GHz. So, I will go with a quad core.

I was leaning towards 4GB of RAM - which hopefully should be enough for dedup, as I am only planning on dedupping my smaller filesystems (backups and VMs).

Was my raidz2 performance comment above correct? That the write speed is that of the slowest disk? That is what I believe I have read.

Now on to the hard part of picking a motherboard that is supported and has enough SATA ports!
Re: [zfs-discuss] Cores vs. Speed?
Hi Brian, If you are considering testing dedup, particularly on large datasets, see the list of known issues, here: http://hub.opensolaris.org/bin/view/Community+Group+zfs/dedup Start with build 132. Thanks, Cindy On 02/04/10 16:19, Brian wrote: I am Starting to put together a home NAS server that will have the following roles: (1) Store TV recordings from SageTV over either iSCSI or CIFS. Up to 4 or 5 HD streams at a time. These will be streamed live to the NAS box during recording. (2) Playback TV (could be stream being recorded, could be others) to 3 or more extenders (3) Hold a music repository (4) Hold backups from windows machines, mac (time machine), linux. (5) Be an iSCSI target for several different Virtual Boxes. Function 4 will use compression and deduplication. Function 5 will use deduplication. I plan to start with 5 1.5 TB drives in a raidz2 configuration and 2 mirrored boot drives. I have been reading these forums off and on for about 6 months trying to figure out how to best piece together this system. I am first trying to select the CPU. I am leaning towards AMD because of ECC support and power consumption. For items such as de-dupliciation, compression, checksums etc. Is it better to get a faster clock speed or should I consider more cores? I know certain functions such as compression may run on multiple cores. I have so far narrowed it down to: AMD Phenom II X2 550 Black Edition Callisto 3.1GHz and AMD Phenom X4 9150e Agena 1.8GHz Socket AM2+ 65W Quad-Core As they are roughly the same price. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cores vs. Speed?
On 05/02/10 01:00, Brian wrote:
> Thanks for the reply. Are cores better because of the compression/deduplication being multi-threaded or because of multiple streams? It is a pretty big difference in clock speed - so curious as to why cores would be better. Glad to see your 4 core system is working well for you - so it seems like I won't really have a bad choice.
> Why avoid large drives? Reliability reasons? My main thought on that is that there is a 3 year warranty and I am building raidz2 because I expect failure. Or are there other reasons to avoid large drives?
> I thought I understood the overhead.. The write and read speeds should be roughly that of the slowest disk? Thanks.

From what I saw, ZFS scales terribly well with multiple cores. If you want to send/receive your filesystems through ssh to another machine, speed matters, since ssh only uses one core (but then you can always use netcat). On a Xeon E5520 running at 2.27 GHz we achieve around 70/80 MB/s ssh throughput.

For dedup, you want lots of RAM and, if possible, a large and fast SSD for L2ARC. Someone on this list was asking about estimates of RAM/cache needs based on block sizes / fs size / estimated dedup ratio. Either I missed the answer or there was no really simple answer (other than more is better, which always stays true for RAM and L2ARC). Anyway, we tested it and were surprised by the quantity of reads that ensue.

Arnaud
[zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows
I have a single ZFS volume, shared out using COMSTAR and connected to a Windows VM. I am taking snapshots of the volume regularly. I now want to mount a previous snapshot, but when I go through the process, Windows sees the new volume but thinks it is blank and wants to initialize it. Any ideas how to get Windows to see that it has data on it?

Steps I took after the snap:

  zfs clone data01/san/gallardo/g-recovery
  sbdadm create-lu /dev/zvol/rdsk/data01/san/gallardo/g-recovery
  stmfadm add-view -h HG-Gallardo -t TG-Gallardo -n 1 600144F0EAE40A004B6B59090003

At this point, my server Gallardo can see the LUN, but like I said, it looks blank to the OS. I suspect the 'sbdadm create-lu' phase. Any help to get Windows to see it as a LUN with NTFS data would be appreciated.

Thanks,
Scott
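For reference, the usual clone-and-share sequence looks roughly like the following. The snapshot name, clone name, and GUID below are hypothetical placeholders; this is only a sketch of the typical steps, not a confirmed fix for the blank-disk symptom described above:

  # clone a snapshot of the zvol into a new zvol, then expose the clone as a LUN
  zfs clone data01/san/gallardo/g-recovery@snap1 data01/san/gallardo/g-recovery-clone
  sbdadm create-lu /dev/zvol/rdsk/data01/san/gallardo/g-recovery-clone
  stmfadm add-view -h HG-Gallardo -t TG-Gallardo -n 1 <GUID printed by sbdadm create-lu>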
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
Peter Radig wrote:
> I was interested in the impact the type of an SSD has on the performance of the ZIL. So I did some benchmarking and just want to share the results. My test case is simply untarring the latest ON source (528 MB, 53k files) on a Linux system that has a ZFS file system mounted via NFS over gigabit ethernet. I got the following results:
> - remotely with no dedicated ZIL device: 36 min 37 sec (factor 73 compared to local)
> - remotely with an Intel X25-E 32 GB as ZIL device: 3 min 11 sec (factor 6.4 compared to local)

That's about the same ratio I get when I demonstrate this on the SSD/Flash/Turbocharge Discovery Days I run in the UK from time to time (the name changes over time ;-).

-- Andrew Gabriel
Re: [zfs-discuss] Cores vs. Speed?
Put your money into RAM, especially for dedup. -- richard On Feb 4, 2010, at 3:19 PM, Brian wrote: > I am Starting to put together a home NAS server that will have the following > roles: > > (1) Store TV recordings from SageTV over either iSCSI or CIFS. Up to 4 or 5 > HD streams at a time. These will be streamed live to the NAS box during > recording. > (2) Playback TV (could be stream being recorded, could be others) to 3 or > more extenders > (3) Hold a music repository > (4) Hold backups from windows machines, mac (time machine), linux. > (5) Be an iSCSI target for several different Virtual Boxes. > > Function 4 will use compression and deduplication. > Function 5 will use deduplication. > > I plan to start with 5 1.5 TB drives in a raidz2 configuration and 2 mirrored > boot drives. > > I have been reading these forums off and on for about 6 months trying to > figure out how to best piece together this system. > > I am first trying to select the CPU. I am leaning towards AMD because of ECC > support and power consumption. > > For items such as de-dupliciation, compression, checksums etc. Is it better > to get a faster clock speed or should I consider more cores? I know certain > functions such as compression may run on multiple cores. > > I have so far narrowed it down to: > > AMD Phenom II X2 550 Black Edition Callisto 3.1GHz > and > AMD Phenom X4 9150e Agena 1.8GHz Socket AM2+ 65W Quad-Core > > As they are roughly the same price. > -- > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
Very interesting stats -- thanks for taking the time and trouble to share them! One thing I found interesting is that the Gen 2 X25-M has higher write IOPS than the X25-E according to Intel's documentation (6,600 IOPS for 4K writes versus 3,300 IOPS for 4K writes on the "E"). I wonder if it'd perform better as a ZIL? (The write latency on both drives is the same). -marc On Thu, Feb 4, 2010 at 6:43 PM, Peter Radig wrote: > I was interested in the impact the type of an SSD has on the performance of > the ZIL. So I did some benchmarking and just want to share the results. > > My test case is simply untarring the latest ON source (528 MB, 53k files) > on an Linux system that has a ZFS file system mounted via NFS over gigabit > ethernet. > > I got the following results: > - locally on the Solaris box: 30 sec > - remotely with no dedicated ZIL device: 36 min 37 sec (factor 73 compared > to local) > - remotely with ZIL disabled: 1 min 54 sec (factor 3.8 compared to local) > - remotely with a OCZ VERTEX SATA II 120 GB as ZIL device: 14 min 40 sec > (factor 29.3 compared to local) > - remotely with an Intel X25-E 32 GB as ZIL device: 3 min 11 sec (factor > 6.4 compared to local) > > So it really makes a difference what type of SSD you use for your ZIL > device. I was expecting a good performance from the X25-E, but was really > suprised that it is that good (only 1.7 times slower than it takes with ZIL > completely disabled). So I will use the X25-E as ZIL device on my box and > will not consider disabling ZIL at all to improve NFS performance. > > -- Peter > -- > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cores vs. Speed?
Thanks for the reply. Are cores better because of the compression/deduplication being multi-threaded, or because of multiple streams? It is a pretty big difference in clock speed - so I'm curious as to why cores would be better. Glad to see your 4-core system is working well for you - so it seems like I won't really have a bad choice.

Why avoid large drives? Reliability reasons? My main thought on that is that there is a 3-year warranty and I am building raidz2 because I expect failure. Or are there other reasons to avoid large drives?

I thought I understood the overhead.. The write and read speeds should be roughly that of the slowest disk?

Thanks.
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On 04/02/10 20:26, Tonmaus wrote:
> Hi again, thanks for the answer. Another thing that came to my mind is that you mentioned that you mixed the disks among the controllers. Does that mean you mixed them as well among pools? Unsurprisingly, the WD20EADS is slower than the Hitachi, which is a fixed 7200 rpm drive. I wonder what impact that would have if you use them as vdevs of the same pool.
> Cheers, Tonmaus

Yes, we mixed them among controllers and pools. We've done something that's not recommended: a 15-disk raidz3 pool. Disks are as follows:

c3 (LSI SAS) has:
- 1 x 64 GB Intel X25-E
- 3 x 2TB WD20EADS
- 4 x 2TB Hitachi

c2 (LSI SAS) has:
- 4 x 2TB WD20EADS
- 4 x 2TB Hitachi

c5 (motherboard ICH10, if I remember well) has:
- 1 x 160GB 2.5'' WD
- DVD

All the 2TB drives are in the raidz3 zpool named tank (we've been very innovative here ;-). The X25-E is sliced into 20GB for the system, 1GB for the ZIL for tank, and the rest as cache for tank. The 2.5'' 160GB WD was not initially part of the setup, since we were planning to slice the 2TB drives into 32GB for the system (mirrored across all drives) and the rest for the big zpool, while the X25-E was just there for the ZIL and the cache. But two things we've read on lists and forums made us change our minds:
- the disk write cache is disabled when you're not using the whole drive
- some reports on this list about the X25-E losing up to 256 cache flushes in case of power failure.

So we bought this 160GB disk (it was really the last thing that could fit in the chassis) and sliced it the same way as the X25-E. The system and the ZIL are mirrored between the X25-E and the WD160. We do not use the WD160 for the cache: we thought it would be better to save IOPS on this disk for the ZIL mirror. I don't know whether it's a good idea to mirror the ZIL on such a disk, but we prefer a slower setup to losing that many cache flushes on power failure.

Regarding the performance obtained by using only Hitachi disks, I can't tell; I haven't tested it, and can't do it right now as the system is in preproduction testing.

Also, I should have mentioned in my previous post that some WD20EADS (the 32SB0) have shorter response times (as reported by iostat). They're even "faster" than the Hitachi: I've seen them quite a few times in the range 0.3 to 1.5 ms, which seems far too short for this kind of drive. I suspect they're sort of dropping flush requests. Add to that that 2 out of 3 failed WD20EADS were 32SB0 and you get the picture... Note they might also be hybrid drives with some flash memory, which would allow quick acknowledgment of writes, but I think we would have heard of such a feature on this list.

Arnaud
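A rough sketch of how a layout like that is attached to a pool follows. The slice names are hypothetical placeholders, not the exact devices described above (note that log devices can be mirrored, but cache devices cannot):

  # mirrored ZIL (slog) across a slice of the X25-E and a slice of the 160GB disk
  zpool add tank log mirror c3t0d0s1 c5t0d0s1

  # remaining SSD space as L2ARC
  zpool add tank cache c3t0d0s2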
Re: [zfs-discuss] Cores vs. Speed?
I would go with cores (threads) rather than clock speed here. My home system is a 4-core AMD @ 1.8Ghz and performs well. I wouldn't use drives that big and you should be aware of the overheads of RaidZ[x]. -marc On Thu, Feb 4, 2010 at 6:19 PM, Brian wrote: > I am Starting to put together a home NAS server that will have the > following roles: > > (1) Store TV recordings from SageTV over either iSCSI or CIFS. Up to 4 or > 5 HD streams at a time. These will be streamed live to the NAS box during > recording. > (2) Playback TV (could be stream being recorded, could be others) to 3 or > more extenders > (3) Hold a music repository > (4) Hold backups from windows machines, mac (time machine), linux. > (5) Be an iSCSI target for several different Virtual Boxes. > > Function 4 will use compression and deduplication. > Function 5 will use deduplication. > > I plan to start with 5 1.5 TB drives in a raidz2 configuration and 2 > mirrored boot drives. > > I have been reading these forums off and on for about 6 months trying to > figure out how to best piece together this system. > > I am first trying to select the CPU. I am leaning towards AMD because of > ECC support and power consumption. > > For items such as de-dupliciation, compression, checksums etc. Is it > better to get a faster clock speed or should I consider more cores? I know > certain functions such as compression may run on multiple cores. > > I have so far narrowed it down to: > > AMD Phenom II X2 550 Black Edition Callisto 3.1GHz > and > AMD Phenom X4 9150e Agena 1.8GHz Socket AM2+ 65W Quad-Core > > As they are roughly the same price. > -- > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Impact of an enterprise class SSD on ZIL performance
I was interested in the impact the type of an SSD has on the performance of the ZIL. So I did some benchmarking and just want to share the results.

My test case is simply untarring the latest ON source (528 MB, 53k files) on a Linux system that has a ZFS file system mounted via NFS over gigabit ethernet.

I got the following results:
- locally on the Solaris box: 30 sec
- remotely with no dedicated ZIL device: 36 min 37 sec (factor 73 compared to local)
- remotely with ZIL disabled: 1 min 54 sec (factor 3.8 compared to local)
- remotely with an OCZ VERTEX SATA II 120 GB as ZIL device: 14 min 40 sec (factor 29.3 compared to local)
- remotely with an Intel X25-E 32 GB as ZIL device: 3 min 11 sec (factor 6.4 compared to local)

So it really makes a difference what type of SSD you use for your ZIL device. I was expecting good performance from the X25-E, but was really surprised that it is that good (only 1.7 times slower than with the ZIL completely disabled). So I will use the X25-E as the ZIL device on my box and will not consider disabling the ZIL at all to improve NFS performance.

-- Peter
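For anyone wanting to repeat the comparison, attaching and detaching a dedicated log device is straightforward. The pool and device names below are hypothetical, and removing a log device requires a reasonably recent build:

  zpool add tank log c4t0d0     # use the SSD as a dedicated ZIL (slog)
  zpool status tank             # the device shows up under a separate "logs" section
  zpool remove tank c4t0d0      # detach the slog again before testing the next device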
Re: [zfs-discuss] Cores vs. Speed?
* Brian (broco...@vt.edu) wrote: > I am Starting to put together a home NAS server that will have the > following roles: > > (1) Store TV recordings from SageTV over either iSCSI or CIFS. Up to > 4 or 5 HD streams at a time. These will be streamed live to the NAS > box during recording. (2) Playback TV (could be stream being > recorded, could be others) to 3 or more extenders (3) Hold a music > repository (4) Hold backups from windows machines, mac (time machine), > linux. (5) Be an iSCSI target for several different Virtual Boxes. > > Function 4 will use compression and deduplication. Function 5 will > use deduplication. > > I plan to start with 5 1.5 TB drives in a raidz2 configuration and 2 > mirrored boot drives. > > I have been reading these forums off and on for about 6 months trying > to figure out how to best piece together this system. > > I am first trying to select the CPU. I am leaning towards AMD because > of ECC support and power consumption. I can't comment on most of your question, but I will point you at: http://blogs.sun.com/mhaywood/entry/powernow_for_solaris I *think* the cpu's you're looking at won't be an issue but just something to be aware of when looking at AMD kit (especially if you want to manage the processor speed). Cheers, -- Glenn ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Cores vs. Speed?
I am starting to put together a home NAS server that will have the following roles:

(1) Store TV recordings from SageTV over either iSCSI or CIFS. Up to 4 or 5 HD streams at a time. These will be streamed live to the NAS box during recording.
(2) Playback TV (could be the stream being recorded, could be others) to 3 or more extenders.
(3) Hold a music repository.
(4) Hold backups from Windows machines, Mac (Time Machine), Linux.
(5) Be an iSCSI target for several different VirtualBoxes.

Function 4 will use compression and deduplication.
Function 5 will use deduplication.

I plan to start with 5 1.5 TB drives in a raidz2 configuration and 2 mirrored boot drives.

I have been reading these forums off and on for about 6 months trying to figure out how to best piece together this system.

I am first trying to select the CPU. I am leaning towards AMD because of ECC support and power consumption.

For items such as de-duplication, compression, checksums etc., is it better to get a faster clock speed or should I consider more cores? I know certain functions such as compression may run on multiple cores.

I have so far narrowed it down to:

AMD Phenom II X2 550 Black Edition Callisto 3.1GHz
and
AMD Phenom X4 9150e Agena 1.8GHz Socket AM2+ 65W Quad-Core

as they are roughly the same price.
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
Hi Ross,

Yes - zdb - is dumping out info in the form of:

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
        19    1    16K    512    512    512  100.00  ZFS plain file
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 0
        path    /snapshot.sh
        uid     0
        gid     0
        atime   Thu Feb  4 23:04:50 2010
        mtime   Thu Feb  4 23:04:50 2010
        ctime   Thu Feb  4 23:04:50 2010
        crtime  Thu Feb  4 23:04:50 2010
        gen     529806
        mode    100755
        size    174
        parent  3
        links
        xattr   0
        rdev    0x

for all objects referenced in the snap. If you wanted to script this, you could parse the above output for timestamps that are after the previous snapshot. Deleted files (and of course new files) can be diffed against the list for the snapshot you want to compare with, but I assume you also want files that have been modified, hence the requirement to parse the above outputs.

Unfortunately time does not permit me to come up with a working solution until then (really snowed under until mid next week - did someone say there is meant to be a weekend in there too?). But I am sure there is enough info here for someone to hack together a script.

Cheers,
Darren Mackay
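A very rough sketch of the kind of script hinted at above - untested, and with hypothetical pool/filesystem/snapshot names; the exact zdb output format also varies between builds:

  # dump path and mtime lines for every object in each snapshot, then compare
  zdb -dddd tank/fs@snap1 | egrep 'path|mtime' > /tmp/snap1.lst
  zdb -dddd tank/fs@snap2 | egrep 'path|mtime' > /tmp/snap2.lst

  # entries present on only one side, or whose mtime differs, are the
  # added/removed/modified files
  diff /tmp/snap1.lst /tmp/snap2.lst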
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
Supermicro USAS-L8i controllers. I agree with you, I'd much rather have the drives respond properly and promptly than save a little power if that means I'm going to get strange errors from the array. And these are the "green" drives, they just don't seem to cause me any problems. The issues people have noted with WD have made me stay away from them as just about every drive I own lives in some kind of RAID sometime in its life. I have a couple laptop drives that are single, all desktops have at least a mirror. I'm a little nuts and would probably install mirrors in the laptops if there were somewhere to put them. :) -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Pool disk replacing fails
Hi all,

I'm trying to replace a broken LUN in a pool using zpool replace -f, but it fails. The physical disk is already replaced, and the new LUN has the same address as the broken one. But zpool detach/attach works. This is a simple configuration:

  pool: mypool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h0m with 0 errors on Thu Feb 4 23:16:21 2010
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c1t4d0  DEGRADED     0     0    28  too many errors
            c1t5d0  ONLINE       0     0     0

c1t4d0 is the physically replaced LUN. Then I'm trying to replace it in the pool:

  r...@myhost:~# zpool replace -f mypool c1t4d0
  invalid vdev specification
  the following errors must be manually repaired:
  /dev/dsk/c1t4d0s0 is part of active ZFS pool mypool. Please see zpool(1M).

The zpool manual says: "-f  Forces use of new_device, even if it appears to be in use. Not all devices can be overridden in this manner."

c1t4d0 is in use only in mypool. What is the problem with "zpool replace" in my case? According to the zpool manual it should work.

Thank you
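For the archives, the detach/attach sequence mentioned above (which does work here) would be roughly:

  zpool detach mypool c1t4d0          # drop the replaced LUN from the mirror
  zpool attach mypool c1t5d0 c1t4d0   # re-attach it as a new mirror of c1t5d0
  zpool status mypool                 # watch the resilver complete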
Re: [zfs-discuss] unionfs help
On Thu, Feb 04, 2010 at 04:03:19PM -0500, Frank Cusack wrote: > On 2/4/10 2:46 PM -0600 Nicolas Williams wrote: > >In Frank's case, IIUC, the better solution is to avoid the need for > >unionfs in the first place by not placing pkg content in directories > >that one might want to be writable from zones. If there's anything > >about Perl5 (or anything else) that causes this need to arise, then I > >suggest filing a bug. > > Right, and thanks for chiming in. Problem is that perl wants to install > add-on packages in places that the coincide with the system install. > Most stuff is limited to the site_perl directory, which is easily > redirected, but it also has some other locations it likes to meddle with. Maybe we need a zone_perl location. Judicious use of the search paths will get you out of this bind, I think. Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] unionfs help
On 2/4/10 2:46 PM -0600 Nicolas Williams wrote: In Frank's case, IIUC, the better solution is to avoid the need for unionfs in the first place by not placing pkg content in directories that one might want to be writable from zones. If there's anything about Perl5 (or anything else) that causes this need to arise, then I suggest filing a bug. Right, and thanks for chiming in. Problem is that perl wants to install add-on packages in places that the coincide with the system install. Most stuff is limited to the site_perl directory, which is easily redirected, but it also has some other locations it likes to meddle with. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] unionfs help
On Thu, Feb 04, 2010 at 03:19:15PM -0500, Frank Cusack wrote: > BTW, I could just install everything in the global zone and use the > default "inheritance" of /usr into each local zone to see the data. > But then my zones are not independent portable entities; they would > depend on some non-default software installed in the global zone. > > Just wanted to explain why this is valuable to me and not just some > crazy way to do something simple. There's no unionfs for Solaris. (For those of you who don't know, unionfs is a BSDism and is a pseudo-filesystem which presents the union of two underlying filesystems, but with all changes being made only to one of the two filesystems. The idea is that one of the underlying filesystems cannot be modified through the union, with all changes made through the union being recorded in an overlay fs. Think, for example, of unionfs- mounting read-only media containing sources: you could cd to the mount point and build the sources, with all intermediate files and results placed in the overlay.) In Frank's case, IIUC, the better solution is to avoid the need for unionfs in the first place by not placing pkg content in directories that one might want to be writable from zones. If there's anything about Perl5 (or anything else) that causes this need to arise, then I suggest filing a bug. Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
On 2/4/10 8:21 AM -0500 Ross Walker wrote: Find -newer doesn't catch files added or removed it assumes identical trees. This may be redundant in light of my earlier post, but yes it does. Directory mtimes are updated when a file is added or removed, and find -newer will detect that. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
On 2/4/10 8:00 AM +0100 Tomas Ögren wrote: rsync by default compares metadata first, and only checks through every byte if you add the -c (checksum) flag. I would say rsync is the best tool here. ah, i didn't know that was the default. no wonder recently when i was incremental-rsyncing a few TB of data between 2 hosts (not using zfs) i didn't get any speedup from --size-only or whatever the flag is. The "find -newer blah" suggested in other posts won't catch newer files with an old timestamp (which could happen for various reasons, like being copied with kept timestamps from somewhere else). good point. that is definitely a restriction with find -newer. but if you meet that restriction, and don't need to find added or deleted files, it will be faster since only 1 directory tree has to be walked. but in the general case it does sound like rsync is the best. unless bart can find added and missing files. in which case bart is better because it only has to walk 1 dir tree -- assuming you have a saved manifest from a previous walk over the original dir tree. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] unionfs help
BTW, I could just install everything in the global zone and use the default "inheritance" of /usr into each local zone to see the data. But then my zones are not independent portable entities; they would depend on some non-default software installed in the global zone. Just wanted to explain why this is valuable to me and not just some crazy way to do something simple. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
On 2/4/10 12:39 AM -0500 Ross Walker wrote: On Feb 3, 2010, at 8:59 PM, Frank Cusack wrote: I think you misread the thread. Either find or ddiff will do it and either will be better than rsync. Find can find files that have been added or removed between two directory trees? How? When a file is added or removed in a directory, the directory's mtime is updated. So find -newer will locate those directories. Then of course you need to do a little bit more work to locate the files. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
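A minimal illustration of that approach, assuming a reference file whose mtime marks when the old snapshot was taken (the filesystem path and timestamp below are hypothetical):

  # mark the moment the 'old' snapshot was taken, then look for anything newer
  touch -t 201002030000 /tmp/snap-old-time
  find /tank/fs -newer /tmp/snap-old-time -print
  # directories listed here had entries added or removed; files listed were modified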
Re: [zfs-discuss] unionfs help
On February 4, 2010 12:12:04 PM +0100 dick hoogendijk wrote: Why don't you just export that directory with NFS (rw) to your sparse zone and mount it on /usr/perl5/mumble ? Or is this too simple a thought? On February 4, 2010 1:41:20 PM +0100 Thomas Maier-Komor wrote: What about lofs? I thinks lofs is the equivalent for unionfs on Solaris. The problem with both of those solutions is a) writes will overwrite the original filesystem data and b) writes will be visible to everyone else. Neither suggestion provides unionfs capability. On February 4, 2010 12:12:18 PM + Peter Tribble wrote: The way I normally do this is to (in the global zone) symlink /usr/perl5/mumble to somewhere that would be writable such as /opt, and then put what you need into that location in the zone. Leaves a dangling symlink in the global zone and other zones, but that's relatively harmless. The problem with that is you don't see the underlying data that exists in the global zone. I do use that technique for other data (e.g. the entire /usr/local hierarchy), but it doesn't meet my desired needs in this case. I looked into clones (and at least now I understand them much better than before) and they *almost* provide the functionality I want. I could mount a clone in the zoned version of /foo and it would see the original /foo, and changes would go to the clone only, just like a real unionfs. What it's lacking though is that when the underlying filesystem changes (in the global zone), those changes don't percolate up to the clone. The clone's base view of files is from the snapshot it was generated from, which cannot change. It would be great if you could re-target (or re-base?) a clone from a different snapshot than the one it was originally generated from. Since I don't need realtime updates, for my purposes that would be a great equivalent to a true unionfs. So the thread on zfs diff gave me an idea; I will use clones and will write a 'zfs diff'-like tool. When the original /usr/perl5/mumble changes I will use that to pick out files that are different in the clone and populate a new clone with them. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
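A bare-bones sketch of that clone arrangement, assuming the directory in question is backed by its own dataset (all names below are made up for illustration):

  # snapshot the global zone's perl tree and hang a writable clone off it
  zfs snapshot rpool/usr-perl5@base
  zfs clone rpool/usr-perl5@base rpool/zone1-perl5
  zfs set mountpoint=/zones/zone1/root/usr/perl5 rpool/zone1-perl5

Changes made inside the zone land in the clone only; the catch, as noted above, is that the clone keeps seeing the @base snapshot even after the global zone's copy moves on.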
Re: [zfs-discuss] ZFS compression on Clearcase
On 04/02/2010 12:42, Darren J Moffat wrote: On 04/02/2010 12:13, Roshan Perera wrote: Hi Darren, Thanks - IBM basically haven't test clearcase with ZFS compression therefore, they don't support currently. Future may change, as such my customer cannot use compression. I have asked IBM for roadmap info to find whether/when it will be supported. That is FUD generation in my opinion and being overly cautious. The whole point of the POSIX interfaces to a filesystem is that applications don't actually care how the filesystem stores their data. I agree (*). It is very similar to what EMC did some years ago by officially stating that while ZFS is supported on their disk arrays ZFS snapshots are not. Even more "funny". (*) - however compression is not entirely transparent in such a sense that a reported disk space usage might not be exactly what application expects. But I'm not saying it is an issue here - I honestly don't know. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
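On the space-reporting point, the effect of compression is at least easy to observe from the filesystem itself (the dataset name below is just an example):

  zfs get compression,compressratio,used,available tank/clearcase
  du -sk /tank/clearcase            # du reports the compressed (allocated) size
  ls -l /tank/clearcase/somefile    # ls reports the logical file size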
Re: [zfs-discuss] ZFS compression on Clearcase
On 4 Feb 2010, at 16:35, Bob Friesenhahn wrote:
> On Thu, 4 Feb 2010, Darren J Moffat wrote:
>>> Thanks - IBM basically haven't tested clearcase with ZFS compression, therefore they don't support it currently. Future may change; as such my customer cannot use compression. I have asked IBM for roadmap info to find out whether/when it will be supported.
>>
>> That is FUD generation in my opinion and being overly cautious. The whole point of the POSIX interfaces to a filesystem is that applications don't actually care how the filesystem stores their data.
>
> Clearcase itself implements a versioning filesystem so perhaps it is not being overly cautious. Compression could change aspects such as how free space is reported.

I'd also like to echo Bob's observations here. Darren's claim of FUD is based on limited experience of ClearCase, I expect ...

On the client side, ClearCase actually presents itself as a mounted filesystem, regardless of what the OS has under the covers. In other words, a ClearCase directory will never be 'ZFS' because it's not ZFS, it's ClearCaseFS.

On the server side (which might be the case here), the way ClearCase works is to represent the files and contents in a way more akin to a database (e.g. Oracle) than traditional file-system approaches to data (e.g. CVS, SVN). In much the same way there are app-specific issues with ZFS (e.g. matching block sizes, dealing with ZFS snapshots on a VM image and so forth), there may well be some with ClearCase.

At the very least, though, IBM may just be unable or unwilling to test it at the time and put their stamp of approval on it. In many cases for IBM products there are supported platforms (often with specific patch levels), much like there are officially supported Solaris platforms and hot-fixes for certain applications. They may well just be being cautious until they've had time to test it out for themselves - or more likely, until the first set of paying customers wants to get invoiced for the investigation. But to claim it's FUD without any real data to back it up is just FUD^2.

Alex
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
Hi again, thanks for the answer. Another thing that came to my mind is that you mentioned that you mixed the disks among the controllers. Does that mean you mixed them as well among pools? Unsurprisingly, the WD20EADS is slower than the Hitachi that is a fixed 7200 rpm drive. I wonder what impact that would have if you use them as vdevs of the same pool. Cheers, Tonmaus -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between twosnapshots?
>>> Richard Elling 2/3/2010 6:06 PM >>> On Feb 3, 2010, at 3:46 PM, Ross Walker wrote: > On Feb 3, 2010, at 12:35 PM, Frank Cusack > wrote: > > So was there a final consensus on the best way to find the difference between > two snapshots (files/directories added, files/directories deleted and > file/directories changed)? > > Find won't do it, ddiff won't do it, I think the only real option is rsync. > Of course you can zfs send the snap to another system and do the rsync there > against a local previous version. bart(1m) is designed to do this. -- richard Unless something has changed in the past couple months, bart(1m) does not work on large filesystems (2TB limit, I think). http://opensolaris.org/jive/message.jspa?messageID=433896#433896 My solution to this was rsync in dry-run mode between two snapshot directories, which runs in a few seconds and lists both added/changed files and deleted files. http://opensolaris.org/jive/message.jspa?messageID=434176#434176 -Andrew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
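Concretely, the dry-run comparison looks something like this (snapshot names are illustrative):

  # compare two snapshots without copying anything; files added or changed in 'new'
  # are listed as transfers, and --delete makes files removed in 'new' show up too
  rsync -avn --delete /tank/fs/.zfs/snapshot/new/ /tank/fs/.zfs/snapshot/old/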
Re: [zfs-discuss] What happens when: file-corrupted and no-redundancy?
On 03/02/2010 21:45, Aleksandr Levchuk wrote: Hardware RAID6 + hot spare, worked well for us. So, I wanted to stick our SAN for data protection. I understand that the end-to-end checks of ZFS make it better at detecting corruptions. In my case, I can imagine that ZFS would FREEZ the whole volume when a single block or file is found to be corrupted. Ideally, I would not like this to happen and instead get a log with names of corrupted files. What exactly does happens when zfs detects a corrupted block/file and does not have redundancy to correct it? Alex I will repeat myself (as I sent below email just yesterday...) ZFS won't freeze a pool if a single block is corrupted even if no redundancy is configured on zfs level. zpool status -v should provide you with list of affected files which you should be able to delete. In case of corrupted block containg meta-data zfs should actually be able to fix it on the fly for you as all meta-data related blocks are kept in at least two copies even if no redundancy is configured at pool level. Let's test it: mi...@r600:~# mkfile 128m file1 mi...@r600:~# zpool create test `pwd`/file1 mi...@r600:~# zpool status test pool: test state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM testONLINE 0 0 0 /export/home/milek/file1 ONLINE 0 0 0 errors: No known data errors mi...@r600:~# mi...@r600:~# cp /bin/bash /test/file1 mi...@r600:~# cp /bin/bash /test/file2 mi...@r600:~# cp /bin/bash /test/file3 mi...@r600:~# cp /bin/bash /test/file4 mi...@r600:~# cp /bin/bash /test/file5 mi...@r600:~# cp /bin/bash /test/file6 mi...@r600:~# cp /bin/bash /test/file7 mi...@r600:~# cp /bin/bash /test/file8 mi...@r600:~# cp /bin/bash /test/file9 mi...@r600:~# sync mi...@r600:~# dd if=/dev/zero of=file1 seek=50 count=1 conv=notrunc 1+0 records in 1+0 records out 512 bytes (5.1 MB) copied, 0.179617 s, 28.5 MB/s mi...@r600:~# sync mi...@r600:~# zpool scrub test mi...@r600:~# zpool status -v test pool: test state: DEGRADED status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: scrub completed after 0h0m with 7 errors on Thu Feb 4 00:18:40 2010 config: NAMESTATE READ WRITE CKSUM testDEGRADED 0 0 7 /export/home/milek/file1 DEGRADED 0 029 too many errors errors: Permanent errors have been detected in the following files: /test/file1 mi...@r600:~# mi...@r600:~# rm /test/file1 mi...@r600:~# sync mi...@r600:~# zpool scrub test mi...@r600:~# zpool status -v test pool: test state: DEGRADED status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. 
see: http://www.sun.com/msg/ZFS-8000-9P scrub: scrub completed after 0h0m with 0 errors on Thu Feb 4 00:19:55 2010 config: NAMESTATE READ WRITE CKSUM testDEGRADED 0 0 7 /export/home/milek/file1 DEGRADED 0 029 too many errors errors: No known data errors mi...@r600:~# zpool clear test mi...@r600:~# zpool scrub test mi...@r600:~# zpool status -v test pool: test state: ONLINE scrub: scrub completed after 0h0m with 0 errors on Thu Feb 4 00:20:12 2010 config: NAMESTATE READ WRITE CKSUM testONLINE 0 0 0 /export/home/milek/file1 ONLINE 0 0 0 errors: No known data errors mi...@r600:~# mi...@r600:~# ls -la /test/ total 7191 drwxr-xr-x 2 root root 10 2010-02-04 00:19 . drwxr-xr-x 28 root root 30 2010-02-04 00:17 .. -r-xr-xr-x 1 root root 799040 2010-02-04 00:17 file2 -r-xr-xr-x 1 root root 799040 2010-02-04 00:17 file3 -r-xr-xr-x 1 root root 799040 2010-02-04 00:17 file4 -r-xr-xr-x 1 root root 799040 2010-02-04 00:17 file5 -r-xr-xr-x 1 root root 799040 2010-02-04 00:17 file6 -r-xr-xr-x 1 root root 799040 2010-02-04 00:17 file7 -r-xr-xr-x 1 root root 799040 2010-02-04 00:18 file8 -r-xr-xr-x 1 root root 799040 2010-02-04 00:18 file9 mi...@r600:~# -- Robert Milkowski htpp://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...
On 04/02/2010 13:45, Karl Pielorz wrote:
> --On 04 February 2010 11:31 + Karl Pielorz wrote:
>> What would happen when I tried to 'online' ad2 again?
> A reply to my own post... I tried this out, when you make 'ad2' online again, ZFS immediately logs a 'vdev corrupt' failure, and marks 'ad2' (which at this point is a byte-for-byte copy of 'ad1' as it was being written to in background) as 'FAULTED' with 'corrupted data'. You can't "replace" it with itself at that point, but a detach on ad2, and then attaching ad2 back to ad1 results in a resilver, and recovery. So to answer my own question - from my tests it looks like you can do this, and "get away with it". It's probably not ideal, but it does work.

It is actually fine - ZFS is designed to detect and fix exactly the kind of corruption you induced.

> A safer bet would be to detach the drive from the pool, and then re-attach it (at which point ZFS assumes it's a new drive and probably ignores the 'mirror image' data that's on it).

Yes, it should, and if you want to force a full resynchronization that's probably the best way to do it. Another point: if you only suspect that some of your data is corrupted on one half of the mirror, you might instead run 'zpool scrub', as it will repair only the corrupted blocks rather than resynchronizing the entire mirror, which might be a faster and safer approach.

-- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
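[Editor's aside] To make the two recovery paths Robert describes concrete, here is a minimal hedged sketch; the pool and device names (vol, ad1, ad2) are the ones from Karl's example elsewhere in this thread, not a transcript of his session:

# Option 1: scrub - repairs only blocks whose checksums fail, using the good half
zpool scrub vol
zpool status -v vol        # watch the CKSUM counts and "scrub completed" line

# Option 2: force a full resync of the suspect half
zpool detach vol ad2       # drop the stale/overwritten disk from the mirror
zpool attach vol ad1 ad2   # re-attach it; ZFS treats it as new and resilvers
zpool status vol           # shows "resilver in progress"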
Re: [zfs-discuss] [ha-clusters-discuss] data corruption
putting storage-discuss@ and zfs-discuss@ as well. On 04/02/2010 16:33, Robert Milkowski wrote: Hi, S10, SC3.2 + patches, Generic_142900-03, 2x T5220 with QLE2462 connected to 6540s. We started to observe below messages yesterday at both nodes at the same time after several weeks of running: XXX cl_runtime: [ID 856360 kern.warning] WARNING: QUORUM_GENERIC: quorum_read_keys error: Reading the registration keys failed on quorum device /dev/did/rdsk/d7s2 with error 22. XXX cl_runtime: [ID 868277 kern.warning] WARNING: CMM: Erstwhile online quorum device /dev/did/rdsk/d7s2 (qid 1) is inaccessible now. d7 is a quorum device and it was marked by cluster as offline: # clq status === Cluster Quorum === --- Quorum Votes Summary from latest node reconfiguration --- Needed Present Possible -- --- 23 3 --- Quorum Votes by Node (current status) --- Node Name Present Possible Status - --- -- XXX 1 1Online YYY 1 1Online --- Quorum Votes by Device (current status) --- Device Name Present Possible Status --- --- -- d701 Offline By looking at the source code I found that the above message is printed from within quorum_device_generic_impl::quorum_read_keys() and it will only happen if quorum_pgre_key_read() returns with return code 22 (actually any other than 0 or EACCESS but we already know that the rc is 22 from the syslog message). Now quorum_pgre_key_read() calls quorum_scsi_sector_read() and passes its return code as its own. The quorum_scsi_sector_read() can possibly return with error if quorum_ioctl_with_retries() return with error or if there is a checksum mismatch. This is the relevant source code: 406 int 407 quorum_scsi_sector_read( [...] 449error = quorum_ioctl_with_retries(vnode_ptr, USCSICMD, (intptr_t)&ucmd, 450&retval); 451if (error != 0) { 452CMM_TRACE(("quorum_scsi_sector_read: ioctl USCSICMD " 453"returned error (%d).\n", error)); 454kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH); 455return (error); 456} 457 458// 459// Calculate and compare the checksum if check_data is true. 460// Also, validate the pgres_id string at the beg of the sector. 461// 462if (check_data) { 463PGRE_CALCCHKSUM(chksum, sector, iptr); 464 465// Compare the checksum. 466if (PGRE_GETCHKSUM(sector) != chksum) { 467CMM_TRACE(("quorum_scsi_sector_read: " 468"checksum mismatch.\n")); 469kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH); 470return (EINVAL); 471} 472 473// 474// Validate the PGRE string at the beg of the sector. 475// It should contain PGRE_ID_LEAD_STRING[1|2]. 476// 477if ((os::strncmp((char *)sector->pgres_id, PGRE_ID_LEAD_STRING1, 478strlen(PGRE_ID_LEAD_STRING1)) != 0)&& 479(os::strncmp((char *)sector->pgres_id, PGRE_ID_LEAD_STRING2, 480strlen(PGRE_ID_LEAD_STRING2)) != 0)) { 481CMM_TRACE(("quorum_scsi_sector_read: pgre id " 482"mismatch. 
The sector id is %s.\n", 483sector->pgres_id)); 484kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH); 485return (EINVAL); 486} 487 488} 489kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH); 490 491return (error); 492 } 56 -> __1cXquorum_scsi_sector_read6FpnFvnode_LpnLpgre_sector_b_i_ 6308555744942019 enter 56-> __1cZquorum_ioctl_with_retries6FpnFvnode_ilpi_i_ 6308555744957176 enter 56<- __1cZquorum_ioctl_with_retries6FpnFvnode_ilpi_i_ 6308555745089857 rc: 0 56-> __1cNdbg_print_bufIdbprintf6MpcE_v_ 6308555745108310 enter 56 -> __1cNdbg_print_bufLdbprintf_va6Mbpcrpv_v_ 6308555745120941 enter 56-> __1cCosHsprintf6FpcpkcE_v_ 6308555745134231 enter 56<- __1cCosHsprintf6FpcpkcE_v_ 6308555745148729 rc: 2890607504684 56<- __1cNdbg_print_bufLdbprintf_va6Mbpcrpv_v_ 6308555745162898 rc: 1886718112 56<- __1cNdbg_print_bufIdbprintf6MpcE_v_ 6308555745175529 rc: 1886718112 56<- __1cXquorum_scsi_sector_read6FpnFvnode_LpnLpgre_sector_b_i_ 6308555745188599 rc:
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On 04/02/10 16:57, Tonmaus wrote:
> Hi Arnaud, which type of controller is this? Regards, Tonmaus

I use two LSI SAS3081E-R in each server (16 hard disk trays, passive backplane AFAICT, no expander). Works very well. Arnaud ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS compression on Clearcase
On Thu, 4 Feb 2010, Darren J Moffat wrote:
>> Thanks - IBM basically haven't tested ClearCase with ZFS compression and therefore don't currently support it. That may change in future, but as it stands my customer cannot use compression. I have asked IBM for roadmap info to find out whether/when it will be supported.
> That is FUD generation in my opinion and being overly cautious. The whole point of the POSIX interfaces to a filesystem is that applications don't actually care how the filesystem stores their data.

Clearcase itself implements a versioning filesystem, so perhaps it is not being overly cautious. Compression could change aspects such as how free space is reported. As I recall, Clearcase maintains a database (on top of a filesystem) on a central server to store the actual data. When a user checks out a view of the files, the user views the files via a versioning filesystem, which stores a cache of those files on the local system. Clearcase instruments access to its versioning filesystem so it knows all of the actions which resulted in a built object. This means that there are two places (server and client) where ZFS may be involved. Bob

-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...
--On 04 February 2010 08:58 -0500 Jacob Ritorto wrote:
> Seems your controller is actually doing only harm here, or am I missing something?

The RAID controller presents the drives as both a mirrored pair and as JBOD - *at the same time*... The machine boots off the partition on the 'mirrored' pair - and ZFS uses the JBOD devices (a different area of the disks, of course). It's a little weird to say the least - and I wouldn't recommend it, but it does work 'for me' - and is a way of getting the system to boot off a mirror and still be able to use ZFS with only 2 drives available in the chassis. -Karl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
Hi Arnaud, which type of controller is this? Regards, Tonmaus -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs/sol10u8 less stable than in sol10u5?
Hi all, it might not be a ZFS issue (and thus be on the wrong list), but maybe there's someone here who can give us a good hint: we are operating 13 x4500s and started to play with non-Sun-blessed SSDs in them. As we were running Solaris 10u5 before and wanted to use the SSDs as log devices, we upgraded to the latest and greatest 10u8 and changed the zpool layout[1]. However, on the first machine we found many, many problems with various disks "failing" in different vdevs (I wrote about this in December on this list, IIRC). After going through this with Sun they gave us hints but mostly blamed (maybe rightfully) the Intel X25-E in there; we considered the 2.5" to 3.5" converter to be at fault as well. Thus we did the next test by placing the SSD into the tray without a conversion unit, but that box (a different one) failed with the same problems.

Now, we "learned" from this experience and did the same to another box but without the SSD, i.e. jumpstarted the box, installed 10u8, redid the zpool and started to fill data in. In today's scrub suddenly this happened:

s09:~# zpool status
  pool: atlashome
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h9m, 3.89% done, 4h2m to go
config:

        NAME          STATE     READ WRITE CKSUM
        atlashome     DEGRADED     0     0     0
          raidz1      ONLINE       0     0     0
            c0t0d0    ONLINE       0     0     0
            c1t0d0    ONLINE       0     0     0
            c4t0d0    ONLINE       0     0     0
            c6t0d0    ONLINE       0     0     0
            c7t0d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c0t1d0    ONLINE       0     0     0
            c1t1d0    ONLINE       0     0     0
            c4t1d0    ONLINE       0     0     0
            c5t1d0    ONLINE       0     0     0
            c6t1d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c7t1d0    ONLINE       0     0     1
            c0t2d0    ONLINE       0     0     0
            c1t2d0    ONLINE       0     0     2
            c4t2d0    ONLINE       0     0     0
            c5t2d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c6t2d0    ONLINE       0     0     0
            c7t2d0    ONLINE       0     0     0
            c0t3d0    ONLINE       0     0     0
            c1t3d0    ONLINE       0     0     0
            c4t3d0    ONLINE       0     0     0
          raidz1      DEGRADED     0     0     0
            c5t3d0    ONLINE       0     0     0
            c6t3d0    ONLINE       0     0     0
            c7t3d0    ONLINE       0     0     0
            c1t4d0    ONLINE       0     0     1
            spare     DEGRADED     0     0     0
              c4t4d0  DEGRADED     5     0    11  too many errors
              c0t4d0  ONLINE       0     0     0  5.38G resilvered
          raidz1      ONLINE       0     0     0
            c5t4d0    ONLINE       0     0     0
            c6t4d0    ONLINE       0     0     0
            c7t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c1t5d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c4t5d0    ONLINE       0     0     0
            c5t5d0    ONLINE       0     0     0
            c6t5d0    ONLINE       0     0     0
            c7t5d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     1
          raidz1      ONLINE       0     0     0
            c1t6d0    ONLINE       0     0     0
            c4t6d0    ONLINE       0     0     0
            c5t6d0    ONLINE       0     0     0
            c6t6d0    ONLINE       0     0     0
            c7t6d0    ONLINE       0     0     1
          raidz1      ONLINE       0     0     0
            c0t7d0    ONLINE       0     0     0
            c1t7d0    ONLINE       0     0     0
            c4t7d0    ONLINE       0     0     0
            c5t7d0    ONLINE       0     0     0
            c6t7d0    ONLINE       0     0     0
        spares
          c0t4d0      INUSE     currently in use
          c7t7d0      AVAIL

Also similar to the other hosts are the much, much higher soft/hard error counts in iostat:

s09:~# iostat -En|grep Soft
c2t0d0 Soft Errors: 1 Hard Errors: 2 Transport
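[Editor's aside] Not part of the original report, but for readers following along, a hedged sketch of the usual clean-up once the resilver onto the hot spare finishes; the pool and device names are the ones shown in the status output above:

# Either keep the spare permanently in place of the bad disk...
zpool detach atlashome c4t4d0       # c0t4d0 stops being a spare and joins the raidz1
# ...or physically replace c4t4d0 and let the spare return to AVAIL:
zpool replace atlashome c4t4d0      # resilver onto the new disk in the same slot
zpool clear atlashome               # reset the error counters afterwards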
Re: [zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...
On Thu, 4 Feb 2010, Karl Pielorz wrote: The reason for testing this is because of a weird RAID setup I have where if 'ad2' fails, and gets replaced - the RAID controller is going to mirror 'ad1' over to 'ad2' - and cannot be stopped. Does the raid controller not support a JBOD mode? Regards, markm ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
I think you'll do just fine then. And I think the extra platter will work to your advantage. -marc On 2/3/10, Simon Breden wrote: > Probably 6 in a RAID-Z2 vdev. > > Cheers, > Simon > -- > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > -- Sent from my mobile device ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...
Seems your controller is actually doing only harm here, or am I missing something? On Feb 4, 2010 8:46 AM, "Karl Pielorz" wrote: --On 04 February 2010 11:31 + Karl Pielorz wrote: > What would happen... A reply to my own post... I tried this out, when you make 'ad2' online again, ZFS immediately logs a 'vdev corrupt' failure, and marks 'ad2' (which at this point is a byte-for-byte copy of 'ad1' as it was being written to in background) as 'FAULTED' with 'corrupted data'. You can't "replace" it with itself at that point, but a detach on ad2, and then attaching ad2 back to ad1 results in a resilver, and recovery. So to answer my own question - from my tests it looks like you can do this, and "get away with it". It's probably not ideal, but it does work. A safer bet would be to detach the drive from the pool, and then re-attach it (at which point ZFS assumes it's a new drive and probably ignores the 'mirror image' data that's on it). -Karl (The reason for testing this is because of a weird RAID setup I have where if 'ad2' fails, and gets replaced - the RAID controller is going to mirror 'ad1' over to 'ad2' - and cannot be stopped. However, once the re-mirroring is complete the RAID controller steps out the way, and allows raw access to each disk in the mirror. Strange, a long story - but true). ___ zfs-discuss mailing list zfs-disc...@opensolaris.or... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Booting OpenSolaris on ZFS root on Sun Netra 240
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi, I'm kind of stuck trying to get my aging Netra 240 machine to boot OpenSolaris. The live CD and installation worked perfectly, but when I reboot and try to boot from the installed disk, I get:

Rebooting with command: boot disk0
Boot device: /p...@1c,60/s...@2/d...@0,0  File and args:
| The file just loaded does not appear to be executable.

I suspect it's due to the fact that my OBP can't boot a ZFS root (OpenBoot 4.22.19). Is there a way to work around this? Regards, - -- Saso -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAktqz7kACgkQRO8UcfzpOHCqhgCgl8I+5zCTBLb0MUVq9cz5zrqz 9LgAoIurhee3/+nfXtUBwVczkjKxQVaj =7dXF -END PGP SIGNATURE- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
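[Editor's aside] A hedged first step, not from Saso's message: before hunting for a firmware patch it is worth confirming exactly which OBP revision the box runs, since ZFS root boot may require a newer PROM than the 4.22.x reported above. These are standard commands; the output format is illustrative:

# From a running Solaris/OpenSolaris instance; on SPARC this prints the firmware release
prtconf -V
# (the same information is available at the OpenBoot "ok" prompt via the .version word)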
Re: [zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...
--On 04 February 2010 11:31 + Karl Pielorz wrote: What would happen when I tried to 'online' ad2 again? A reply to my own post... I tried this out, when you make 'ad2' online again, ZFS immediately logs a 'vdev corrupt' failure, and marks 'ad2' (which at this point is a byte-for-byte copy of 'ad1' as it was being written to in background) as 'FAULTED' with 'corrupted data'. You can't "replace" it with itself at that point, but a detach on ad2, and then attaching ad2 back to ad1 results in a resilver, and recovery. So to answer my own question - from my tests it looks like you can do this, and "get away with it". It's probably not ideal, but it does work. A safer bet would be to detach the drive from the pool, and then re-attach it (at which point ZFS assumes it's a new drive and probably ignores the 'mirror image' data that's on it). -Karl (The reason for testing this is because of a weird RAID setup I have where if 'ad2' fails, and gets replaced - the RAID controller is going to mirror 'ad1' over to 'ad2' - and cannot be stopped. However, once the re-mirroring is complete the RAID controller steps out the way, and allows raw access to each disk in the mirror. Strange, a long story - but true). ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
The delete queue and related blocks need further investigation...

r...@osol-dev:/data/zdb-test# zdb -dd data/zdb-test | more
Dataset data/zdb-test [ZPL], ID 641, cr_txg 529804, 24.5K, 6 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  15.0K    16K   18.75  DMU dnode
        -1    1    16K    512     1K    512  100.00  ZFS user/group used
        -2    1    16K    512     1K    512  100.00  ZFS user/group used
         1    1    16K    512     1K    512  100.00  ZFS master node
         2    1    16K    512     1K    512  100.00  ZFS delete queue
         3    1    16K  1.50K     1K  1.50K  100.00  ZFS directory
         4    1    16K    512     1K    512  100.00  ZFS directory
        19    1    16K    512    512    512  100.00  ZFS plain file
        22    1    16K     2K     2K     2K  100.00  ZFS plain file

All the info seems to be there (otherwise, we would not be able to store files at all!!). A *spare time* project for the coming couple of weeks...

Darren -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
Interesting, can you explain what zdb is dumping exactly? I suppose you would be looking for blocks referenced in the snapshot that have a single reference and print out the associated file/ directory name? -Ross On Feb 4, 2010, at 7:29 AM, Darren Mackay wrote: Hi Ross, zdb - f...@snapshot | grep "path" | nawk '{print $2}' Enjoy! Darren Mackay -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
On Feb 4, 2010, at 2:00 AM, Tomas Ögren wrote:
> On 03 February, 2010 - Frank Cusack sent me these 0,7K bytes:
>> On February 3, 2010 12:04:07 PM +0200 Henu wrote:
>>> Is there a possibility to get a list of changed files between two snapshots? Currently I do this manually, using basic file system functions offered by OS. I scan every byte in every file manually and it ^^^
>> On February 3, 2010 10:11:01 AM -0500 Ross Walker wrote:
>>> Not a ZFS method, but you could use rsync with the dry run option to list all changed files between two file systems.
>> That's exactly what the OP is already doing ...
> rsync by default compares metadata first, and only checks through every byte if you add the -c (checksum) flag. I would say rsync is the best tool here. The "find -newer blah" suggested in other posts won't catch newer files with an old timestamp (which could happen for various reasons, like being copied with kept timestamps from somewhere else).

find -newer also doesn't catch files added or removed; it assumes identical trees. I would be interested in comparing ddiff, bart and rsync (local comparison only) to see empirically how they match up.

-Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
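[Editor's aside] A hedged illustration of the rsync dry-run approach discussed above, run locally against the hidden .zfs snapshot directories; the dataset and snapshot names are hypothetical:

# List what changed going from snapshot snap1 to snap2 (rsync's default size/mtime quick-check)
rsync -ain --delete /tank/home/.zfs/snapshot/snap2/ /tank/home/.zfs/snapshot/snap1/
# Add -c to compare file contents byte-for-byte instead (much slower, as noted above)
rsync -acin --delete /tank/home/.zfs/snapshot/snap2/ /tank/home/.zfs/snapshot/snap1/

With -n nothing is copied; the itemized (-i) output simply lists changed files, additions, and (via --delete) removals between the two snapshots.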
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
Looking through some more code... I was a bit premature in my last post - been a long day. Extracting the GUIDs and querying the metadata seems to be the logical route -> I think running a zfs send just to parse the data stream is a lot of overhead when you really only need to traverse the metadata directly. The zdb sources have most of the bits there - just need to unwind the deadlist (this seems to match the number of blocks that have been deleted since the last snap)... Might look into this in the next week or 2 if I have time -> seems like a worthwhile project ;-)

Darren Mackay -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] What happens when: file-corrupted and no-redundancy?
Hardware RAID6 + hot spare worked well for us, so I wanted to stick with our SAN for data protection. I understand that the end-to-end checks of ZFS make it better at detecting corruption. In my case, I can imagine that ZFS would FREEZE the whole volume when a single block or file is found to be corrupted. Ideally, I would not like this to happen and would instead get a log with the names of the corrupted files. What exactly happens when ZFS detects a corrupted block/file and does not have redundancy to correct it? Alex -- --- Aleksandr Levchuk Homepage: http://biocluster.ucr.edu/~alevchuk/ Cell Phone: (951) 368-0004 Bioinformatic Systems and Databases Lab Phone: (951) 905-5232 Institute for Integrative Genome Biology University of California, Riverside --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS compression on Clearcase
Hi Darren, I totally agree with you and have raised some of the points mentioned but you have given even more items to pass on. I will update the alias when I hear further. Many Thanks Roshan - Original Message - From: Darren J Moffat Date: Thursday, February 4, 2010 12:42 pm Subject: Re: [zfs-discuss] ZFS compression on Clearcase To: Roshan Perera Cc: zfs-discuss@opensolaris.org > On 04/02/2010 12:13, Roshan Perera wrote: > >Hi Darren, > > > >Thanks - IBM basically haven't test clearcase with ZFS compression > therefore, they don't support currently. Future may change, as such my > customer cannot use compression. I have asked IBM for roadmap info to > find whether/when it will be supported. > > That is FUD generation in my opinion and being overly cautious. The > whole point of the POSIX interfaces to a filesystem is that > applications don't actually care how the filesystem stores their data. > > UFS never had checksums before but ZFS adds those, but that didn't > mean that applications had to be checked because checksums were now > done on the data. > > What if it was the disk drive that was doing the compression ? There > would be similarly no way for the application to actually know that it > is happening. > > What about every other feature we add to ZFS ? Like dedup (which is > a type of compression) - again they app can't tell. Or snapshots - > the app can't tell. > > Thats my opinion though and I know that ISVs can be very cautious > about new features sometimes and overly so when it is far below their > parts of the stack. > > Taking another example it would be like an ISV that supports their > application running over NFS saying they don't support a certain type > of vendors switch in the network because they haven't tested it. > > -- > Darren J Moffat > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives
On Wed, Feb 03, 2010 at 03:02:21PM -0800, Brandon High wrote:
> Another solution, for a true DIY x4500: BackBlaze has schematics for the 45 drive chassis that they designed available on their website. http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/
> Someone brought it up on the list a few months ago (which is how I know about it) and there was some interesting discussion at that time.

IIRC the consensus was that the vibration dampening was inadequate, the interfaces were oversubscribed, and the disks, not being nearline parts, were too unreliable - but I might be misremembering. I'm still happy with my 16x WD RE4 drives (Linux mdraid RAID 10, CentOS, Oracle, no ZFS). Supermicro does a 36x drive chassis now http://www.supermicro.com/products/chassis/4U/?chs=847 so budget DIY for ZFS is about 72 TByte raw storage with 2 TByte nearline SATA drives. I've had trouble finding internal 2x 2.5" in one 3.5" SSD mounts from Supermicro for hybrid ZFS, but no doubt one could improvise something from the usual ricer supplies. On a smaller scale http://www.supermicro.com/products/chassis/2U/?chs=216 works well with 2.5" Intel SSDs and VelociRaptors. I hope to be able to use one for a hybrid ZFS iSCSI target for VMware, probably with 10 GBit Ethernet.

> There's no way I would use something like this for most installs, but there is definitely some use. Now that opensolaris supports sata pmp, you could use a similar chassis for a zfs pool.

-- Eugen* Leitl leitl http://leitl.org __ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS compression on Clearcase
On 04/02/2010 12:13, Roshan Perera wrote:
> Hi Darren, Thanks - IBM basically haven't tested ClearCase with ZFS compression and therefore don't currently support it. That may change in future, but as it stands my customer cannot use compression. I have asked IBM for roadmap info to find out whether/when it will be supported.

That is FUD generation in my opinion and being overly cautious. The whole point of the POSIX interfaces to a filesystem is that applications don't actually care how the filesystem stores their data.

UFS never had checksums before but ZFS adds those, and that didn't mean that applications had to be checked because checksums were now done on the data.

What if it was the disk drive that was doing the compression? There would similarly be no way for the application to actually know that it is happening.

What about every other feature we add to ZFS? Like dedup (which is a type of compression) - again the app can't tell. Or snapshots - the app can't tell.

That's my opinion, though, and I know that ISVs can be very cautious about new features sometimes, and overly so when it is far below their part of the stack.

Taking another example, it would be like an ISV that supports their application running over NFS saying they don't support a certain vendor's network switch because they haven't tested it.

-- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
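[Editor's aside] To illustrate Darren's point that compression sits entirely below the POSIX layer, a minimal hedged sketch; the dataset name is hypothetical and this is not a ClearCase-specific recipe:

# Enable compression on the dataset holding the VOB storage
zfs set compression=on tank/vobstore
# Applications keep reading and writing ordinary files; only the ZFS accounting reveals the effect
zfs get compression,compressratio tank/vobstore
du -h /tank/vobstore     # allocated space shrinks, the file APIs the application sees do not change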
Re: [zfs-discuss] unionfs help
On 04.02.2010 12:12, dick hoogendijk wrote:
> Frank Cusack wrote:
>> Is it possible to emulate a unionfs with zfs and zones somehow? My zones are sparse zones and I want to make part of /usr writable within a zone. (/usr/perl5/mumble to be exact)
> Why don't you just export that directory with NFS (rw) to your sparse zone and mount it on /usr/perl5/mumble? Or is this too simple a thought?

What about lofs? I think lofs is the equivalent of unionfs on Solaris. E.g.

mount -F lofs /original/path /my/alternate/mount/point

- Thomas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
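[Editor's aside] A hedged sketch of how Thomas's lofs idea is usually wired into a sparse zone's configuration instead of mounted by hand; the zone name and the writable source directory are hypothetical:

# In the global zone: carve out a writable directory and loop it into the zone
mkdir -p /export/zones/myzone-perl5-mumble
zonecfg -z myzone
  add fs
    set dir=/usr/perl5/mumble                     # where it appears inside the zone
    set special=/export/zones/myzone-perl5-mumble
    set type=lofs
    add options [rw]                              # /usr is otherwise read-only in a sparse zone
  end
exit
zoneadm -z myzone reboot                          # reboot the zone so the mount takes effect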
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
Hi Ross, zdb - f...@snapshot | grep "path" | nawk '{print $2}' Enjoy! Darren Mackay -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
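[Editor's aside] The flag and dataset name in Darren's one-liner were mangled by the list archiver; a hedged reconstruction, assuming a hypothetical dataset tank/fs and zdb's verbose dataset dump, which prints a 'path' line for each ZPL file object:

zdb -dddd tank/fs@snapshot | grep -w path | nawk '{print $2}'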
Re: [zfs-discuss] ZFS compression on Clearcase
Hi Darren, Thanks - IBM basically haven't tested ClearCase with ZFS compression and therefore don't currently support it. That may change in future, but as it stands my customer cannot use compression. I have asked IBM for roadmap info to find out whether/when it will be supported.

Thanks Roshan

- Original Message - From: Darren J Moffat Date: Thursday, February 4, 2010 11:59 am Subject: Re: [zfs-discuss] ZFS compression on Clearcase To: Roshan Perera Cc: zfs-discuss@opensolaris.org

> On 04/02/2010 11:54, Roshan Perera wrote:
> > Anyone in the group using ZFS compression on clearcase vobs? If so any issues, gotchas?
>
> There shouldn't be any issues and I'd be very surprised if there were.
>
> > IBM support informs that ZFS compression is not supported. Any views on this?
>
> Need more data on why they claim it isn't supported - what issue have they seen, or do they think there could be one? I see no reason that ZFS compression wouldn't be supported; in fact Clearcase shouldn't even be able to tell.
>
> Compression in ZFS is completely below the POSIX filesystem layer and completely out of the control of any application or even kernel service like NFS or CIFS that just uses POSIX interfaces. The same is true of deduplication and will be true of encryption when it integrates as well.
>
> -- Darren J Moffat

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] unionfs help
On Thu, Feb 4, 2010 at 2:09 AM, Frank Cusack wrote: > Is it possible to emulate a unionfs with zfs and zones somehow? My zones > are sparse zones and I want to make part of /usr writable within a zone. > (/usr/perl5/mumble to be exact) > > I can't just mount a writable directory on top of /usr/perl5 because then > it hides all the stuff in the global zone. I could repopulate it in the > local zone but ugh that is unattractive. I'm hoping for a better way. > Creating a full zone is not an option for me. > > I don't think this is possible but maybe someone else knows better. I > was thinking something with snapshots and clones? The way I normally do this is to (in the global zone) symlink /usr/perl5/mumble to somewhere that would be writable such as /opt, and then put what you need into that location in the zone. Leaves a dangling symlink in the global zone and other zones, but that's relatively harmless. -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
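[Editor's aside] A hedged sketch of Peter's symlink approach; the target location under /opt is hypothetical (move any existing /usr/perl5/mumble aside first):

# In the global zone: point the path at a writable location (target need not exist here)
ln -s /opt/perl5-mumble /usr/perl5/mumble   # dangling symlink in the global zone - harmless
# Inside each sparse zone that needs it: create and populate the target
mkdir -p /opt/perl5-mumble
cp -r /path/to/your/modules/. /opt/perl5-mumble/   # hypothetical payload
# /usr/perl5/mumble now resolves, inside that zone, to the zone's own /opt copy;
# zones that never create the target simply keep the dangling link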
Re: [zfs-discuss] ZFS compression on Clearcase
On 04/02/2010 11:54, Roshan Perera wrote:
> Anyone in the group using ZFS compression on clearcase vobs? If so any issues, gotchas?

There shouldn't be any issues and I'd be very surprised if there were.

> IBM support informs that ZFS compression is not supported. Any views on this?

Need more data on why they claim it isn't supported - what issue have they seen, or do they think there could be one? I see no reason that ZFS compression wouldn't be supported; in fact Clearcase shouldn't even be able to tell.

Compression in ZFS is completely below the POSIX filesystem layer and completely out of the control of any application or even kernel service like NFS or CIFS that just uses POSIX interfaces. The same is true of deduplication and will be true of encryption when it integrates as well.

-- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS compression on Clearcase
Hi All, Anyone in the group using ZFS compression on clearcase vobs? If so any issues, gotchas? IBM support informs that ZFS compression is not supported. Any views on this? Rgds Roshan ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...
Hi All, I've been using ZFS for a while now - and everything's been going well. I use it under FreeBSD - but the answer to this question should almost certainly be the same whether it's FreeBSD or Solaris (I think/hope :)...

Imagine I have a zpool with 2 disks in it, that are mirrored:

        NAME      STATE     READ WRITE CKSUM
        vol       ONLINE       0     0     0
          mirror  ONLINE       0     0     0
            ad1   ONLINE       0     0     0
            ad2   ONLINE       0     0     0

(The device names are FreeBSD disks.) If I offline 'ad2' - and then did:

dd if=/dev/ad1 of=/dev/ad2

(i.e. make a mirror copy of ad1 to ad2 - on a *running* system), what would happen when I tried to 'online' ad2 again?

I fully expect it might not be pleasant... I'm just curious as to what's going to happen. When I 'online' ad2, will ZFS look at it and be clever enough to figure out the disk is obviously corrupt/unusable/has bad metadata on it - and resilver accordingly? Or is it going to see what it thinks is another 'ad1' and get a little upset?

I'm trying to set up something here so I can test what happens - I just thought I'd ask around a bit to see if anyone knows what'll happen from past experience. Thanks, -Karl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] unionfs help
Frank Cusack wrote: > Is it possible to emulate a unionfs with zfs and zones somehow? My zones > are sparse zones and I want to make part of /usr writable within a zone. > (/usr/perl5/mumble to be exact) Why don't you just export that directory with NFS (rw) to your sparse zone and mount it on /usr/perl5/mumble ? Or is this too simple a thought? -- Dick Hoogendijk -- PGP/GnuPG key: F86289CE ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
Henu wrote: So do you mean I cannot gather the names and locations of changed/created/removed files just by analyzing a stream of (incremental) zfs_send? That's correct, you can't. Snapshots do not work at the file level. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
Whoa! That is exactly what I've been looking for. Is there any development version publicly available for testing?

Regards, Henrik Heino

Quoting Matthew Ahrens :
> This is RFE 6425091 "want 'zfs diff' to list files that have changed between snapshots", which covers both file & directory changes, and file removal/creation/renaming. We actually have a prototype of zfs diff. Hopefully someday we will finish it up... --matt

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
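[Editor's aside] For readers of the archive: zfs diff did eventually ship in later OpenSolaris/Illumos builds. A hedged sketch of its usage there (the prototype discussed above may have differed, and the dataset names are hypothetical):

# List files changed between two snapshots of a dataset
zfs diff tank/home@monday tank/home@tuesday
# Output is one line per change, prefixed M (modified), + (added), - (removed) or R (renamed)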
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
So do you mean I cannot gather the names and locations of changed/created/removed files just by analyzing a stream of (incremental) zfs_send? Quoting Andrey Kuzmin : On Wed, Feb 3, 2010 at 6:11 PM, Ross Walker wrote: On Feb 3, 2010, at 9:53 AM, Henu wrote: Okay, so first of all, it's true that send is always fast and 100% reliable because it uses blocks to see differences. Good, and thanks for this information. If everything else fails, I can parse the information I want from send stream :) But am I right, that there is no other methods to get the list of changed files other than the send command? At zfs_send level there are no files, just DMU objects (modified in some txg which is the basis for changed/unchanged decision). ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Hi Simon

> I.e. you'll have to manually intervene if a consumer drive causes the system to hang, and replace it, whereas the RAID edition drives will probably report the error quickly and then ZFS will rewrite the data elsewhere, and thus maybe not kick the drive.

IMHO the relevant aspects are whether ZFS is able to give an accurate account of cache flush status and whether it even realises when a drive is not responsive. That being said, I have not seen a specific report that ZFS kicks green drives at random or in a pattern, the way the poor SoHo storage enclosure users report all the time.

> So it sounds preferable to have TLER in operation, if one can find a consumer-priced drive that allows it, or just take the hit and go with whatever non-TLER drive you choose and expect to have to manually intervene if a drive plays up. OK for a home user who is not too affected, but not good for businesses which need to have something recovered quickly.

One point about TLER is that two error-correction schemes compete when you run a consumer drive on an active RAID controller that has its own mechanisms. When you run ZFS on a RAID controller, contrary to the best-practice recommendations, an analogous question arises. On the other hand, if you run a green consumer drive on a dumb HBA, I wouldn't know what is wrong with that in the first place. As for manual interventions, the only one I am aware of would be to re-attach a single drive. Not an option if you are really affected like those miserable Thecus N7000 users who see an entire array of only a handful of drives drop out within hours - over and over again - or don't even get to finish formatting the stripe set. The dire consequences of the gossiped TLER problems lead me to believe that there would be many more, and quite specific, reports in this place if this were a systematic issue with ZFS. Other than that, we are operating outside supported specs when running consumer-level drives in large arrays. That, at least, is the perspective of Seagate and WD so far.

>> That all rather points to singular issues with firmware bugs or similar than to a systematic issue, doesn't it?
> I'm not sure. Some people in the WDC threads seem to report problems with pauses during media streaming etc.

This was again for SoHo storage enclosures - not for ZFS, right?

> when the 32MB+ cache is empty, then it loads another 32MB into cache etc and so on?

I am not sure any current disk has cache management so simplistic that it relies on completely cycling the buffer content, let alone for reads that belong to a single file (a disk is basically agnostic of files). Moreover, such buffer management would be completely useless for a striped array. I don't know much better what a disk cache does either, but I am afraid that direction is probably not helpful for understanding certain phenomena people have reported.

I think that at this time we are seeing quite a large number of changes in disk storage, where many established assumptions are being abandoned while backwards compatibility is not always taken care of. SAS 6G (will my controller really work in a PCIe 1.1 slot?) and 4k sectors are certainly only prominent examples. It probably makes more sense than ever to fall back on established technologies in such times, including biting the bullet of a cost premium on occasion.
Best regards Tonmaus -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Large scale ZFS deployments out there (>200 disks)
We got 50+ X4500/X4540s running happily with ZFS in the same DC. Approximately 2500 drives and growing every day...

Br Mertol

Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems, TR Istanbul TR Phone +902123352200 Mobile +905339310752 Fax +90212335 Email mertol.ozyo...@sun.com

-Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Henrik Johansen Sent: Friday, January 29, 2010 10:45 AM To: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] Large scale ZFS deployments out there (>200 disks)

On 01/28/10 11:13 PM, Lutz Schumann wrote:
> While thinking about ZFS as the next generation filesystem without limits I am wondering if the real world is ready for this kind of incredible technology ...
> I'm actually speaking of hardware :)
> ZFS can handle a lot of devices. Once the import bug (http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6761786) is fixed it should be able to handle a lot of disks.

That was fixed in build 125.

> I want to ask the ZFS community and users what large scale deployments are out there. How many disks? How much capacity? Single pool or many pools on a server? How does resilver work in those environments? How do you back up? What is the experience so far? Major headaches?
> It would be great if large scale users would share their setups and experiences with ZFS.

The largest ZFS deployment that we have is currently comprised of 22 Dell MD1000 enclosures (330 x 750 GB Nearline SAS disks). We have 3 head nodes and use one zpool per node, comprised of rather narrow (5+2) RAIDZ2 vdevs. This setup is exclusively used for storing backup data.

Resilver times could be better - I am sure that this will improve once we upgrade from S10u9 to 2010.03. One of the things that I am missing in ZFS is the ability to prioritize background operations like scrub and resilver. All our disks are idle during daytime and I would love to be able to take advantage of this, especially during resilver operations.

This setup has been running for about a year with no major issues so far. The only hiccups we've had were all HW related (no fun in firmware-upgrading 200+ disks).

> Will you ? :) Thanks, Robert

-- Med venlig hilsen / Best Regards Henrik Johansen hen...@scannet.dk Tlf. 75 53 35 00 ScanNet Group A/S ScanNet ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedup memory overhead
Sorry for the late answer. Approximately it's 150 bytes per individual block, so increasing the blocksize is a good idea. Also, when the L1 and L2 ARC are not enough the system will start issuing disk IOPS, and RAID-Z is not very effective for random IOPS, so it's likely that when your DRAM is not enough your performance will suffer. You may choose to use RAID 10, which is a lot better under random loads.

Mertol

Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems, TR Istanbul TR Phone +902123352200 Mobile +905339310752 Fax +90212335 Email mertol.ozyo...@sun.com

-Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of erik.ableson Sent: Thursday, January 21, 2010 6:05 PM To: zfs-discuss Subject: [zfs-discuss] Dedup memory overhead

Hi all, I'm going to be trying out some tests using b130 for dedup on a server with about 1,7 TB of useable storage (14x146 in two raidz vdevs of 7 disks). What I'm trying to get a handle on is how to estimate the memory overhead required for dedup on that amount of storage. From what I gather, the dedup hash keys are held in ARC and L2ARC and as such are in competition for the available memory. So the question is how much memory or L2ARC would be necessary to ensure that I'm never going back to disk to read out the hash keys. Better yet would be some kind of algorithm for calculating the overhead. E.g. an averaged block size of 4K = a hash key for every 4K stored, and a hash occupies 256 bits. An associated question is then how does the ARC handle competition between hash keys and regular ARC functions?

Based on these estimations, I think that I should be able to calculate the following:

1,7 TB
1740,8 GB
1782579,2 MB
1825361100,8 KB
4             average block size
456340275,2   blocks
256           hash key size - bits
1,16823E+11   hash key overhead - bits
1460206,4     hash key size - bytes
14260633,6    hash key size - KB
13926,4       hash key size - MB
13,6          hash key overhead - GB

Of course the big question on this will be the average block size - or better yet - to be able to analyze an existing datastore to see just how many blocks it uses and what the current distribution of different block sizes is. I'm currently playing around with zdb with mixed success on extracting this kind of data. That's also a worst case scenario since it's counting really small blocks and using 100% of available storage - highly unlikely.

# zdb -ddbb siovale/iphone
Dataset siovale/iphone [ZPL], ID 2381, cr_txg 3764691, 44.6G, 99 objects

ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  57.0K    64K   77.34  DMU dnode
         1    1    16K     1K  1.50K     1K  100.00  ZFS master node
         2    1    16K    512  1.50K    512  100.00  ZFS delete queue
         3    2    16K    16K  18.0K    32K  100.00  ZFS directory
         4    3    16K   128K   408M   408M  100.00  ZFS plain file
         5    1    16K    16K  3.00K    16K  100.00  FUID table
         6    1    16K     4K  4.50K     4K  100.00  ZFS plain file
         7    1    16K  6.50K  6.50K  6.50K  100.00  ZFS plain file
         8    3    16K   128K   952M   952M  100.00  ZFS plain file
         9    3    16K   128K   912M   912M  100.00  ZFS plain file
        10    3    16K   128K   695M   695M  100.00  ZFS plain file
        11    3    16K   128K   914M   914M  100.00  ZFS plain file

Now, if I'm understanding this output properly, object 4 is composed of 128KB blocks with a total size of 408MB, meaning that it uses 3264 blocks. Can someone confirm (or correct) that assumption? Also, I note that each object (as far as my limited testing has shown) has a single block size with no internal variation.

Interestingly, all of my zvols seem to use fixed-size blocks - that is, there is no variation in the block sizes - they're all the size defined on creation with no dynamic block sizes being used. I previously thought that the -b option set the maximum size, rather than fixing all blocks. Learned something today :-)

# zdb -ddbb siovale/testvol
Dataset siovale/testvol [ZVOL], ID 45, cr_txg 4717890, 23.9K, 2 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  21.0K    16K    6.25  DMU dnode
         1    1    16K    64K      0    64K    0.00  zvol object
         2    1    16K    512  1.50K    512  100.00  zvol prop

# zdb -ddbb siovale/tm-media
Dataset siovale/tm-media [ZVOL], ID 706, cr_txg 4426997, 240G, 2 objects

ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  21.0K    16K    6.25  DMU dnode
         1    5    16K     8K   240G   250G   97.33  zvol object
         2    1    16K    512  1.50K    512  100.00  zvol prop

___
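[Editor's aside] Pulling the two halves of this thread together, a hedged back-of-the-envelope calculator in the spirit of the table above; the 150 bytes-per-unique-block figure is the one Mertol quotes, the pool size and candidate block sizes are just this thread's assumptions, and none of it replaces measuring the real block distribution with zdb:

# Rough dedup-table overhead estimate (assumed figures, not measured ones)
nawk 'BEGIN {
    pool_gb = 1740;            # ~1.7 TB of usable space, as above
    entry_b = 150;             # approx. bytes of DDT entry per unique block
    split("4 64 128", bs, " ");
    for (i = 1; i <= 3; i++) {
        blocks = pool_gb * 1024 * 1024 / bs[i];      # pool size in KB / block size in KB
        printf("%3sK blocks: %12.0f blocks -> ~%6.1f GB of DDT\n",
               bs[i], blocks, blocks * entry_b / 1024 / 1024 / 1024);
    }
}'

With a 4K average block size this lands in the tens of GB, which is why the earlier advice in this thread was to keep the DDT resident in RAM plus an SSD L2ARC rather than letting it spill to spinning disks.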