Re: [zfs-discuss] ZFS send and receive corruption across a WAN link?
On Fri, Mar 19, 2010 at 12:38 PM, Rob wrote:
> Can a ZFS send stream become corrupt when piped between two hosts across a
> WAN link using 'ssh'?

Unless the end computers are bad (memory problems, etc.), the answer should be no. ssh has its own error detection method, and the zfs send stream itself is checksummed.

-- Fajar
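If you want an independent end-to-end check on top of ssh's transport checks and the stream's embedded checksums, one rough approach (a sketch only; the dataset, host and scratch-file names below are made up, and it assumes enough scratch space for the whole stream on each side) is to tee the stream to a file on both ends and compare digests:

sender#   zfs send tank/foo@now | tee /var/tmp/foo.zstream | ssh host.uk "tee /var/tmp/foo.zstream | zfs receive tank/bar"
sender#   digest -a sha256 /var/tmp/foo.zstream
receiver# digest -a sha256 /var/tmp/foo.zstream

If the two digests match, the bytes that left the sender are the bytes that were handed to 'zfs receive'; delete the scratch copies afterwards.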
[zfs-discuss] ZFS send and receive corruption across a WAN link?
Can a ZFS send stream become corrupt when piped between two hosts across a WAN link using 'ssh'? For example, a host in Australia sends a stream to a host in the UK as follows:

# zfs send tank/f...@now | ssh host.uk zfs receive tank/bar
Re: [zfs-discuss] How to manage scrub priority or defer scrub?
On Thu, Mar 18, 2010 at 09:54:28PM -0700, Tonmaus wrote:
> > (and the details of how much and how low have changed a few times
> > along the version trail).
>
> Is there any documentation about this, besides source code?

There are change logs and release notes, and random blog postings along the way - they're less structured but often more informative. There were some good descriptions of the scrub improvements 6-12 months ago. The bug IDs listed in change logs that mention scrub should be pretty simple to find and sequence with versions.

> > However, that prioritisation applies only within the kernel; sata
> > disks don't understand the prioritisation, so once the requests
> > are with the disk they can still saturate out other IOs that made
> > it to the front of the kernel's queue faster.
>
> I am not sure what you are hinting at. I initially thought about TCQ
> vs. NCQ when I read this. But I am not sure which detail of TCQ
> would allow for I/O discrimination that NCQ doesn't have.

Er, the point was exactly that there is no discrimination once the request is handed to the disk. If the internal-to-disk queue is enough to keep the heads saturated / seek bound, then a new high-priority-in-the-kernel request will get to the disk sooner, but may languish once there. You'll get the best overall disk throughput by letting the disk firmware optimise seeks, but your priority request won't get any further preference.

Shortening the list of requests handed to the disk in parallel may help, and still keep the channel mostly busy, perhaps at the expense of some extra seek length and lower overall throughput. You can shorten the number of outstanding IOs per vdev for the pool overall, or preferably the number scrub will generate (to avoid penalising all IO). The tunables for each of these should be found readily, probably in the Evil Tuning Guide.

> All I know about command queueing is that it is about optimising DMA
> strategies and optimising the handling of the I/O requests currently
> issued in respect to what to do first to return all data in the
> least possible time. (??)

Mostly, as above, it's about giving the disk controller more than one thing to work on at a time, and having the issuance of a request and its completion overlap with others, so the head movement can be optimised and the controller channel can be busy with data transfer for one request while seeking for another. Disks with write cache effectively do this for writes, by pretending they complete immediately, but reads would block the channel until satisfied. (This is all for ATA, which lacked this before NCQ. SCSI has had these capabilities for a long time.)

> > If you're looking for something to tune, you may want to look at
> > limiting the number of concurrent IO's handed to the disk to try
> > and avoid saturating the heads.
>
> Indeed, that was what I had in mind. With the addition that I think
> it is as well necessary to avoid saturating other components, such
> as CPU.

Less important, since prioritisation can be applied there too, but potentially also an issue. Perhaps you want to keep the CPU fan speed/noise down for a home server, even if the scrub runs longer.

> I have two systems here, a production system that is on LSI SAS
> (mpt) controllers, and another one that is on ICH-9 (ahci). Disks
> are SATA-2. The plan was that this combo will have NCQ support. On
> the other hand, do you know if there is a method to verify if it is
> functioning?

AHCI should be fine.
In practice, if you see actv > 1 (with a small margin for sampling error) then NCQ is working.

-- Dan.
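For reference, one way to watch that figure (the sampling interval is arbitrary, and the disks you care about are whichever cXtYdZ devices back the pool) is:

# iostat -xn 5

and look at the actv column for those disks while the pool is under concurrent load; actv is the number of commands active on the device, so values that sit above 1 indicate the disk is accepting more than one outstanding command at a time.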
Re: [zfs-discuss] How to manage scrub priority or defer scrub?
Hello Dan,

Thank you very much for this interesting reply.

> Roughly speaking, reading through the filesystem does the least work
> possible to return the data. A scrub does the most work possible to
> check the disks (and returns none of the data).

Thanks for the clarification. That's what I had thought.

> For the OP: scrub issues low-priority IO (and the details of how much
> and how low have changed a few times along the version trail).

Is there any documentation about this, besides source code?

> However, that prioritisation applies only within the kernel; sata disks
> don't understand the prioritisation, so once the requests are with the
> disk they can still saturate out other IOs that made it to the front
> of the kernel's queue faster.

I am not sure what you are hinting at. I initially thought about TCQ vs. NCQ when I read this. But I am not sure which detail of TCQ would allow for I/O discrimination that NCQ doesn't have. All I know about command queueing is that it is about optimising DMA strategies and optimising the handling of the I/O requests currently issued in respect to what to do first to return all data in the least possible time. (??)

> If you're looking for something to tune, you may want to look at
> limiting the number of concurrent IO's handed to the disk to try and
> avoid saturating the heads.

Indeed, that was what I had in mind. With the addition that I think it is as well necessary to avoid saturating other components, such as CPU.

> You also want to confirm that your disks are on an NCQ-capable
> controller (eg sata rather than cmdk) otherwise they will be severely
> limited to processing one request at a time, at least for reads if you
> have write-cache on (they will be saturated at the stop-and-wait
> channel, long before the heads).

I have two systems here, a production system that is on LSI SAS (mpt) controllers, and another one that is on ICH-9 (ahci). Disks are SATA-2. The plan was that this combo will have NCQ support. On the other hand, do you know if there is a method to verify if it is functioning?

Best regards,
Tonmaus
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
> > From what I've read so far, zfs send is a block level api and thus cannot be
> > used for real backups. As a result of being block level oriented, the
>
> Weirdo. The above "cannot be used for real backups" is obviously
> subjective, is incorrect and widely discussed here, so I just say
> "weirdo." I'm tired of correcting this constantly.

I apologize if I was insulting, and it's clear that I was. Seriously, I apologize. I should have thought about that more before I sent it, and I should have been more considerate.

To clarify what I meant, more accurately and from a technical standpoint: there are circumstances, such as backup to removable disks or time-critical incremental data streams, where incremental "zfs send" clearly outperforms star, rsync, or any other file-based backup mechanism. There are circumstances where zfs send is enormously a winner.

There are other circumstances, such as writing to tape, where star, or tar, or rsync, or other tools may be the winner. And I don't claim to know all the circumstances where something else beats "zfs send." There probably are many circumstances where some other tool beats "zfs send" in some way.

The only point which I wish to emphasize is that it's not fair to say unilaterally that one technique is always better than another technique. Each one has its own pros and cons.
Re: [zfs-discuss] Validating alignment of NTFS/VMDK/ZFS blocks
On Thu, Mar 18, 2010 at 14:44, Chris Murray wrote:
> Good evening,
> I understand that NTFS & VMDK do not relate to Solaris or ZFS, but I was
> wondering if anyone has any experience of checking the alignment of data
> blocks through that stack?

It seems to me there's a simple way to check. Pick 4k of random data (say, dd if=/dev/urandom of=newfile bs=4k count=1) and copy that onto the VM filesystem. Now write a little program to read the .vmdk file and find that 4k of data. Report the offset, and check offset % 4096 == 0.

This won't help you fix things, but it'll at least tell you that something is wrong.

Will
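A rough shell-only variant of the same idea (a sketch, not a tested procedure: it assumes a flat/thick VMDK, GNU grep on the ZFS host, and the marker string, file and path names are invented): instead of random data, create a 4 KB file inside the guest filled with a distinctive printable marker, then locate the marker's byte offset in the VMDK and check it against the 4 KB boundary:

# offset=$(/usr/gnu/bin/grep -abo ALIGNMARK0001 /tank/nfs/vm1/vm1-flat.vmdk | head -1 | cut -d: -f1)
# echo $((offset % 4096))

A result of 0 suggests that particular block is aligned; anything else tells you where it sits relative to the ZFS recordsize boundary.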
Re: [zfs-discuss] ZFS Performance on SATA Drive
Erik Trimble wrote:
> James C. McPherson wrote:
>> On 18/03/10 10:05 PM, Kashif Mumtaz wrote:
>>> Hi, Thanks for your reply. BOTH are Sun SPARC T1000 machines.
>>> Hard disk 1 TB SATA on both.
>>>
>>> ZFS system: Memory 32 GB, Processor 1 GHz 6 core,
>>> OS Solaris 10 10/09 s10s_u8wos_08a SPARC, PatchCluster level 142900-02 (Dec 09)
>>>
>>> UFS machine: Hard disk 1 TB SATA, Memory 16 GB, Processor 1 GHz 6 core,
>>> Solaris 10 8/07 s10s_u4wos_12b SPARC
>>
>> Since you are seeing this on a Solaris 10 update release, you should log
>> a call with your support provider to get this investigated.
>>
>> James C. McPherson
>
> I would generally agree with James, with the caveat that you could try to
> update to something a bit later than Update 4. That's pretty early on in
> the ZFS deployment in Solaris 10. At the minimum, grab the latest
> Recommended Patch set and apply that, then see what your issues are.

Oh, nevermind. I'm an idiot. I was looking at the UFS machine.

-- Erik Trimble
Java System Support
Re: [zfs-discuss] ZFS Performance on SATA Drive
James C. McPherson wrote:
> On 18/03/10 10:05 PM, Kashif Mumtaz wrote:
>> Hi, Thanks for your reply. BOTH are Sun SPARC T1000 machines.
>> Hard disk 1 TB SATA on both.
>>
>> ZFS system: Memory 32 GB, Processor 1 GHz 6 core,
>> OS Solaris 10 10/09 s10s_u8wos_08a SPARC, PatchCluster level 142900-02 (Dec 09)
>>
>> UFS machine: Hard disk 1 TB SATA, Memory 16 GB, Processor 1 GHz 6 core,
>> Solaris 10 8/07 s10s_u4wos_12b SPARC
>
> Since you are seeing this on a Solaris 10 update release, you should log
> a call with your support provider to get this investigated.
>
> James C. McPherson

I would generally agree with James, with the caveat that you could try to update to something a bit later than Update 4. That's pretty early on in the ZFS deployment in Solaris 10. At the minimum, grab the latest Recommended Patch set and apply that, then see what your issues are.

-- Erik Trimble
Java System Support
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On Mar 18, 2010, at 15:00, Miles Nordin wrote:

> Admittedly the second bullet is hard to manage while still backing up
> zvol's, pNFS / Lustre data-node datasets, windows ACL's, properties,

Some commercial backup products are able to parse VMware's VMDK files to get file-system information out of them. The product sits on the VMware host, slurps in the files (which can be snapshotted for quiesced backups), and if you want to restore, you can either put back the entire VMDK or simply restore just the particular file(s) that are of interest. Currently NetBackup only supports "parsing" NTFS for individual file restoration.

Theoretically, zvols could be added to the list of parsable container formats, though there would probably have to be some kind of API akin to VMware's VCB or vStorage.
Re: [zfs-discuss] ZFS Performance on SATA Drive
On 18/03/10 10:05 PM, Kashif Mumtaz wrote:
> Hi, Thanks for your reply. BOTH are Sun SPARC T1000 machines.
> Hard disk 1 TB SATA on both.
>
> ZFS system: Memory 32 GB, Processor 1 GHz 6 core,
> OS Solaris 10 10/09 s10s_u8wos_08a SPARC, PatchCluster level 142900-02 (Dec 09)
>
> UFS machine: Hard disk 1 TB SATA, Memory 16 GB, Processor 1 GHz 6 core,
> Solaris 10 8/07 s10s_u4wos_12b SPARC

Since you are seeing this on a Solaris 10 update release, you should log a call with your support provider to get this investigated.

James C. McPherson
--
Senior Software Engineer, Solaris
Sun Microsystems
http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] ZFS/OSOL/Firewire...
On Mar 18, 2010, at 14:23, Bob Friesenhahn wrote:

> On Thu, 18 Mar 2010, erik.ableson wrote:
>> Ditto on the Linux front. I was hoping that Solaris would be the exception,
>> but no luck. I wonder if Apple wouldn't mind lending one of the driver
>> engineers to OpenSolaris for a few months...
>
> Perhaps the issue is the filesystem rather than the drivers. Apple users
> have different expectations regarding data loss than Solaris and Linux
> users do.

Apple users (of which I am one) expect things to Just Work. :)

And there are Apple users and Apple users:
http://daringfireball.net/2010/03/ode_to_diskwarrior_superduper_dropbox

If anyone at Apple is paying attention, perhaps you could re-open discussions with now-Oracle about getting ZFS into Mac OS. :)
Re: [zfs-discuss] lazy zfs destroy
On Wed, Mar 17, 2010 at 9:19 PM, Chris Paul wrote:
> OK I have a very large zfs snapshot I want to destroy. When I do this, the
> system nearly freezes during the zfs destroy. This is a Sun Fire X4600 with
> 128GB of memory. Now this may be more of a function of the IO device, but
> let's say I don't care that this zfs destroy finishes quickly. I actually
> don't care, as long as it finishes before I run out of disk space.

Destroys are very slow with dedup enabled, and worse with larger data sets when the dedup table doesn't fit into RAM. Adding an L2ARC device may help if that's the case.

-B

-- Brandon High : bh...@freaks.com
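A quick, back-of-the-envelope way to check whether the dedup table is likely to fit in RAM (the per-entry figure is a rough estimate only, and "tank" is a placeholder for your own pool name):

# zdb -DD tank | head
# kstat -p zfs:0:arcstats:size

The first command reports how many DDT entries the pool has; multiplying that count by a few hundred bytes per entry (around 300 bytes is a commonly quoted ballpark) gives an idea of the in-core footprint, which you can compare against the current ARC size from the second command.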
Re: [zfs-discuss] Validating alignment of NTFS/VMDK/ZFS blocks
I have only heard of alignment being discussed in reference to block-based storage (like DASD/iSCSI/FC). I'm not really sure how it would work out over NFS. I do see why you are asking, though.

My understanding is that VMDK files are basically 'aligned', but the partitions inside of them may not be. You don't state what OS you are using in your guests. Windows XP/2003 and older create misaligned partitions by default (within a VMDK). You would need to manually create/adjust NTFS partitions in those cases in order for them to properly fall on a 4k boundary. This could be a cause of the problem you are describing.

This doc from VMware is aimed at block-based storage, but it has some concepts that might be helpful, as well as info on aligning guest OS partitions:
http://www.vmware.com/pdf/esx3_partition_align.pdf

-Brian

Chris Murray wrote:
> Good evening,
> I understand that NTFS & VMDK do not relate to Solaris or ZFS, but I was
> wondering if anyone has any experience of checking the alignment of data
> blocks through that stack?
> I have a VMware ESX 4.0 host using storage presented over NFS from ZFS
> filesystems (recordsize 4KB). Within virtual machine VMDK files, I have
> formatted NTFS filesystems, block size 4KB. Dedup is turned on. When I run
> zdb -DD, I see a figure of unique blocks which is higher than I expect,
> which makes me wonder whether any given 4KB in the NTFS filesystem is
> perfectly aligned with a 4KB block in ZFS?
> e.g. consider two virtual machines sharing lots of the same blocks.
> Assuming there /is/ a misalignment between NTFS & VMDK / VMDK & ZFS, if
> they're not in the same order within NTFS, they don't align, and will
> actually produce different blocks in ZFS:
> VM1 NTFS 1---2---3---
> ZFS 1---2---3---4---
> ZFS blocks are " AA", "AABB" and so on ...
> Then in another virtual machine, the blocks are in a different order:
> VM2 NTFS 1---2---3---
> ZFS 1---2---3---4---
> ZFS blocks for this VM would be " CC", "CCAA", "AABB" etc.
> So, no overlap between virtual machines, and no benefit from dedup.
> I may have it wrong, and there are indeed 30,785,627 unique blocks in my
> setup, but if there's a mechanism for checking alignment, I'd find that
> very helpful.
> Thanks,
> Chris
Re: [zfs-discuss] dedupratio riddle
As noted, the ratio calculation applies over the data attempted to dedup, not the whole pool. However, I saw a commit go by just in the last couple of days about the dedupratio calculation being misleading, though I didn't check the details. Presumably this will be reported differently from the next builds.

-- Dan.
Re: [zfs-discuss] How to manage scrub priority or defer scrub?
On Thu, Mar 18, 2010 at 05:21:17AM -0700, Tonmaus wrote:
> > No, because the parity itself is not verified.
>
> Aha. Well, my understanding was that a scrub basically means reading
> all data, and compare with the parities, which means that these have
> to be re-computed. Is that correct?

A scrub does, yes. It reads all data and metadata and checksums and verifies they're correct. A read of the pool might not - for example, it might:

 - read only one side of a mirror
 - read only one instance of a ditto block (metadata or copies>1)
 - use cached copies of data or metadata; for a long-running system it
   might be a long time since some metadata blocks were ever read, if
   they're frequently used.

Roughly speaking, reading through the filesystem does the least work possible to return the data. A scrub does the most work possible to check the disks (and returns none of the data).

For the OP: scrub issues low-priority IO (and the details of how much and how low have changed a few times along the version trail). However, that prioritisation applies only within the kernel; sata disks don't understand the prioritisation, so once the requests are with the disk they can still saturate out other IOs that made it to the front of the kernel's queue faster. If you're looking for something to tune, you may want to look at limiting the number of concurrent IO's handed to the disk to try and avoid saturating the heads.

You also want to confirm that your disks are on an NCQ-capable controller (eg sata rather than cmdk) otherwise they will be severely limited to processing one request at a time, at least for reads if you have write-cache on (they will be saturated at the stop-and-wait channel, long before the heads).

-- Dan.
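For illustration only: the tunable names below are the ones from roughly this era's OpenSolaris source and the Evil Tuning Guide (zfs_vdev_max_pending for the per-vdev queue depth, zfs_scrub_limit for scrub I/Os per vdev), so treat them as examples to verify against your own build rather than gospel. In /etc/system:

set zfs:zfs_vdev_max_pending = 10
set zfs:zfs_scrub_limit = 2

or, on a live system, the per-vdev limit can be adjusted with mdb:

# echo zfs_vdev_max_pending/W0t10 | mdb -kw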
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On 03/18/10 12:07 PM, Khyron wrote:
> Ian,
> When you say you spool to tape for off-site archival, what software do you use?

NetVault.

-- Ian.
Re: [zfs-discuss] Heap corruption, possibly hotswap related (snv_134 with imr_sas, nvdisk drivers)
2010/3/18 Kaya Bekiroğlu:
> I first noticed this panic when conducting hot-swap tests. However,
> now I see it every hour or so, even when all drives are attached and
> no ZFS resilvering is in progress.

It appears that these panics recur on my system when the zfs-auto-snapshot service runs. Disabling the hourly zfs-auto-snapshot service prevents the panic. The panic appears to be load-related, which explains why it can also occur around hot swap, but perhaps drivers are not to blame.

> Repro:
> - Pull a drive
> - Wait for drive absence to be acknowledged by fm
> - Physically re-add the drive
>
> This machine contains two LSI 9240-8i SAS controllers running imr_sas
> (the driver from LSI's website) and a umem NVRAM card running the
> nvdisk driver. It also contains an SSD L2ARC.
>
> Mar 17 16:00:10 storage genunix: [ID 478202 kern.notice] kernel memory allocator:
> Mar 17 16:00:10 storage genunix: [ID 432124 kern.notice] buffer freed to wrong cache
> Mar 17 16:00:10 storage genunix: [ID 815666 kern.notice] buffer was allocated from kmem_alloc_160,
> Mar 17 16:00:10 storage genunix: [ID 530907 kern.notice] caller attempting free to kmem_alloc_48.
> Mar 17 16:00:10 storage genunix: [ID 563406 kern.notice] buffer=ff0715c74510 bufctl=0 cache: kmem_alloc_48
> Mar 17 16:00:10 storage unix: [ID 836849 kern.notice]
> Mar 17 16:00:10 storage ^Mpanic[cpu7]/thread=ff002de17c60:
> Mar 17 16:00:10 storage genunix: [ID 812275 kern.notice] kernel heap corruption detected
> Mar 17 16:00:10 storage unix: [ID 10 kern.notice]
> Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice] ff002de17a70 genunix:kmem_error+501 ()
> Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice] ff002de17ac0 genunix:kmem_slab_free+2d5 ()
> Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice] ff002de17b20 genunix:kmem_magazine_destroy+fe ()
> Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice] ff002de17b70 genunix:kmem_cache_magazine_purge+a0 ()
> Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice] ff002de17ba0 genunix:kmem_cache_magazine_resize+32 ()
> Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice] ff002de17c40 genunix:taskq_thread+248 ()
> Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice] ff002de17c50 unix:thread_start+8 ()
> Mar 17 16:00:10 storage unix: [ID 10 kern.notice]
> Mar 17 16:00:10 storage genunix: [ID 672855 kern.notice] syncing file systems...
> Mar 17 16:00:10 storage genunix: [ID 904073 kern.notice] done
> Mar 17 16:00:11 storage genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
> Mar 17 16:00:11 storage ahci: [ID 405573 kern.info] NOTICE: ahci0: ahci_tran_reset_dport port 0 reset port
>
> I'd file this directly to the bug database but I'm waiting for my
> account to be reactivated.
>
> zpool status:
>   pool: tank
>  state: ONLINE
>  scrub: resilver completed after 0h0m with 0 errors on Thu Mar 18 10:07:12 2010
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz1-0  ONLINE       0     0     0
>             c6t15d1 ONLINE       0     0     0
>             c6t14d1 ONLINE       0     0     0
>             c6t13d1 ONLINE       0     0     0
>           raidz1-1  ONLINE       0     0     0
>             c6t12d1 ONLINE       0     0     0
>             c6t11d1 ONLINE       0     0     0
>             c6t10d1 ONLINE       0     0     0
>           raidz1-2  ONLINE       0     0     0
>             c6t9d1  ONLINE       0     0     0
>             c6t8d1  ONLINE       0     0     0
>             c5t9d1  ONLINE       0     0     0
>         logs
>           c7d1p0    ONLINE       0     0     0
>         cache
>           c4t0d0p2  ONLINE       0     0     0
>         spares
>           c5t8d1    AVAIL
>
> -- Kaya

-- Kaya
Re: [zfs-discuss] ZFS Performance on SATA Drive
On 18.03.2010 21:31, Daniel Carosone wrote:
> You have a gremlin to hunt...

Wouldn't Sun help here? ;)

(sorry, couldn't help myself, I've spent a week hunting gremlins until I hit the brick wall of the MPT problem)

//Svein
Re: [zfs-discuss] Validating alignment of NTFS/VMDK/ZFS blocks
On Thu, Mar 18, 2010 at 2:44 PM, Chris Murray wrote:
> Good evening,
> I understand that NTFS & VMDK do not relate to Solaris or ZFS, but I was
> wondering if anyone has any experience of checking the alignment of data
> blocks through that stack?

NetApp has a great little tool called mbrscan/mbralign. It's free, but I'm not sure if NetApp customers are supposed to distribute it.

-marc
Re: [zfs-discuss] ZFS Performance on SATA Drive
On Thu, Mar 18, 2010 at 03:36:22AM -0700, Kashif Mumtaz wrote:
> I did another test on both machines. And write performance on ZFS is
> extraordinarily slow.
> -
> In ZFS, data was being written at around 1037 kw/s while the disk remained 100% busy.

That is, as you say, such an extraordinarily slow number that we have to start at the very basics and eliminate fundamental problems.

I have seen disks go bad in a way that they simply become very very slow. You need to be sure that this isn't your problem. Or perhaps there's some hardware issue when the disks are used in parallel? Check all the cables and connectors. Check logs for any errors.

Do you have the opportunity to try testing write speed with dd to the raw disks?

If the pool is mirrored, can you detach one side at a time? Test the detached disk with dd, and the pool with the other disk, one at a time and then concurrently. One slow disk will slow down the mirror (but I don't recall seeing such an imbalance in your iostat output either).

Do you have some spare disks to try other tests with? Try a ZFS install on those, and see if they also have the problem. Try a UFS install on the current disks, and see if they still have the problem. Can you swap the disks between the T1000s and see if the problem stays with the disks or the chassis?

You have a gremlin to hunt...

-- Dan.
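For example (the device names below are placeholders; pick the real cXtYdZ disks from 'format', and note that the write test overwrites whatever is on the target, so only run it against a disk or slice holding no data you care about):

# dd if=/dev/rdsk/c1t0d0s0 of=/dev/null bs=1024k count=1024        (raw read test)
# dd if=/dev/zero of=/dev/rdsk/c1t1d0s0 bs=1024k count=1024        (raw write test - destructive)

Comparing the MB/s these report against what the pool achieves helps separate a slow disk from a slow filesystem layer.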
Re: [zfs-discuss] Validating alignment of NTFS/VMDK/ZFS blocks
Not having specific knowledge of the VMDK format, I think what you are seeing is that there is extra data associated with maintaining the VMDK. If you are seeing lower dedup ratios than you would expect, it sounds like some of this extra data could be added to each block.

The VMDK spec appears to be open; perhaps a read through the spec might help in understanding what VMware is doing to the NTFS data - http://en.wikipedia.org/wiki/VMDK

--joe

On 3/18/2010 11:44 AM, Chris Murray wrote:
> Good evening,
> I understand that NTFS & VMDK do not relate to Solaris or ZFS, but I was
> wondering if anyone has any experience of checking the alignment of data
> blocks through that stack?
> I have a VMware ESX 4.0 host using storage presented over NFS from ZFS
> filesystems (recordsize 4KB). Within virtual machine VMDK files, I have
> formatted NTFS filesystems, block size 4KB. Dedup is turned on. When I run
> zdb -DD, I see a figure of unique blocks which is higher than I expect,
> which makes me wonder whether any given 4KB in the NTFS filesystem is
> perfectly aligned with a 4KB block in ZFS?
> e.g. consider two virtual machines sharing lots of the same blocks.
> Assuming there /is/ a misalignment between NTFS & VMDK / VMDK & ZFS, if
> they're not in the same order within NTFS, they don't align, and will
> actually produce different blocks in ZFS:
> VM1 NTFS 1---2---3---
> ZFS 1---2---3---4---
> ZFS blocks are " AA", "AABB" and so on ...
> Then in another virtual machine, the blocks are in a different order:
> VM2 NTFS 1---2---3---
> ZFS 1---2---3---4---
> ZFS blocks for this VM would be " CC", "CCAA", "AABB" etc.
> So, no overlap between virtual machines, and no benefit from dedup.
> I may have it wrong, and there are indeed 30,785,627 unique blocks in my
> setup, but if there's a mechanism for checking alignment, I'd find that
> very helpful.
> Thanks,
> Chris
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
> "c" == Miles Nordin writes:
> "mg" == Mike Gerdts writes:

c> are compatible with the goals of an archival tool:

sorry, obviously I meant ``not compatible''.

mg> Richard Elling made an interesting observation that suggests
mg> that storing a zfs send data stream on tape is a quite
mg> reasonable thing to do. Richard's background makes me trust
mg> his analysis of this much more than I trust the typical person
mg> that says that zfs send output is poison.

ssh and tape are perfect, yet whenever ZFS pools become corrupt Richard talks about scars on his knees from weak TCP checksums and lying disk drives, and about creating a ``single protection domain'' of zfs checksums and redundancy instead of a bucket-brigade of fail of tcp into ssh into $blackbox_backup_Solution (likely involving unchecksummed disk storage) into SCSI/FC into ECC tapes. At worst, lying then or lying now? At best, the whole thing still strikes me as a pattern of banging a bunch of arcania into whatever shape's needed to fit the conclusion that ZFS is glorious and no further work is required to make it perfect.

And there is still no way to validate a tape without extracting it, which is, last I worked with them, an optional but suggested part of $blackbox_backup_Solution (and one which, incidentally, helps with the bucket-brigade problem Richard likes to point out).

And the other archival problems of constraining the restore environment, and the fundamental incompatibility of goals between faithful replication and robust, future-proof archiving, from my last post.
Re: [zfs-discuss] Validating alignment of NTFS/VMDK/ZFS blocks
Please excuse my pitiful example. :-) I meant to say "*less* overlap between virtual machines", as clearly block "AABB" occurs in both.

-Original Message-
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Chris Murray
Sent: 18 March 2010 18:45
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] Validating alignment of NTFS/VMDK/ZFS blocks

Good evening,
I understand that NTFS & VMDK do not relate to Solaris or ZFS, but I was wondering if anyone has any experience of checking the alignment of data blocks through that stack?
I have a VMware ESX 4.0 host using storage presented over NFS from ZFS filesystems (recordsize 4KB). Within virtual machine VMDK files, I have formatted NTFS filesystems, block size 4KB. Dedup is turned on. When I run zdb -DD, I see a figure of unique blocks which is higher than I expect, which makes me wonder whether any given 4KB in the NTFS filesystem is perfectly aligned with a 4KB block in ZFS?
e.g. consider two virtual machines sharing lots of the same blocks. Assuming there /is/ a misalignment between NTFS & VMDK / VMDK & ZFS, if they're not in the same order within NTFS, they don't align, and will actually produce different blocks in ZFS:
VM1 NTFS 1---2---3---
ZFS 1---2---3---4---
ZFS blocks are " AA", "AABB" and so on ...
Then in another virtual machine, the blocks are in a different order:
VM2 NTFS 1---2---3---
ZFS 1---2---3---4---
ZFS blocks for this VM would be " CC", "CCAA", "AABB" etc.
So, no overlap between virtual machines, and no benefit from dedup.
I may have it wrong, and there are indeed 30,785,627 unique blocks in my setup, but if there's a mechanism for checking alignment, I'd find that very helpful.
Thanks,
Chris
Re: [zfs-discuss] ZFS/OSOL/Firewire...
> Apple users have different expectations regarding data loss than Solaris and
> Linux users do.

Come on, no Apple user bashing. Not true, not fair.

Scott
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
> "djm" == Darren J Moffat writes:

djm> I've logged CR# "6936195 ZFS send stream while checksumed
djm> isn't fault tollerant" to keep track of that.

Other tar/cpio-like tools are also able to:

 * verify the checksums without extracting (like scrub)

 * verify or even extract the stream using a small userland tool that
   writes files using POSIX functions, so that you can build the tool on
   not-Solaris or extract the data onto not-ZFS. The 'zfs send' stream
   can't be extracted without the Solaris kernel, although yes the
   promise that newer kernels can extract older streams is a very helpful
   one. For example, ufsdump | ufsrestore could move UFS data into ZFS,
   but zfs send | zfs recv leaves us trapped on ZFS, even though
   migrating/restoring ZFS data onto a pNFS or Lustre backend is a
   realistic desire in the near term.

 * partial extract

Personally, I could give up the third bullet point. Admittedly the second bullet is hard to manage while still backing up zvol's, pNFS / Lustre data-node datasets, windows ACL's, properties, snapshots/clones, u.s.w., so it's kind of... if you want both vanilla and chocolate cake at once, you're both going to be unhappy. But there should at least be *a* tool that can copy from zfs to NFSv4 while preserving windows ACL's, and the tool should build on other OS's that support NFSv4 and be capable of faithfully copying one NFSv4 tree to another preserving all the magical metadata.

I know it sounds like ACL-aware rsync is unrelated to your (Darren) goal of tweaking 'zfs send' to be appropriate for backups, but for example before ZFS I could make a backup on the machine with disks attached to it or on an NFS client, and get exactly the same stream out. Likewise, I could restore into an NFS client. Sticking to a clean API instead of dumping the guts of the filesystem made the old stream formats more archival.

The ``I need to extract a ZFS dataset so large that my only available container is a distributed Lustre filesystem'' use-case is pretty squarely within the archival realm, is going to be urgent in a year or so if it isn't already, and is accommodated by GNU tar, cpio, Amanda (even old ufsrestore Amanda), and all the big commercial backup tools.

I admit it would be pretty damn cool if someone could write a purely userland version of 'zfs send' and 'zfs recv' that interact with the outside world using only POSIX file I/O and unix pipes but produce the standard deduped-ZFS-stream format, even if the hypothetical userland tool accomplishes this by including a FUSE-like amount of ZFS code and thus being quite hard to build.

However, so far I don't think the goals of a replication tool: ``make a faithful and complete copy, efficiently, or else give an error,'' are compatible with the goals of an archival tool: ``extract robustly far into the future even in non-ideal and hard to predict circumstances such as different host kernel, different destination filesystem, corrupted stream, limited restore space.''
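As a concrete illustration of the migration path mentioned above (the device and dataset names are only placeholders), UFS data can be moved into a ZFS filesystem with nothing more than the stock dump/restore pair:

# ufsdump 0f - /dev/rdsk/c0t0d0s7 | (cd /tank/home && ufsrestore rf -)

Going the other way, out of ZFS and onto something that isn't ZFS while preserving all metadata, has no equivalent stream tool, which is the portability gap being described here.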
Re: [zfs-discuss] ZFS/OSOL/Firewire...
Bob Friesenhahn wrote:
> On Thu, 18 Mar 2010, erik.ableson wrote:
>> Ditto on the Linux front. I was hoping that Solaris would be the exception,
>> but no luck. I wonder if Apple wouldn't mind lending one of the driver
>> engineers to OpenSolaris for a few months...
>
> Perhaps the issue is the filesystem rather than the drivers. Apple users have
> different expectations regarding data loss than Solaris and Linux users do.

No, the Solaris firewire drivers are just broken. There is a long trail of bug reports that nobody has sufficient interest to fix.

And really, you think Linux is better about data loss than OS X? Please cite your sources, because given my experience with Linux, I call bullshit.

-- Carson
[zfs-discuss] Validating alignment of NTFS/VMDK/ZFS blocks
Good evening,

I understand that NTFS & VMDK do not relate to Solaris or ZFS, but I was wondering if anyone has any experience of checking the alignment of data blocks through that stack?

I have a VMware ESX 4.0 host using storage presented over NFS from ZFS filesystems (recordsize 4KB). Within virtual machine VMDK files, I have formatted NTFS filesystems, block size 4KB. Dedup is turned on. When I run zdb -DD, I see a figure of unique blocks which is higher than I expect, which makes me wonder whether any given 4KB in the NTFS filesystem is perfectly aligned with a 4KB block in ZFS?

e.g. consider two virtual machines sharing lots of the same blocks. Assuming there /is/ a misalignment between NTFS & VMDK / VMDK & ZFS, if they're not in the same order within NTFS, they don't align, and will actually produce different blocks in ZFS:

VM1 NTFS 1---2---3---
ZFS 1---2---3---4---
ZFS blocks are " AA", "AABB" and so on ...

Then in another virtual machine, the blocks are in a different order:

VM2 NTFS 1---2---3---
ZFS 1---2---3---4---
ZFS blocks for this VM would be " CC", "CCAA", "AABB" etc.

So, no overlap between virtual machines, and no benefit from dedup. I may have it wrong, and there are indeed 30,785,627 unique blocks in my setup, but if there's a mechanism for checking alignment, I'd find that very helpful.

Thanks,
Chris
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On 18.03.2010 17:49, erik.ableson wrote:
> Conceptually, think of a ZFS system as a SAN box with built-in asynchronous
> replication (free!) with block-level granularity. Then look at your other
> backup requirements and attach whatever is required to the top of the stack.
> Remembering that everyone's requirements can be wildly or subtly different,
> so doing it differently is just adapting to the environment. e.g. - I use ZFS
> systems at home and work and the tools and scale are wildly different and
> therefore so are the backup strategies - but that's mostly a budget issue at
> home... :-)

I'll answer this with some perspective from my own usage, so bear with me if my setup isn't exactly enterprise-grade. Which is exactly what I'm considering it: basically a semi-intelligent box that has a lot of disks, and an attached tape autoloader for dumping "the entire box died!" data to. In my case the zvols contain vmfs (one vmfs per vm, actually), so to the just-died you can add "and brought the rest of the network down with it". A typical one-man setup with as cheap a budget as possible.

And as I've posted some weeks ago, I've ... had to test the restore bit. Having a "hit F8 during boot, and boot directly off the tape" I can live without (using a boot CD instead). But missing the "I've now started restore, it'll manage itself and switch tapes when necessary" isn't something I really want to miss out on. For the record, restore in my case takes approx 12 hours fully automated, managing a good 60MB/sec all the way.

Now, I'm willing to miss out on some of the fun, and add an extra server for simply handling the iSCSI bit, and then set that up to dump itself onto the Windows server (which has the backup). That's a NOK 12K investment (I've checked), with an option for becoming NOK 16K if I need more than two 1000BaseT's bundled (if two isn't enough, adding another four sounds about right).

The main storage box has an LSI 8308ELP with 8 1.5 TB 7200rpm Barracudas in RAID50 with cold spares, and it's ... near enough my bed to hear the alarm should a disk die on me. So strictly speaking I don't need ZFS featurewise (the 8308 has patrol reads/scrubbing, and it pushes sufficient IO speeds to outperform the four NICs in the box). So probably for my needs QFS/SAM would've been a better solution. I really don't know, since the available installer for QFS/SAM only plays nice with

//Svein
Re: [zfs-discuss] ZFS/OSOL/Firewire...
On Thu, 18 Mar 2010, erik.ableson wrote:
> Ditto on the Linux front. I was hoping that Solaris would be the exception,
> but no luck. I wonder if Apple wouldn't mind lending one of the driver
> engineers to OpenSolaris for a few months...

Perhaps the issue is the filesystem rather than the drivers. Apple users have different expectations regarding data loss than Solaris and Linux users do.

Bob

-- Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On 18/03/2010 17:26, Svein Skogen wrote:
>>> The utility: Can't handle streams being split (in case of streams being
>>> larger than a single backup media).
>>
>> I think it should be possible to store the 'zfs send' stream via NDMP
>> and let NDMP deal with the tape splitting. Though that may need
>> additional software that isn't free (or cheap) to drive the parts of
>> NDMP that are in Solaris. I don't know enough about NDMP to be sure but
>> I think that should be possible.
>
> And here I was thinking that the NDMP stack basically was tapedev and/or
> autoloader device via network? (i.e. not a backup utility at all but a
> method for the software managing the backup to attach the devices)

NDMP doesn't define the format of what goes on the tape, so it can help put the 'zfs send' stream on the tape and thus deal with the lack of 'zfs send' being able to handle tape media smaller than its stream size.

-- Darren J Moffat
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
> As to your two questions above, I'll try to answer them from my limited
> understanding of the issue.
>
> The format: Isn't fault tolerant. In the least. One single bit wrong and
> the entire stream is invalid. A FEC wrapper would fix this.

I've logged CR# "6936195 ZFS send stream while checksumed isn't fault tollerant" to keep track of that.

> The utility: Can't handle streams being split (in case of streams being
> larger than a single backup media).

I think it should be possible to store the 'zfs send' stream via NDMP and let NDMP deal with the tape splitting. Though that may need additional software that isn't free (or cheap) to drive the parts of NDMP that are in Solaris. I don't know enough about NDMP to be sure but I think that should be possible.

-- Darren J Moffat
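In the meantime, if the only problem is physical media smaller than the stream, a low-tech workaround (a sketch; the sizes, paths and dataset names are invented, and the pieces carry no extra fault tolerance) is to cut the stream into fixed-size chunks and concatenate them again at restore time:

# zfs send tank/data@backup | split -b 4096m - /backup/data.zstream.
# cat /backup/data.zstream.* | zfs receive tank/data_restored

The shell glob expands split's suffixes (aa, ab, ac, ...) in order, so the reassembled stream is byte-identical to the original as long as every piece is intact.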
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On 18 Mar 2010, at 15:51, Damon Atkins wrote:

> A system with 100TB of data that is 80% full, and a user asks their local
> system admin to restore a directory with large files, as it was 30 days ago,
> with all Windows/CIFS ACLs and NFSv4 ACLs etc.
>
> If we used zfs send, we need to go back to a zfs send some 30 days ago, and
> find 80TB of disk space to be able to restore it.
>
> zfs send/recv is great for copying zfs from one zfs file system to another
> file system, even across servers.

Bingo! The zfs send/recv scenario is for backup to another site or server. Backup in this context being a second copy stored independently from the original/master.

In one scenario here, we have individual sites that have zvol-backed iSCSI volumes based on small, high-performance 15K disks in mirror vdevs for the best performance. I only keep about a week of daily snapshots locally. I use ZFS send/recv to a backup system where I have lots of cheap, slow SATA drives in RAIDZ6 where I can afford to accumulate a lot more historical snapshots. The interest is that you can use the same tools in an asymmetric manner, with high-performance primary systems and one or a few big slow systems to store your backups.

Now for instances where I need to go back and get a file back off an NFS-published filesystem, I can just go browse the .zfs/snapshot directory as required - or search for it or whatever I want. It's a live filesystem, not an inert object dependent on external indices and hardware.

I think that this is the fundamental disconnect in these discussions, where people's ideas (or requirements) of what constitutes "a backup" are conflicting. There are two major reasons for, and types of, backups: one is to be able to minimize your downtime and get systems running again as quickly as possible (the server's dead - make it come back!). The other is the ability to go back in time and rescue data that has become lost, corrupted or otherwise unavailable, often with very granular requirements (I need this particular 12K file from August 12, 2009).

For my purposes, most of my backup strategies are oriented towards business uptime and minimal RTO. Given the data volume I work with using lots of virtual machines, tape is strictly an archival tool. I just can't restore fast enough, and it introduces way too many mechanical dependencies into the process (well, I could if I had an unlimited budget). I can restart entire sites from a backup system by cloning a filesystem off a backup snapshot and presenting the volumes to the servers that need it. Granted, I won't have the performance of a primary site, but it will work and people can get work done. This responds to the first requirement of minimal downtime. Going back in time is accomplished via lots of snapshots on the backup storage system. Which I can afford since I'm not using expensive disks here.

Then you move up the stack into the contents of the volumes, and here's where you use your traditional backup tools to get data off the top of the stack - out of the OS that's handling the contents of the volume, which understands its particularities regarding ACLs and private volume formats like VMFS.

zfs send/recv is for cloning data off the bottom of the stack without requiring the least bit of knowledge about what's happening on top. It's just like using any of the asynchronous replication tools that are used in SANs. And they make no bones about the fact that they are strictly a block-level thing, and don't even ask them about the contents.
At best, they will try to coordinate filesystem snapshots and quiescing operations with the block-level snapshots. Other backup tools take your data off the top of the stack in the context where it is used, with a fuller understanding of the issues of stuff like ACLs. When dealing with zvols, ZFS should have no responsibility in trying to understand what you do in there other than supplying the blocks. VMFS, NTFS, btrfs, ext4, HFS+, XFS, JFS, ReiserFS, and that's just the tip of the iceberg... ZFS has muddied the waters by straddling the SAN and NAS worlds.

> But there needs to be a tool:
> * To restore an individual file or a zvol (with all ACLs/properties)
> * That allows backup vendors (which place backups on tape or disk or CD or
>   ..) to build indexes of what is contained in the backup (e.g. filename,
>   owner, size, modification dates, type (dir/file/etc))
> * Stream output suitable for devices like tape drives.
> * Should be able to tell if the file is corrupted when being restored.
> * May support recovery of corrupt data blocks within the stream.
> * Preferably gnutar command-line compatible
> * That admins can use to backup and transfer a subset of files, e.g. a user
>   home directory (which is not a file system), to another server or on to CD
>   to be sent to their new office location, or

Highly incomplete and in no particular order:
Backup Exec
NetBackup
Bacula
Amanda/Zmanda
Retrospect
Avamar
Ar
Re: [zfs-discuss] ZFS/OSOL/Firewire...
On 18 Mar 2010, at 16:58, David Dyer-Bennet wrote:

> On Thu, March 18, 2010 04:50, erik.ableson wrote:
>
>> It would appear that the bus bandwidth is limited to about 10MB/sec
>> (~80Mbps) which is well below the theoretical 400Mbps that 1394 is
>> supposed to be able to handle. I know that these two disks can go
>> significantly higher since I was seeing 30MB/sec when they were used on
>> Macs previously in the same daisy-chain configuration.
>>
>> I get the same symptoms on both the 2009.06 and the b129 machines.
>
> While it wasn't on Solaris, I must say that I've been consistently
> disappointed by the performance of external 1394 drives on various Linux
> boxes. I invested in the interface cards for the boxes, and in the
> external drives that supported Firewire, because everything said it
> performed much better for disk IO, but in fact I have never found it to
> be the case.
>
> Sort-of-glad to hear I don't have to wonder if I should be trying it on
> Solaris.

Ditto on the Linux front. I was hoping that Solaris would be the exception, but no luck. I wonder if Apple wouldn't mind lending one of the driver engineers to OpenSolaris for a few months...

Hmmm - that makes me wonder about the Darwin drivers - they're open sourced if I remember correctly.

Erik
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
> I was planning to mirror them - mainly in the hope that I could hot swap a new
> one in the event that an existing one started to degrade. I suppose I could
> start with one of each and convert to a mirror later although the prospect of
> losing either disk fills me with dread.

You do not need to mirror the L2ARC devices, as the system will just hit disk as necessary. Mirroring sounds like a good idea on the SLOG, but this has been much discussed on the forums.

>> Why not larger capacity disks?
> We will run out of iops before we run out of space.

Interesting. I find IOPS is more proportional to the number of VMs vs disk space.

User: I need a VM that will consume up to 80G in two years, so give me an 80G disk.
Me: OK, but recall we can expand disks and filesystems on the fly, without downtime.
User: Well, that is cool, but 80G to start with please.
Me: I also believe the SLOG and L2ARC will make using high-RPM disks not as necessary.

But, from what I have read, higher-RPM disks will greatly help with scrubs and resilvers. Maybe two pools - one with fast mirrored SAS, another with big SATA. Or all SATA, but one pool with mirrors, another with raidz2. Many options. But measure to see what works for you. iometer is great for that, I find.

> Any opinions on the use of battery backed SAS adapters?

Surely these will help with performance in write-back mode, but I have not done any hard measurements. Anecdotally my PERC5i in a Dell 2950 seemed to greatly help with IOPS on a five-disk raidz. There are pros and cons. Search the forums, but off the top of my head: 1) SLOGs are much larger than controller caches; 2) only synced write activity is cached in a ZIL, whereas a controller cache will cache everything, needed or not, thus running out of space sooner; 3) SLOGs and L2ARC devices are specialized caches for read and write loads, vs. the all-in-one cache of a controller; 4) a controller *may* be faster, since it uses RAM for the cache.

One of the benefits of a SLOG on the SAS/SATA bus is for a cluster. If one node goes down, the other can bring up the pool, check the ZIL for any necessary transactions, and apply them. To do this with battery-backed cache, you would need fancy interconnects between the nodes, cache mirroring, etc. All of those things that SAN array products do.

Sounds like you have a fun project.
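For reference, adding a mirrored SLOG and a (non-mirrored) L2ARC after the fact is a one-liner each; the pool and device names here are placeholders:

# zpool add tank log mirror c3t0d0 c3t1d0
# zpool add tank cache c3t2d0

Log devices are worth mirroring because losing the ZIL at the wrong moment matters; cache devices are just a read accelerator, so a failed one simply drops the pool back to reading from the main disks.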
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On 18.03.2010 18:37, Darren J Moffat wrote:
> On 18/03/2010 17:34, Svein Skogen wrote:
>> How would NDMP help with this any more than running a local pipe
>> splitting the stream (and handling the robotics for feeding in the next
>> tape)?
>
> Probably doesn't in that case.
>
>> I can see the point of NDMP when the tape library isn't physically
>> connected to the same box as the zpools, but feeding local data via a
>> network service seems to me to be just complicating things...
>
> Indeed if the drive is local then it may be adding a layer you don't
> need.

I'd think "getting it to work locally", making the utility in such a fashion that it can take a local library OR a remote NDMP one, would be a priority. Maybe more so for midsized database customers, who don't have the luxury of 64 datacenters on multiple continents. ;)

Having the option of having "catastrophe restore" backups picked up every morning (as a stack of 8 tapes?) and brought offsite for safekeeping should probably be in the operating instructions for ... all Oracle customers not having the luxury of a multisite setup. ;)

I'd suspect there are a whole lot more small customers than large ones. :p

//Svein
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On 18/03/2010 17:34, Svein Skogen wrote:
> How would NDMP help with this any more than running a local pipe
> splitting the stream (and handling the robotics for feeding in the next
> tape)?

Probably doesn't in that case.

> I can see the point of NDMP when the tape library isn't physically
> connected to the same box as the zpools, but feeding local data via a
> network service seems to me to be just complicating things...

Indeed, if the drive is local then it may be adding a layer you don't need.

-- Darren J Moffat
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 18.03.2010 18:28, Darren J Moffat wrote: > On 18/03/2010 17:26, Svein Skogen wrote: The utility: Can't handle streams being split (in case of streams being larger that a single backup media). >>> >>> I think it should be possible to store the 'zfs send' stream via NDMP >>> and let NDMP deal with the tape splitting. Though that may need >>> additional software that isn't free (or cheap) to drive the parts of >>> NDMP that are in Solaris. I don't know enough about NDMP to be sure but >>> I think that should be possible. >>> >> >> And here I was thinking that the NDMP stack basically was tapedev and/or >> autoloader device via network? (i.e. not a backup utility at all but a >> method for the software managing the backup to attach the devices) > > NDMP doesn't define the format of what goes on the tape so it can help > put the 'zfs send' stream on the tape and thus deal with the lack of > 'zfs send' being able to handle tape media smaller than its stream size. How would NDMP help with this any more than running a local pipe splitting the stream (and handling the robotics for feeding in the next tape)? I can see the point of NDMP when the tape library isn't physically connected to the same box as the zpools, but feeding local data via a network servce seems to me to be just complicating things... But that's just my opinion. //Svein - -- - +---+--- /"\ |Svein Skogen | sv...@d80.iso100.no \ / |Solberg Østli 9| PGP Key: 0xE5E76831 X|2020 Skedsmokorset | sv...@jernhuset.no / \ |Norway | PGP Key: 0xCE96CE13 | | sv...@stillbilde.net ascii | | PGP Key: 0x58CD33B6 ribbon |System Admin | svein-listm...@stillbilde.net Campaign|stillbilde.net | PGP Key: 0x22D494A4 +---+--- |msn messenger: | Mobile Phone: +47 907 03 575 |sv...@jernhuset.no | RIPE handle:SS16503-RIPE - +---+--- If you really are in a hurry, mail me at svein-mob...@stillbilde.net This mailbox goes directly to my cellphone and is checked even when I'm not in front of my computer. - Picture Gallery: https://gallery.stillbilde.net/v/svein/ - -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.12 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkuiZBgACgkQSBMQn1jNM7Y3XACglyXvPiSd+iInxLaJVeY+lnUn GiAAn0xjL7KWfbwfwHz7gaKA8FtGNZOb =6yI8 -END PGP SIGNATURE- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 18.03.2010 18:21, Darren J Moffat wrote: >> As to your two questions above, I'll try to answer them from my limited >> understanding of the issue. >> >> The format: Isn't fault tolerant. In the least. One single bit wrong and >> the entire stream is invalid. A FEC wrapper would fix this. > > I've logged CR# "6936195 ZFS send stream while checksumed isn't fault > tollerant" to keep track of that. > >> The utility: Can't handle streams being split (in case of streams being >> larger that a single backup media). > > I think it should be possible to store the 'zfs send' stream via NDMP > and let NDMP deal with the tape splitting. Though that may need > additional software that isn't free (or cheap) to drive the parts of > NDMP that are in Solaris. I don't know enough about NDMP to be sure but > I think that should be possible. > And here I was thinking that the NDMP stack basically was tapedev and/or autoloader device via network? (i.e. not a backup utility at all but a method for the software managing the backup to attach the devices) //Svein - -- - +---+--- /"\ |Svein Skogen | sv...@d80.iso100.no \ / |Solberg Østli 9| PGP Key: 0xE5E76831 X|2020 Skedsmokorset | sv...@jernhuset.no / \ |Norway | PGP Key: 0xCE96CE13 | | sv...@stillbilde.net ascii | | PGP Key: 0x58CD33B6 ribbon |System Admin | svein-listm...@stillbilde.net Campaign|stillbilde.net | PGP Key: 0x22D494A4 +---+--- |msn messenger: | Mobile Phone: +47 907 03 575 |sv...@jernhuset.no | RIPE handle:SS16503-RIPE - +---+--- If you really are in a hurry, mail me at svein-mob...@stillbilde.net This mailbox goes directly to my cellphone and is checked even when I'm not in front of my computer. - Picture Gallery: https://gallery.stillbilde.net/v/svein/ - -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.12 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkuiYjEACgkQSBMQn1jNM7aQ0QCg6hAO3oCb0YcxBbRceTO1ubMv OhEAoOgFoY903MrazWcRq2HtHH72LXjF =8aWY -END PGP SIGNATURE- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
> It is hard, as you note, to recommend a box without > knowing the load. How many linux boxes are you > talking about? This box will act as a backing store for a cluster of 3 or 4 XenServers with upwards of 50 VMs running at any one time. > Will you mirror your SLOG, or load balance them? I > ask because perhaps one will be enough, IO wise. My > box has one SLOG (X25-E) and can support about 2600 > IOPS using an iometer profile that closely > approximates my work load. My ~100 VMs on 8 ESX boxes > average around 1000 IOPS, but can peak 2-3x that > during backups. I was planning to mirror them - mainly in the hope that I could hot swap a new one in the event that an existing one started to degrade. I suppose I could start with one of each and convert to a mirror later, although the prospect of losing either disk fills me with dread. > Don't discount NFS. I absolutely love NFS for > management and thin provisioning reasons. Much easier > (to me) than managing iSCSI, and performance is > similar. I highly recommend load testing both iSCSI > and NFS before you go live. Crash consistent backups > of your VMs are possible using NFS, and recovering a > VM from a snapshot is a little easier using NFS, I > find. That's interesting feedback. Given how easy it is to create NFS and iSCSI shares in osol, I'll definitely try both and see how they compare. > Why not larger capacity disks? We will run out of iops before we run out of space. It is more likely that we will gradually replace some of the SATA drives with 6 Gbps SAS drives to help with that, and we've been mulling over using an LSI SAS 9211-8i controller to provide that upgrade path: http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas9211-8i/index.html > Hopefully your switches support NIC aggregation? Yes, we're hoping that a bond of 4 x NICs will cope. Any opinions on the use of battery-backed SAS adapters? It also occurred to me after writing this that perhaps we could use one and configure it to report writes as being flushed to disk before they actually were. That might give a slight edge in performance in some cases, but I would prefer to have the data security instead, tbh. Matt. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
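Since the plan is to try both protocols, a minimal sketch of setting up one share of each for the load test; the names and sizes are made up, and note that the shareiscsi property is the legacy 2009.06-era target, while newer builds use COMSTAR (sbdadm/stmfadm/itadm) instead:
  # zfs create tank/nfs_vmstore
  # zfs set sharenfs=on tank/nfs_vmstore                 (NFS export; mount from a XenServer host and run the iometer profile against it)
  # zfs create -V 200G tank/iscsi_vmstore                (a zvol to present as an iSCSI LUN)
  # zfs set shareiscsi=on tank/iscsi_vmstore             (legacy iscsitgt; with COMSTAR the rough equivalent starting point is sbdadm create-lu /dev/zvol/rdsk/tank/iscsi_vmstore)
Running the same iometer profile against both should show whether the NFS management convenience costs anything on this particular workload.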
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 18.03.2010 14:28, Darren J Moffat wrote: > > > On 18/03/2010 13:12, joerg.schill...@fokus.fraunhofer.de wrote: >> Darren J Moffat wrote: >> >>> So exactly what makes it unsuitable for backup ? >>> >>> Is it the file format or the way the utility works ? >>> >>> If it is the format what is wrong with it ? >>> >>> If it is the utility what is needed to fix that ? >> >> This has been discussed many times in the past already. > >> If you archive the incremental "star send" data streams, you cannot >> extract single files andit seems that this cannot be fixed without >> introducing a different archive format. > > That assumes you are writing the 'zfs send' stream to a file or file > like media. In many cases people using 'zfs send' for they backup > strategy are they are writing it back out using 'zfs recv' into another > pool. In those cases the files can even be restored over NFS/CIFS by > using the .zfs/snapshot directory For the archival of files, most utilities can be ... converted (probably by including additional metadata) to store those. The problem arises with zvols (which is where I'm considering zfs send for backup anyway). Since these volumes already are an all-or-nothing scenario restore-wise, that argument against using send/receive is flawed from the get-go. (To restore individual files from a zvol exported as an iSCSI disk, the backup software would have to run on the machine mounting the iSCSI disk, not operate on a backup of the zvol itself.) This basically means that apart from the rollback of snapshots, the send/receive backup stream is only likely to be used in a disaster-rebuild situation, where it "restores all of itself in one batch". In that scenario "restoring everything" _IS_ a feature, not a bug. As to your two questions above, I'll try to answer them from my limited understanding of the issue. The format: Isn't fault tolerant in the least. One single bit wrong and the entire stream is invalid. A FEC wrapper would fix this. The utility: Can't handle streams being split (in case of streams being larger than a single backup medium). Both of these usually get fended off with the "was never meant as a backup solution", "you're trying to use it as ufsdump which it isn't, on purpose, ufsdump is old-fashioned" and similar arguments. Often accompanied by creative suggestions such as using USB disks (Have you ever tried getting several terabytes back and forth over USB?), and then a helpful pointer to multiple-thousand-dollars' worth of backup software, as an excuse for why no one should be considering adding proper backup features to zfs itself. The last paragraph may sound like I'm taking a jab at specific people; I'm not, really. But I've had my share of helpful people who have been anything but helpful. 
Most of them taking care not to put their answers on the lists, and quite a lot of them wanting to sell me services or software (or rainwear such as macintoshes) //Svein - -- - +---+--- /"\ |Svein Skogen | sv...@d80.iso100.no \ / |Solberg Østli 9| PGP Key: 0xE5E76831 X|2020 Skedsmokorset | sv...@jernhuset.no / \ |Norway | PGP Key: 0xCE96CE13 | | sv...@stillbilde.net ascii | | PGP Key: 0x58CD33B6 ribbon |System Admin | svein-listm...@stillbilde.net Campaign|stillbilde.net | PGP Key: 0x22D494A4 +---+--- |msn messenger: | Mobile Phone: +47 907 03 575 |sv...@jernhuset.no | RIPE handle:SS16503-RIPE - +---+--- If you really are in a hurry, mail me at svein-mob...@stillbilde.net This mailbox goes directly to my cellphone and is checked even when I'm not in front of my computer. - Picture Gallery: https://gallery.stillbilde.net/v/svein/ - -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.12 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkuiWrwACgkQSBMQn1jNM7YvSACg9+Nh3REdxML6cnc0cWDP5cbP co4AoKjmeYx3o4/iQhkW7/tgvfF1qPvN =bNBT -END PGP SIGNATURE- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
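Until zfs grows something like that FEC wrapper, detection and correction have to be bolted on from the outside whenever a stream is stored rather than piped straight into zfs receive. A rough sketch with made-up names; digest(1) ships with Solaris, while par2 is a third-party FEC tool that may not be installed:
  # zfs send tank/vm01@weekly > /staging/vm01.zfs
  # digest -a sha256 /staging/vm01.zfs > /staging/vm01.zfs.sha256      (detects a damaged stream before a restore is attempted)
  # par2 create -r10 /staging/vm01.zfs.par2 /staging/vm01.zfs          (adds roughly 10% parity blocks, so a few flipped bits or bad sectors can be repaired rather than just detected)
This obviously does not make the stream format itself fault tolerant; it just narrows the window in which a single bad bit silently ruins the backup.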
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On 18/03/2010 13:12, joerg.schill...@fokus.fraunhofer.de wrote: Darren J Moffat wrote: So exactly what makes it unsuitable for backup ? Is it the file format or the way the utility works ? If it is the format what is wrong with it ? If it is the utility what is needed to fix that ? This has been discussed many times in the past already. If you archive the incremental "star send" data streams, you cannot extract single files and it seems that this cannot be fixed without introducing a different archive format. That assumes you are writing the 'zfs send' stream to a file or file-like media. In many cases people using 'zfs send' for their backup strategy are writing it back out using 'zfs recv' into another pool. In those cases the files can even be restored over NFS/CIFS by using the .zfs/snapshot directory. For example: http://hub.opensolaris.org/bin/download/User+Group+losug/w%2D2009/Open%2DBackup%2Dwith%2DNotes.pdf Star implements incremental backups and restores based on POSIX compliant archives. ZFS filesystems have functionality beyond POSIX and some of that is really very important for some people (especially those using CIFS). Does Star (or any other POSIX archiver) backup: ZFS ACLs ? ZFS system attributes (as used by the CIFS server and locally) ? ZFS dataset properties (compression, checksum etc) ? If it doesn't then it is providing an "archive" of the data in the filesystem, not a full/incremental copy of the ZFS dataset. Which, depending on the requirements of the backup, may not be enough. In other words you have data/metadata missing from your backup. The only tool I'm aware of today that provides a copy of the data, and all of the ZPL metadata and all the ZFS dataset properties, is 'zfs send'. Just like (s)tar alone is not an enterprise backup tool, neither is 'zfs send'. Both of them need some scripting and infrastructure management around them to make a backup solution suitable for a given deployment. In some deployments maybe the correct answer is both. Each has its place: (s)tar is a file/directory archiver; 'zfs send', on the other hand, is a ZFS dataset replication tool (not just for ZPL filesystems, since it works on ZVOLs and all future dataset types too) that happens to write out a "stream". -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
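To make the "all of the ZPL metadata and all the ZFS dataset properties" point concrete: it is the replication stream form of zfs send (-R) that carries dataset properties, snapshots and descendant datasets along with the data. A minimal sketch with made-up names; plain zfs send without -R does not, as far as I can tell, include the dataset properties:
  # zfs snapshot -r tank/home@migrate
  # zfs send -R tank/home@migrate | zfs receive -d backuppool
  # zfs get -r compression,checksum,sharenfs backuppool/home     (the properties arrive with the stream; no separate metadata dump is needed)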
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On 18/03/2010 12:54, joerg.schill...@fokus.fraunhofer.de wrote: It has been widely discussed here already that the output of zfs send cannot be used as a backup. First define exactly what you mean by "backup". Please don't confuse "backup" and "archival"; they aren't the same thing. It would also help if the storage medium for the "backup" is defined and what the required access to it is - e.g. full restore only, incremental restores, per-file restore. The format is now committed and versioned. It is the only format that saves all of the information about a ZFS dataset (including its dataset properties), not just the data files, ACLs, extended attributes and system attributes. The stream itself even supports deduplication of the data blocks within it. So exactly what makes it unsuitable for backup ? Is it the file format or the way the utility works ? If it is the format what is wrong with it ? If it is the utility what is needed to fix that ? -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS/OSOL/Firewire...
On Thu, March 18, 2010 04:50, erik.ableson wrote: > > It would appear that the bus bandwidth is limited to about 10MB/sec > (~80Mbps) which is well below the theoretical 400Mbps that 1394 is > supposed to be able to handle. I know that these two disks can go > significantly higher since I was seeing 30MB/sec when they were used on > Macs previously in the same daisy-chain configuration. > > I get the same symptoms on both the 2009.06 and the b129 machines. While it wasn't on Solaris, I must say that I've been consistently disappointed by the performance of external 1394 drives on various Linux boxes. I invested in the interface cards for the boxes, and in the external drives that supported Firewire, because everything said it performed much better for disk IO, but in fact I have never found it to be the case. Sort-of-glad to hear I don't have to wonder if I should be trying it on Solaris. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
It is hard, as you note, to recommend a box without knowing the load. How many linux boxes are you talking about? I think having a lot of space for your L2ARC is a great idea. Will you mirror your SLOG, or load balance them? I ask because perhaps one will be enough, IO wise. My box has one SLOG (X25-E) and can support about 2600 IOPS using an iometer profile that closely approximates my work load. My ~100 VMs on 8 ESX boxes average around 1000 IOPS, but can peak 2-3x that during backups. Don't discount NFS. I absolutely love NFS for management and thin provisioning reasons. Much easier (to me) than managing iSCSI, and performance is similar. I highly recommend load testing both iSCSI and NFS before you go live. Crash consistent backups of your VMs are possible using NFS, and recovering a VM from a snapshot is a little easier using NFS, I find. Why not larger capacity disks? Hopefully your switches support NIC aggregation? The only issue I have had on 2009.06 using iSCSI (I had a windows VM directly attaching to an iSCSI 4T volume) was solved and back ported to 2009.06 (bug 6794994). -Scott -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Consider a system with 100TB of data that is 80% full, and a user asks their local system admin to restore a directory with large files, as it was 30 days ago, with all Windows/CIFS ACLs and NFSv4 ACLs etc. If we used zfs send, we would need to go back to a zfs send from some 30 days ago, and find 80TB of disk space to be able to restore it. zfs send/recv is great for copying one zfs file system to another file system, even across servers. But there needs to be a tool: * To restore an individual file or a zvol (with all ACLs/properties) * That allows backup vendors (which place backups on tape or disk or CD or ..) to build indexes of what is contained in the backup (e.g. filename, owner, size, modification dates, type (dir/file/etc)) * With stream output suitable for devices like tape drives. * That is able to tell if the file is corrupted when being restored. * That may support recovery of corrupt data blocks within the stream. * Preferably gnutar command-line compatible * That admins can use to back up and transfer a subset of files, e.g. a user home directory (which is not a file system), to another server or onto CD to be sent to their new office location. For backup vendors, is the idea for them to use the NDMP protocol to back up ZFS and all its properties/ACLs? Or is a new tool required to achieve the above? Cheers -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On Wed, Mar 17, 2010 at 9:15 AM, Edward Ned Harvey wrote: >> I think what you're saying is: Why bother trying to backup with "zfs >> send" >> when the recommended practice, fully supportable, is to use other tools >> for >> backup, such as tar, star, Amanda, bacula, etc. Right? >> >> The answer to this is very simple. >> #1 ... >> #2 ... > > Oh, one more thing. "zfs send" is only discouraged if you plan to store the > data stream and do "zfs receive" at a later date. > > If instead, you are doing "zfs send | zfs receive" onto removable media, or > another server, where the data is immediately fed through "zfs receive" then > it's an entirely viable backup technique. Richard Elling made an interesting observation that suggests that storing a zfs send data stream on tape is a quite reasonable thing to do. Richard's background makes me trust his analysis of this much more than I trust the typical person that says that zfs send output is poison. http://opensolaris.org/jive/thread.jspa?messageID=465973&tstart=0#465861 I think that a similar argument could be made for storing the zfs send data streams on a zfs file system. However, it is not clear why you would do this instead of just zfs send | zfs receive. -- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
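For anyone wanting to try the removable-media variant described above, a rough sketch with made-up pool, dataset and device names; because the stream goes straight into zfs receive, any damage shows up at backup time instead of at restore time:
  # zpool create backup1 c5t0d0                                                         (pool on the external disk)
  # zfs snapshot -r tank/vm@2010-03-19
  # zfs send -R tank/vm@2010-03-19 | zfs receive -F -d backup1                          (first run: full replication stream, properties included)
  # zfs send -R -i tank/vm@2010-03-12 tank/vm@2010-03-19 | zfs receive -F -d backup1    (later runs: incremental from the previous snapshot)
  # zpool export backup1                                                                (export before unplugging and rotating the disk off-site)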
Re: [zfs-discuss] lazy zfs destroy
On Thu, Mar 18, 2010 at 1:19 AM, Chris Paul wrote: > OK I have a very large zfs snapshot I want to destroy. When I do this, the > system nearly freezes during the zfs destroy. This is a Sun Fire X4600 with > 128GB of memory. Now this may be more of a function of the IO device, but > let's say I don't care that this zfs destroy finishes quickly. I actually > don't care, as long as it finishes before I run out of disk space. > > So a suggestion for room for growth for the zfs suite is the ability to > lazily destroy snapshots, such that the destroy goes to sleep if the cpu > idle time falls under a certain percentage. > What build of OpenSolaris are you using ? Is it nearly freezing during the whole process or just at the end ? There was another thread where a similar issue was discussed a week ago. -- Giovanni ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
joerg.schill...@fokus.fraunhofer.de (Joerg Schilling) wrote: > > This has been discussed many times in the past already. > > If you archive the incremental "star send" data streams, you cannot > extract single files andit seems that this cannot be fixed without > introducing a different archive format. Sorry for the typo: this should be "zfs send" Jörg -- EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de(uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 18.03.2010 14:12, Joerg Schilling wrote: > Darren J Moffat wrote: > >> So exactly what makes it unsuitable for backup ? >> >> Is it the file format or the way the utility works ? >> >> If it is the format what is wrong with it ? >> >> If it is the utility what is needed to fix that ? > > This has been discussed many times in the past already. > > If you archive the incremental "star send" data streams, you cannot > extract single files andit seems that this cannot be fixed without > introducing a different archive format. > > Star implements incremental backups and restores based on POSIX compliant > archives. And how does your favourite tool handle zvols? //Svein - -- - +---+--- /"\ |Svein Skogen | sv...@d80.iso100.no \ / |Solberg Østli 9| PGP Key: 0xE5E76831 X|2020 Skedsmokorset | sv...@jernhuset.no / \ |Norway | PGP Key: 0xCE96CE13 | | sv...@stillbilde.net ascii | | PGP Key: 0x58CD33B6 ribbon |System Admin | svein-listm...@stillbilde.net Campaign|stillbilde.net | PGP Key: 0x22D494A4 +---+--- |msn messenger: | Mobile Phone: +47 907 03 575 |sv...@jernhuset.no | RIPE handle:SS16503-RIPE - +---+--- If you really are in a hurry, mail me at svein-mob...@stillbilde.net This mailbox goes directly to my cellphone and is checked even when I'm not in front of my computer. - Picture Gallery: https://gallery.stillbilde.net/v/svein/ - -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.12 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkuiJ40ACgkQSBMQn1jNM7ZShgCfaSXEz2/SjsKwZYIJ6TAFRBzF QkAAoJeH7tLHjgL5ECzHhAtlig+qtnat =Pt1K -END PGP SIGNATURE- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Darren J Moffat wrote: > So exactly what makes it unsuitable for backup ? > > Is it the file format or the way the utility works ? > > If it is the format what is wrong with it ? > > If it is the utility what is needed to fix that ? This has been discussed many times in the past already. If you archive the incremental "star send" data streams, you cannot extract single files and it seems that this cannot be fixed without introducing a different archive format. Star implements incremental backups and restores based on POSIX-compliant archives. Jörg -- EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de(uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Carsten Aulbert wrote: > In case of 'star' the blob coming out of it might also be useless if you > don't > have star (or other tools) around for deciphering it - very unlikely, but > still possible ;) I invite you to inform yourself about star and to test it yourself. Star's backups are completely based on POSIX standard archive formats. If you don't have star (which is not very probable as star is OpenSource), you may extract the incremental dumps from star using any standard POSIX compliant archiver. You just lose the information and ability to do incremental restores. Jörg -- EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de(uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Hi all On Thursday 18 March 2010 13:54:52 Joerg Schilling wrote: > If you have no technical issues to discuss, please stop insulting > people/products. > > We are on OpenSolaris and we don't like this kind of discussions on the > mailing lists. Please act collaborative. > May I suggest this to both of you. > It has been widely discussed here already that the output of zfs send > cannot be used as a backup. That depends on the exact definition of backup; if I may take this from Wikipedia: "In information technology, a backup or the process of backing up refers to making copies of data so that these additional copies may be used to restore the original after a data loss event." In this regard zfs send *could* be a tool for a backup, provided you have the means of decrypting/deciphering the blob coming out of it. OTOH, if I used zfs send together with zfs receive to replicate data to another machine/location and put a label "backup" onto the receiver, this would also count as a backup from which you can restore everything, or parts of it. In the case of 'star' the blob coming out of it might also be useless if you don't have star (or other tools) around for deciphering it - very unlikely, but still possible ;) Of course your (plural!) definition of backup may vary, thus I would propose first to settle on this before exchanging blows... Cheers Carsten ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Edward Ned Harvey wrote: > > I invite erybody to join star development at: > > We know, you have an axe to grind. Don't insult some other product just > because it's not the one you personally work on. Yours is better in some > ways, and "zfs send" is better in some ways. If you have no technical issues to discuss, please stop insulting people/products. We are on OpenSolaris and we don't like this kind of discussions on the mailing lists. Please act collaborative. It has been widely discussed here already that the output of zfs send cannot be used as a backup. Jörg -- EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de(uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] dedupratio riddle
On 18 mar 2010, at 18.38, Craig Alder wrote: I remembered reading a post about this a couple of months back. This post by Jeff Bonwick confirms that the dedupratio is calculated only on the data that you've attempted to deduplicate, i.e. only the data written whilst dedup is turned on - http://mail.opensolaris.org/pipermail/zfs-discuss/2009-December/034721.html . Ah, I was on the right track then with the DDT then :) guess most people have it turned on/off from the begining until BP rewrite to ensure everything is deduplicated(which is probably a good idea). Regards Henrik http://sparcv9.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
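For anyone wanting to check this on their own pool, two read-only commands show what the ratio is actually based on (pool name made up):
  # zpool get dedupratio tank      (the property discussed in this thread)
  # zdb -DD tank                   (dumps DDT statistics; only blocks written while dedup=on have entries there)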
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
> From what I've read so far, zfs send is a block level api and thus > cannot be > used for real backups. As a result of being block level oriented, the Weirdo. The above "cannot be used for real backups" is obviously subjective, is incorrect and widely discussed here, so I just say "weirdo." I'm tired of correcting this constantly. > I invite erybody to join star development at: We know, you have an axe to grind. Don't insult some other product just because it's not the one you personally work on. Yours is better in some ways, and "zfs send" is better in some ways. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
> My own stuff is intended to be backed up by a short-cut combination -- > zfs send/receive to an external drive, which I then rotate off-site (I > have three of a suitable size). However, the only way that actually > works so far is to destroy the pool (not just the filesystem) and > recreate it from scratch, and then do a full replication stream. That > works most of the time, hangs about 1/5. Anything else I've tried is > much worse, with hangs approaching 100%. Interesting, that's precisely what we do at work, and it works 100% of the time. Solaris 10u8 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to manage scrub priority or defer scrub?
> On that > occasion: does anybody know if ZFS reads all parities > during a scrub? > > Yes > > > Wouldn't it be sufficient for stale corruption > detection to read only one parity set unless an error > occurs there? > > No, because the parity itself is not verified. Aha. Well, my understanding was that a scrub basically means reading all data and comparing it with the parities, which means that these have to be re-computed. Is that correct? Regards, Tonmaus -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Performance on SATA Deive
Hi, Thanks for your reply. Both are Sun SPARC T1000 machines, each with a 1 TB SATA hard disk. ZFS system: Memory 32 GB, Processor 1 GHz 6-core, OS Solaris 10 10/09 s10s_u8wos_08a SPARC, Patch Cluster level 142900-02 (Dec 09). UFS machine: Hard disk 1 TB SATA, Memory 16 GB, Processor 1 GHz 6-core, Solaris 10 8/07 s10s_u4wos_12b SPARC -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to manage scrub priority or defer scrub?
On Mar 16, 2010, at 4:41 PM, Tonmaus wrote: >> Are you sure that you didn't also enable >> something which >> does consume lots of CPU such as enabling some sort >> of compression, >> sha256 checksums, or deduplication? > > None of them is active on that pool or in any existing file system. Maybe the > issue is particular to RAIDZ2, which is comparably recent. On that occasion: > does anybody know if ZFS reads all parities during a scrub? Yes > Wouldn't it be sufficient for stale corruption detection to read only one > parity set unless an error occurs there? No, because the parity itself is not verified. >> The main concern that one should have is I/O >> bandwidth rather than CPU >> consumption since "software" based RAID must handle >> the work using the >> system's CPU rather than expecting it to be done by >> some other CPU. >> There are more I/Os and (in the case of mirroring) >> more data >> transferred. > > What I am trying to say is that CPU may become the bottleneck for I/O in case > of parity-secured stripe sets. Mirrors and simple stripe sets have almost 0 > impact on CPU. So far at least my observations. Moreover, x86 processors not > optimized for that kind of work as much as i.e. an Areca controller with a > dedicated XOR chip is, in its targeted field. All x86 processors you care about do XOR at memory bandwidth speed. XOR is one of the simplest instructions to implement on a microprocessor. The need for a dedicated XOR chip for older "hardware RAID" systems is because they use very slow processors with low memory bandwidth. Cheap is as cheap does :-) However, the issue for raidz2 and above (including RAID-6) is that the second parity is a more computationally complex Reed-Solomon code, not a simple XOR. So there is more computing required and that would be reflected in the CPU usage. -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Atlanta, March 16-18, 2010 http://nexenta-atlanta.eventbrite.com Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Performance on SATA Deive
On 18/03/10 08:36 PM, Kashif Mumtaz wrote: Hi, I did another test on both machines, and write performance on ZFS is extraordinarily slow. Which build are you running? On snv_134, 2x dual-core CPUs @ 3GHz and 8GB RAM (my desktop), I see these results: $ time dd if=/dev/zero of=test.dbf bs=8k count=1048576 1048576+0 records in 1048576+0 records out real 0m28.224s user 0m0.490s sys 0m19.061s This is a dataset on a straight mirrored pool, using two SATA2 drives (320GB Seagate). $ time dd if=test.dbf bs=8k of=/dev/null 1048576+0 records in 1048576+0 records out real 0m5.749s user 0m0.458s sys 0m5.260s James C. McPherson -- Senior Software Engineer, Solaris Sun Microsystems http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] dedupratio riddle
I remembered reading a post about this a couple of months back. This post by Jeff Bonwick confirms that the dedupratio is calculated only on the data that you've attempted to deduplicate, i.e. only the data written whilst dedup is turned on - http://mail.opensolaris.org/pipermail/zfs-discuss/2009-December/034721.html. Regards, Craig -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Performance on SATA Deive
Hi, I did another test on both machines, and write performance on ZFS is extraordinarily slow. I did the following test on both machines. For write: time dd if=/dev/zero of=test.dbf bs=8k count=1048576 For read: time dd if=/testpool/test.dbf of=/dev/null bs=8k The ZFS machine has 32GB memory; the UFS machine has 16GB memory. UFS machine test ### time dd if=/dev/zero of=test.dbf bs=8k count=1048576 1048576+0 records in 1048576+0 records out real 2m18.352s user 0m5.080s sys 1m44.388s # iostat -xnmpz 10 r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.6 107.9 4.8 62668.4 0.0 6.7 0.1 61.9 1 83 c0t0d0 0.0 0.2 0.0 0.2 0.0 0.0 0.0 0.8 0 0 c0t0d0s5 0.6 107.7 4.8 62668.2 0.0 6.7 0.1 62.0 1 83 c0t0d0s7 For read # time dd if=test.dbf of=/dev/null bs=8k 1048576+0 records in 1048576+0 records out real 1m21.285s user 0m4.701s sys 1m15.322s For write it took 2 minutes 18 seconds and for read it took 1 minute 21 seconds. ## ZFS machine test ## # time dd if=/dev/zero of=test.dbf bs=8k count=1048576 1048576+0 records in 1048576+0 records out real 140m33.590s user 0m5.182s sys 2m33.025s extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 8.2 0.0 1037.0 0.0 33.3 0.0 4062.3 0 100 c0t0d0 0.0 8.2 0.0 1037.0 0.0 33.3 0.0 4062.3 0 100 c0t0d0s0 - For read # time dd if=test.dbf of=/dev/null bs=8k 1048576+0 records in 1048576+0 records out real 0m59.177s user 0m4.471s sys 0m54.723s For write it took 140 minutes and for read 59 seconds (less than UFS). - On ZFS, data was being written at around 1037 kw/s (KB/s) while the disk remained 100% busy. On UFS, data was being written at around 62668 kw/s while the disk was 83% busy. Kindly help me: how can I tune write performance on ZFS? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
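A few things may be worth checking before deeper tuning, given that the iostat output above shows the pool device as a slice (c0t0d0s0) rather than a whole disk; this is only a sketch of where to look, not a diagnosis, and the pool name is taken from the /testpool path above:
  # zpool status testpool          (confirm whether the vdev is the whole disk c0t0d0 or a slice like c0t0d0s0)
  # zpool iostat -v testpool 5     (watch pool-level throughput while the dd runs)
  # format -e                      (expert mode: select the disk, then cache -> write_cache -> display)
As I understand it, ZFS only enables the drive's write cache automatically when it is given a whole disk, so a pool built on a slice often runs with the write cache off, which can cripple write throughput on a single SATA spindle.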
Re: [zfs-discuss] dedupratio riddle
On 18 mrt 2010, at 10:07, Henrik Johansson wrote: > Hello, > > On 17 mar 2010, at 16.22, Paul van der Zwan wrote: > >> >> On 16 mrt 2010, at 19:48, valrh...@gmail.com wrote: >> >>> Someone correct me if I'm wrong, but it could just be a coincidence. That >>> is, perhaps the data that you copied happens to lead to a dedup ratio >>> relative to the data that's already on there. You could test this out by >>> copying a few gigabytes of data you know is unique (like maybe a DVD video >>> file or something), and that should change the dedup ratio. >> >> The first copy of that data was unique and even dedup is switched off for >> the entire pool so it seems a bug in the calculation of the >> dedupratio or it used a method that is giving unexpected results. > > I wonder if the dedup ratio is calculated by the contents of the DDT or by > all the data contents of the whole pool, i'we only looked at the ratio for > datasets which had dedup on for the whole lifetime. If the former, data added > when it's switched off will never alter the ratio (until rewritten when with > dedup on). The source should have the answer, but i'm on mail only for a few > weeks. > > It'a probably for the whole dataset, that makes the most sense, just a > thought. > It looks like the ratio only gets updated when dedup is switched on and freezes if you switch dedup off for the entire pool, like I did. I tried to have a look at the source but it was way too complex to figure it out in the time I had available so far. Best regards, Paul van der Zwan Sun Microsystems Nederland > Regards > > Henrik > http://sparcv9.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storgage box?
Ultimately this could have 3TB of data on it and it is difficult to estimate the volume of changed data. It would be nice to have changes mirrored immediately but asynchronously, so as not to impede the master. The second box is likely to have a lower spec with fewer spindles for cost reasons; immediate failover takes second place to data preservation in the event of a failure of the master. I had looked at this: http://hub.opensolaris.org/bin/view/Project+avs/WebHome But it did seem like overkill to me, and doesn't that mean that a resilver on the master will be replicated on the slave even if not required? A zfs send/receive every 15 minutes might well have to do. Matt. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
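The 15-minute send/receive could be driven by a small cron job; a rough sketch with made-up host, dataset and path names, no error handling and no snapshot pruning (it assumes the slave already holds an initial full copy of tank/data):
  #!/usr/bin/ksh
  # incremental replication, run from cron every 15 minutes on the master
  NOW=$(date +%Y%m%d%H%M)
  PREV=$(cat /var/run/last_repl_snap)
  zfs snapshot tank/data@$NOW
  zfs send -i tank/data@$PREV tank/data@$NOW | ssh slavehost zfs receive -F tank/data
  echo $NOW > /var/run/last_repl_snap
Unlike AVS, only the blocks that changed between the two snapshots cross the wire, and as far as I understand a resilver on the master generates no replication traffic at all, since it does not modify the data.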
[zfs-discuss] ZFS/OSOL/Firewire...
An interesting thing I just noticed here testing out some Firewire drives with OpenSolaris. Setup: OpenSolaris 2009.06 and a dev version (snv_129); 2 x 500GB Firewire 400 drives with integrated hubs for daisy-chaining (net: 4 devices on the chain) - one SATA bridge, one PATA bridge. Created a zpool with both drives as simple vdevs and started a zfs send/recv to back up a local filesystem. Watching zpool iostat I see that the total throughput maxes out at about 10MB/s. Thinking that one of the drives may be at fault, I stopped, destroyed the pool and created two separate pools from each drive. Restarting the send/recv to one disk, I saw the same max throughput. Tried the other and got the same thing. Then I started one send/recv to one disk, got the max right away, and started a send/recv to the second one and got about 4MB/second while the first operation dropped to about 6MB/second. It would appear that the bus bandwidth is limited to about 10MB/sec (~80Mbps) which is well below the theoretical 400Mbps that 1394 is supposed to be able to handle. I know that these two disks can go significantly higher since I was seeing 30MB/sec when they were used on Macs previously in the same daisy-chain configuration. I get the same symptoms on both the 2009.06 and the b129 machines. It's not a critical issue to me since these drives will eventually just be used for send/recv backups over a slow link, but it doesn't augur well for the day I need to restore data... Anyone else seen this behaviour with Firewire devices and OpenSolaris? Erik ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 18.03.2010 10:31, Joerg Schilling wrote: > Svein Skogen wrote: > >> Please, don't compare proper backup drives to that rotating head >> non-standard catastrophy... DDS was (in)famous for being a delayed-fuse >> tape-shredder. > > DDS was a WOM (write only memory) type device. It did not report write errors > and it had many read errors. Kind of like /dev/null. And about as useful in a restore situation. //Svein - -- - +---+--- /"\ |Svein Skogen | sv...@d80.iso100.no \ / |Solberg Østli 9| PGP Key: 0xE5E76831 X|2020 Skedsmokorset | sv...@jernhuset.no / \ |Norway | PGP Key: 0xCE96CE13 | | sv...@stillbilde.net ascii | | PGP Key: 0x58CD33B6 ribbon |System Admin | svein-listm...@stillbilde.net Campaign|stillbilde.net | PGP Key: 0x22D494A4 +---+--- |msn messenger: | Mobile Phone: +47 907 03 575 |sv...@jernhuset.no | RIPE handle:SS16503-RIPE - +---+--- If you really are in a hurry, mail me at svein-mob...@stillbilde.net This mailbox goes directly to my cellphone and is checked even when I'm not in front of my computer. - Picture Gallery: https://gallery.stillbilde.net/v/svein/ - -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.12 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkuh83MACgkQSBMQn1jNM7bvrwCfUoIwu+YO8tfvb/mfSW063Wst jK0AoIhFdKig2bZd3RSOyEgTPTN3YNng =Gf9R -END PGP SIGNATURE- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Svein Skogen wrote: > Please, don't compare proper backup drives to that rotating head > non-standard catastrophy... DDS was (in)famous for being a delayed-fuse > tape-shredder. DDS was a WOM (write only memory) type device. It did not report write errors and it had many read errors. Jörg -- EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de(uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Damon Atkins wrote: > I vote for zfs needing a backup and restore command against a snapshot. > > backup command should output on stderr at least > Full_Filename SizeBytes Modification_Date_1970secSigned > so backup software can build indexes and stdout contains the data. This is something that does not belong on stderr but on a separate stream that is used only to build an index database. Stderr is used for error messages and warnings. > The advantage of zfs providing the command is that as ZFS upgrades or new > features are added backup vendors do not need to re-test their code. Could > also mean that when encryption comes a long a property on pool could indicate > if it is OK to decrypt the filenames only as part of a backup. > > restore would work the same way except you would pass a filename or a > directory to restore etc. And backup software would send back the stream to > zfs restore command. > > The other alternative is for zfs to provide a standard API for backups like > Oracle does for RMAN. You need to decide what you would like to get. From what I've read so far, zfs send is a block-level API and thus cannot be used for real backups. As a result of being block-level oriented, the interpretation of the data is done by zfs and thus every new feature could be copied without changing the format. If you would like to have a backup that is able to retrieve arbitrary single files, you need a backup API at file level. If you have such an API, you need to enhance the backup tool in many cases where the file metadata is enhanced in the filesystem. We need to discuss to find the best archive formats for ZFS (NTFS-style) ACLs and for extended file attributes. I invite everybody to join star development at: https://lists.berlios.de/mailman/listinfo/star-developers and http://mail.opensolaris.org/mailman/listinfo/star-discuss Jörg -- EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de(uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] dedupratio riddle
Hello, On 17 mar 2010, at 16.22, Paul van der Zwan wrote: On 16 mrt 2010, at 19:48, valrh...@gmail.com wrote: Someone correct me if I'm wrong, but it could just be a coincidence. That is, perhaps the data that you copied happens to lead to a dedup ratio relative to the data that's already on there. You could test this out by copying a few gigabytes of data you know is unique (like maybe a DVD video file or something), and that should change the dedup ratio. The first copy of that data was unique and even dedup is switched off for the entire pool so it seems a bug in the calculation of the dedupratio or it used a method that is giving unexpected results. I wonder if the dedup ratio is calculated from the contents of the DDT or from all the data contents of the whole pool; I've only looked at the ratio for datasets which had dedup on for their whole lifetime. If the former, data added when it's switched off will never alter the ratio (until rewritten with dedup on). The source should have the answer, but I'm on mail only for a few weeks. It's probably for the whole dataset; that makes the most sense. Just a thought. Regards Henrik http://sparcv9.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss