Re: [zfs-discuss] Dedup... still in beta status
I had the same experience. I finally managed to remove the dedup dataset (1.7 TB)... and I was wrong... it wasn't 30 hours... it was "only" 21 (the reason for the mistake: first I tried to delete it with the NexentaStor Enterprise trial 3.02... but when I saw that there was a new version of NexentaStor Community (3.03... with several ZFS fixes)... I installed that instead). So the total time to delete a dataset: 21 hours... and the system, of course, stalled. So... there is a looong way to go before dedup can be considered "stable"... perhaps dedup is stable, but ZFS + dedup is NOT.

-- This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedup... still in beta status
On Wed, Jun 16, 2010 at 3:39 AM, Fco Javier Garcia wrote:
> The main problem is not performance (for a home server it is not a
> problem)... but what really is a BIG PROBLEM is when you try to delete a
> somewhat large snapshot... (try it yourself... create a big random file
> with 90 GB of data... then

This is reportedly fixed in builds past snv_134. I believe there was a single thread that reduced throughput dramatically.

I was really excited to play with dedup and started using it around b131. Even with 8 GB RAM and a 30 GB L2ARC, it took about a day to destroy some snapshots. The regular expiration by zfs-auto-snapshot would stall the system for a few hours. Writes to dedup'd volumes were painfully slow, around 10 kB/s. I suspect that my DDT was larger than my L2ARC - I had a lot of data with dedup enabled.

I've since done a send to another system and back to re-duplicate everything, which has restored performance at a cost of twice the space.

-B

--
Brandon High : bh...@freaks.com
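Brandon's suspicion that the DDT outgrew his L2ARC is easy to sanity-check with back-of-the-envelope arithmetic. This is only a sketch: the ~256 bytes of in-core state per unique block is an assumed round figure (zdb -DD reports the real per-entry sizes for a given pool), and the record sizes are hypothetical:

```shell
# How much unique data can a 30 GiB L2ARC hold DDT entries for,
# assuming ~256 bytes of in-core DDT state per unique block?
l2arc_bytes=$((30 * 1024 * 1024 * 1024))
entry_bytes=256
entries=$((l2arc_bytes / entry_bytes))
for bs_kib in 128 8; do
    # data covered = entries * block size
    max_data_gib=$((entries * bs_kib / 1024 / 1024))
    echo "recordsize=${bs_kib}K: DDT fills the L2ARC at ~${max_data_gib} GiB of unique data"
done
```

At the default 128 KiB recordsize the DDT stays manageable, but with small blocks (say, zvols at 8 KiB) the same cache covers far less data, which would match the behavior described here.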
Re: [zfs-discuss] Dedup... still in beta status
On Jun 16, 2010, at 9:02 AM, Carlos Varela wrote:

>>> Does the machine respond to ping?
>> Yes
>
>>> If there is a GUI, does the mouse pointer move?
>> There is no GUI (NexentaStor)
>
>>> Does the keyboard Num Lock key respond at all?
>> Yes
>
>>> I just find it very hard to believe that such a situation could exist, as
>>> I have done some *abusive* tests on a SunFire X4100 with Sun 6120 fibre
>>> arrays (in HA config) and I could not get it to become a warm brick like
>>> you describe. How many processors does your machine have?
>
>> Full data:
>>
>> Motherboard: Asus M2N68-CM
>> Initial memory: 3 GB DDR2 ECC
>> Current memory: 8 GB DDR2 800
>> CPU: Athlon X2 5200
>> HD: 2 Seagate, 1 WD (1.5 TB each)
>> Pools: 1 RAIDZ pool
>> Datasets: 5 (ftp: 30 GB, varios: 170 GB, multimedia: 1.7 TB, segur: 80 GB, prueba: 50 MB)
>> ZFS version: 22
>>
>> The pool was created with EON NAS 0.6... dedup on,
>
> Similar situation here, but with OpenSolaris b133. I can ping the machine,
> but it has been frozen for about 24 hours. I was deleting 25 GB of dedup'd
> data. If I move 1-2 GB of data, the machine stops responding for an hour
> but comes back after that. I have munin installed, and the graphs stop
> updating during that time; you cannot use ssh either. I agree that the
> memory seems not to be enough, as I see a lot of 20 KB reads before it
> stops responding (reading DDT entries, I guess). Maybe dedup has to be
> redesigned for low-memory machines (a batch process instead of inline?).
> This is my home machine, so I can wait, but businesses would not be so
> happy if the machine becomes so unresponsive that you cannot access your
> data.

The unresponsiveness that people report when deleting large dedup'd ZFS objects is due to ARC memory pressure and long service times accessing other ZFS objects while the pool is busy resolving the deleted object's dedup references.
Set a maximum size the ARC can grow to, leaving room for system services; get an SSD to act as an L2ARC; run a scrub first to prime the L2ARC (it is probably better to run something targeting just the datasets in question); then delete the dedup'd objects, smallest to largest.

-Ross
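Ross's checklist translates roughly into the commands below. This is a dry-run sketch with hypothetical pool, device, and snapshot names, and the zfs_arc_max value is just an example for an 8 GB box; remove the run wrapper to execute for real:

```shell
# Print each step instead of executing it (dry run).
run() { echo "+ $*"; }

# 1. Cap the ARC, leaving room for system services (Solaris: /etc/system;
#    0x140000000 = 5 GiB, an example value for an 8 GB machine).
run 'echo "set zfs:zfs_arc_max=0x140000000" >> /etc/system'

# 2. Add an SSD as a cache (L2ARC) device.
run zpool add tank cache c8t1d0

# 3. Scrub first so DDT blocks are primed into the L2ARC.
run zpool scrub tank

# 4. Destroy the dedup'd snapshots smallest-first ('zfs list -t snapshot
#    -s used' lists them in ascending order of space used).
run zfs destroy tank/data@hourly-2010-06-15
run zfs destroy tank/data@daily-2010-06-01
```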
Re: [zfs-discuss] Dedup... still in beta status
On Jun 16, 2010, at 6:46 AM, Dennis Clarke wrote:

> I have been lurking in this thread for a while for various reasons and
> only now does a thought cross my mind worth posting: Are you saying that
> a reasonably fast computer with 8GB of memory is entirely non-responsive
> due to a ZFS related function?

The problem is that ZFS ends up trying to do too much work in syncing context. Because of the way the ZFS transaction model works, it's important that the txg sync phase remain constant (ZFS currently shoots for 3 seconds, though this was recently changed to 1). If this phase takes too long, then all other ZFS work is blocked. If this happens to your root pool, it has the appearance of a hard hang.

These problems have generally been due to one of two root causes:

1. Destroying snapshots, where the deadlist must be processed entirely within one txg.
2. Freeing blocks in a deduped dataset, which requires updating the DDT in syncing context.

Most of the pathological aspects of these problems have been fixed (or will soon be fixed) in the latest source:

6922161 zio_ddt_free is single threaded with performance impact
6938089 dedup-induced latency causes FC initiator logouts/FC port resets
6948890 snapshot deletion can induce pathologically long spa_sync() times
6948911 snapshot deletion can induce unsatisfiable allocations in txg sync
6949730 spurious arc_free() can significantly exacerbate 6948890
6957289 ARC metadata limit can have serious adverse effect on dedup performance
6958873 lack of accounting for DDT in dedup'd frees can oversubscribe txg
6960374 need auxiliary mechanism for adjustment of write throttle

There are still some extreme cases that can result in long sync times when using dedup, but nothing pathological (i.e. 30 seconds, not 30 hours). Expect to see fixes for these remaining issues in the near future.
- Eric

--
Eric Schrock, Fishworks
http://blogs.sun.com/eschrock
Re: [zfs-discuss] Dedup... still in beta status
>> Does the machine respond to ping?
> Yes
>
>> If there is a GUI, does the mouse pointer move?
> There is no GUI (NexentaStor)
>
>> Does the keyboard Num Lock key respond at all?
> Yes
>
>> I just find it very hard to believe that such a situation could exist, as
>> I have done some *abusive* tests on a SunFire X4100 with Sun 6120 fibre
>> arrays (in HA config) and I could not get it to become a warm brick like
>> you describe.
>>
>> How many processors does your machine have?
>
> Full data:
>
> Motherboard: Asus M2N68-CM
> Initial memory: 3 GB DDR2 ECC
> Current memory: 8 GB DDR2 800
> CPU: Athlon X2 5200
> HD: 2 Seagate, 1 WD (1.5 TB each)
> Pools: 1 RAIDZ pool
> Datasets: 5 (ftp: 30 GB, varios: 170 GB, multimedia: 1.7 TB, segur: 80 GB, prueba: 50 MB)
> ZFS version: 22
>
> The pool was created with EON NAS 0.6... dedup on,

Similar situation here, but with OpenSolaris b133. I can ping the machine, but it has been frozen for about 24 hours. I was deleting 25 GB of dedup'd data. If I move 1-2 GB of data, the machine stops responding for an hour but comes back after that. I have munin installed, and the graphs stop updating during that time; you cannot use ssh either.

I agree that the memory seems not to be enough, as I see a lot of 20 KB reads before it stops responding (reading DDT entries, I guess). Maybe dedup has to be redesigned for low-memory machines (a batch process instead of inline?). This is my home machine, so I can wait, but businesses would not be so happy if the machine becomes so unresponsive that you cannot access your data.
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
data   5.44T  4.76T   691G  87%  1.18x  ONLINE  -
rpool   111G  11.3G  99.7G  10%  1.00x  ONLINE  -

DDT-sha256-zap-duplicate: 2390516 entries, size 503 on disk, 386 in core
DDT-sha256-zap-unique: 13224217 entries, size 374 on disk, 190 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    12.6M   1.53T   1.49T   1.49T    12.6M   1.53T   1.49T   1.49T
     2    2.12M    241G    228G    228G    4.70M    534G    504G    503G
     4     161K   14.8G   12.2G   12.2G     727K   66.2G   54.4G   54.4G
     8    6.05K    419M    293M    294M    56.1K   3.69G   2.49G   2.50G
    16      603   9.72M   5.45M   5.59M    12.4K    198M    111M    114M
    32      351   18.5M   14.5M   14.6M    15.0K    861M    678M    680M
    64       66   1.90M    734K    750K    5.60K    169M   64.0M   65.4M
   128       25   1.51M    616K    622K    4.02K    224M   80.1M   80.9M
   256        3   1.50K   1.50K   2.24K      912    456K    456K    682K
   512        4    134K   6.50K   7.48K    2.89K   81.2M   5.77M   6.47M
    1K        3    129K   1.50K   2.24K    4.19K    160M   2.09M   3.13M
    8K        1    128K     512     766    9.22K   1.15G   4.61M   6.89M
 Total    14.9M   1.78T   1.73T   1.73T    18.1M   2.12T   2.04T   2.04T

car...@quad:~$ ping 192.168.1.87
PING 192.168.1.87 (192.168.1.87) 56(84) bytes of data.
64 bytes from 192.168.1.87: icmp_seq=1 ttl=255 time=0.193 ms
64 bytes from 192.168.1.87: icmp_seq=2 ttl=255 time=0.187 ms
64 bytes from 192.168.1.87: icmp_seq=3 ttl=255 time=0.189 ms
64 bytes from 192.168.1.87: icmp_seq=4 ttl=255 time=0.160 ms
64 bytes from 192.168.1.87: icmp_seq=5 ttl=255 time=0.189 ms
64 bytes from 192.168.1.87: icmp_seq=6 ttl=255 time=0.184 ms
64 bytes from 192.168.1.87: icmp_seq=7 ttl=255 time=0.193 ms
--- 192.168.1.87 ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 5998ms
rtt min/avg/max/mdev = 0.160/0.185/0.193/0.010 ms

System specs:
Memory: 8 GB DDR3
CPU: Core i7-860 2.8 GHz (4 cores / 8 threads)
HD: 4 x 1.5 TB Seagate 7200.11, raidz
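For what it's worth, the zdb output above is enough to estimate how much of those 8 GB the DDT alone wants: each table reports its entry count and in-core bytes per entry. A quick check of the arithmetic (not an official formula, just multiplying the reported figures):

```shell
# in-core DDT = duplicate entries * 386 B + unique entries * 190 B
incore_gib=$(awk 'BEGIN {
    printf "%.1f", (2390516 * 386 + 13224217 * 190) / 1024^3
}')
echo "in-core DDT: ${incore_gib} GiB"
```

So roughly 3.2 GiB of DDT competing with everything else for this machine's ARC, before any of it gets evicted to disk.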
Re: [zfs-discuss] Dedup... still in beta status
> Does the machine respond to ping?

Yes

> If there is a GUI, does the mouse pointer move?

There is no GUI (NexentaStor).

> Does the keyboard Num Lock key respond at all?

Yes

> I just find it very hard to believe that such a situation could exist, as
> I have done some *abusive* tests on a SunFire X4100 with Sun 6120 fibre
> arrays (in HA config) and I could not get it to become a warm brick like
> you describe.
>
> How many processors does your machine have?

Full data:

Motherboard: Asus M2N68-CM
Initial memory: 3 GB DDR2 ECC
Current memory: 8 GB DDR2 800
CPU: Athlon X2 5200
HD: 2 Seagate, 1 WD (1.5 TB each)
Pools: 1 RAIDZ pool
Datasets: 5 (ftp: 30 GB, varios: 170 GB, multimedia: 1.7 TB, segur: 80 GB, prueba: 50 MB)
ZFS version: 22

The pool was created with EON NAS 0.6... dedup on, compression off. Initial CIFS write performance: 35 MB/s... final write performance: 10 MB/s.

Then the pool was imported into its final OS: OSOL 134 (the last public development version)... all was OK (Time Slider snapshots were only on the "segur" dataset and their size was small)... but then we deleted some files (85 GB of video files from the multimedia dataset)... I forgot that there was one snapshot... so we needed to delete the snapshot (now 85 GB in size)...

1. Trying with OSOL: it starts deleting... after a while, CIFS goes down... and finally the system freezes.
2. Trying with EON: after some hours it hangs (not enough memory).
3. Trying with Nexenta Core 3.0 RC: same as OSOL... it starts deleting... after a while it freezes.

Finally... I rolled back to the snapshot, so its size became 0 bytes... and I could delete it. But then, instead of deleting files (10 by 10)... I destroyed the dataset (1.7 TB)... this was a mistake... the system started deleting... but froze...

Then memory was added (now 8 GB) and NexentaStor 3.03 was installed (which theoretically fixes several ZFS bugs). Current situation: when I try to import the pool into the machine... the system freezes (in Nexenta the freeze is immediate...) but internally it is working (I can ping the system, the keyboard is on... and I can hear the hard disks working..., but ssh is down, and CIFS, Apache, NFS... everything is down).

You can see more examples: http://www.nexentastor.org/boards/1/topics/440

P.S.: 30 hours and the system still does not answer (but I have lots of patience)
Re: [zfs-discuss] Dedup... still in beta status
>> I think, with current bits, it's not a simple matter of "ok for
>> enterprise, not ok for desktops". With an SSD for either main storage
>> or L2ARC, and/or enough memory, and/or a not very demanding workload,
>> it seems to be ok.
>
> The main problem is not performance (for a home server it is not a
> problem)... but what really is a BIG PROBLEM is when you try to delete a
> somewhat large snapshot... (try it yourself... create a big random file
> with 90 GB of data... then snapshot... then delete the file and delete the
> snapshot... you will see)... and better... try removing the SSD disk. Just
> out of curiosity... my test system (8 GB RAM)... takes over 30 hours to
> delete a dataset of 1.7 TB (still not finished...)... and the system does
> not respond (it is working but does not respond... not even to a simple
> "ls" command).

Hold on a sec. I have been lurking in this thread for a while for various reasons, and only now does a thought cross my mind worth posting: are you saying that a reasonably fast computer with 8 GB of memory is entirely non-responsive due to a ZFS-related function?

Does the machine respond to ping?
If there is a GUI, does the mouse pointer move?
Does the keyboard Num Lock key respond at all?

I just find it very hard to believe that such a situation could exist, as I have done some *abusive* tests on a SunFire X4100 with Sun 6120 fibre arrays (in HA config) and I could not get it to become a warm brick like you describe.

How many processors does your machine have?

--
Dennis Clarke
dcla...@opensolaris.ca <- Email related to the open source Solaris
dcla...@blastwave.org <- Email related to open source for Solaris
Re: [zfs-discuss] Dedup... still in beta status
> I think, with current bits, it's not a simple matter of "ok for
> enterprise, not ok for desktops". With an SSD for either main storage
> or L2ARC, and/or enough memory, and/or a not very demanding workload,
> it seems to be ok.

The main problem is not performance (for a home server it is not a problem)... but what really is a BIG PROBLEM is when you try to delete a somewhat large snapshot... (try it yourself... create a big random file with 90 GB of data... then snapshot... then delete the file and delete the snapshot... you will see)... and better... try removing the SSD disk. Just out of curiosity... my test system (8 GB RAM)... takes over 30 hours to delete a dataset of 1.7 TB (still not finished...)... and the system does not respond (it is working but does not respond... not even to a simple "ls" command).
Re: [zfs-discuss] Dedup... still in beta status
On 16/06/2010 11:30, Fco Javier Garcia wrote:
>> This may also be accomplished by using snapshots and clones of data sets.
>> At least for OS images: user profiles and documents could be something
>> else entirely.
>
> Yes... but that would need a manager with access to ZFS itself... whereas
> with dedup you can use a userland manager (much more flexible).

If you delegate the ZFS snapshot/mount allow permissions to the user the management software runs as, then you can create/destroy/rename snapshots over NFS/CIFS/FTP/SCP/HTTP by doing mkdir/rmdir/mv in the .zfs/snapshot directory. Unfortunately there isn't a way I know of to create clones using the .zfs directory.

--
Darren J Moffat
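The mkdir/rmdir/mv trick described above looks like this in practice. Since the commands are plain directory operations, this sketch uses a temporary directory as a stand-in; on a real system SNAPDIR would be something like /tank/home/.zfs/snapshot on an NFS/CIFS mount, and the user would first need the delegated permissions, e.g. zfs allow -u alice snapshot,mount,destroy tank/home (dataset and user names are hypothetical):

```shell
SNAPDIR=$(mktemp -d)   # stand-in for /tank/home/.zfs/snapshot

mkdir "$SNAPDIR/before-upgrade"                       # = zfs snapshot tank/home@before-upgrade
mv "$SNAPDIR/before-upgrade" "$SNAPDIR/pre-upgrade"   # = zfs rename of the snapshot
rmdir "$SNAPDIR/pre-upgrade"                          # = zfs destroy tank/home@pre-upgrade

ls -A "$SNAPDIR"   # empty again: the snapshot is gone
```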
Re: [zfs-discuss] Dedup... still in beta status
> This may also be accomplished by using snapshots and clones of data
> sets. At least for OS images: user profiles and documents could be
> something else entirely.

Yes... but that would need a manager with access to ZFS itself... whereas with dedup you can use a userland manager (much more flexible).

> Another situation that comes to mind is perhaps as the back-end to a
> mail store: if you send out a message(s) with an attachment(s) to a
> lot of people, the attachment blocks could be deduped (and perhaps
> compressed as well, since base-64 adds 1/3 overhead).
Re: [zfs-discuss] Dedup... still in beta status
On Tue, Jun 15, 2010 at 7:28 PM, David Magda wrote:
> On Jun 15, 2010, at 14:20, Fco Javier Garcia wrote:
>> I think dedup may have its greatest appeal in VDI environments (think
>> about an environment where 85% of the data that the virtual machine needs
>> is in ARC or L2ARC... it is like a dream... almost instantaneous
>> response... and you can boot a new machine in a few seconds)...
>
> This may also be accomplished by using snapshots and clones of data sets.
> At least for OS images: user profiles and documents could be something
> else entirely.

It all depends on the nature of the VDI environment. If the VMs are regenerated on each login, the snapshot + clone mechanism is sufficient. Deduplication is not needed. However, if VMs have a long life and get periodic patches and other software updates, deduplication will be required if you want to remain at somewhat constant storage utilization.

It probably makes a lot of sense to be sure that swap or page files are on a non-dedup dataset. Executables and shared libraries shouldn't be getting paged out to it, and the likelihood that multiple VMs page the same thing to swap or a page file is very small.

> Another situation that comes to mind is perhaps as the back-end to a mail
> store: if you send out a message(s) with an attachment(s) to a lot of
> people, the attachment blocks could be deduped (and perhaps compressed as
> well, since base-64 adds 1/3 overhead).

It all depends on how this is stored. If the attachments are stored as they were in 1990, as part of an mbox format, you will be very unlikely to get the proper block alignment. Even storing the message body (including headers) in the same file as the attachment may not align the attachments, because the mail headers may be different (e.g. different recipients' messages took different paths, some were forwarded, etc.).
If the attachments are stored in separate files, or a database format is used that stores attachments separately from the message (with matching database + ZFS block size), things may work out favorably. However, a system that detaches messages and stores them separately may just as well store them in a file named after the SHA-256 hash, assuming that file doesn't already exist. If it does exist, it can just increment a reference count. In other words, an intelligent mail system should already dedup. Or at least that is how I would have written it for the last decade or so...

--
Mike Gerdts
http://mgerdts.blogspot.com/
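A toy version of the content-addressed store Mike describes, using hard links as the reference count (all names here are made up for illustration; a real mail store would also need locking and garbage collection when the link count drops):

```shell
STORE=$(mktemp -d)    # one blob per unique attachment, named by SHA-256

# store_attachment <file> <name> <msgdir>: store the blob once, then
# hard-link it into the message's directory (link count = refcount).
store_attachment() {
    hash=$(sha256sum "$1" | cut -d' ' -f1)
    [ -e "$STORE/$hash" ] || cp "$1" "$STORE/$hash"
    ln "$STORE/$hash" "$3/$2"
}

msg1=$(mktemp -d); msg2=$(mktemp -d)          # two recipients' messages
attach=$(mktemp); echo "quarterly report" > "$attach"

store_attachment "$attach" report.txt "$msg1"
store_attachment "$attach" report.txt "$msg2"

echo "blobs stored: $(ls "$STORE" | wc -l)"   # one blob, two references
```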
Re: [zfs-discuss] Dedup... still in beta status
On Jun 15, 2010, at 14:20, Fco Javier Garcia wrote:

> I think dedup may have its greatest appeal in VDI environments (think
> about an environment where 85% of the data that the virtual machine needs
> is in ARC or L2ARC... it is like a dream... almost instantaneous
> response... and you can boot a new machine in a few seconds)...

This may also be accomplished by using snapshots and clones of data sets. At least for OS images: user profiles and documents could be something else entirely.

Another situation that comes to mind is perhaps as the back-end to a mail store: if you send out a message(s) with an attachment(s) to a lot of people, the attachment blocks could be deduped (and perhaps compressed as well, since base-64 adds 1/3 overhead).
Re: [zfs-discuss] Dedup... still in beta status
On 06/15/10 10:52, Erik Trimble wrote:
> Frankly, dedup isn't practical for anything but enterprise-class machines.
> It's certainly not practical for desktops or anything remotely low-end.

We're certainly learning a lot about how ZFS dedup behaves in practice.

I've enabled dedup on two desktops and a home server and so far haven't regretted it on those three systems. However, they each have more than typical amounts of memory (4 GB and up), a data pool on two or more large-capacity SATA drives, plus an X25-M SSD sliced into a root pool as well as L2ARC and slog slices for the data pool (see below [1]).

I tried enabling dedup on a smaller system (with only 1 GB memory and a single very slow disk), observed serious performance problems, and turned it off pretty quickly.

I think, with current bits, it's not a simple matter of "ok for enterprise, not ok for desktops". With an SSD for either main storage or L2ARC, and/or enough memory, and/or a not very demanding workload, it seems to be ok.

For one such system, I'm seeing:

# zpool list z
NAME  SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
z     464G   258G  206G  55%  1.25x  ONLINE  -

# zdb -D z
DDT-sha256-zap-duplicate: 432759 entries, size 304 on disk, 156 in core
DDT-sha256-zap-unique: 1094244 entries, size 298 on disk, 151 in core

dedup = 1.25, compress = 1.44, copies = 1.00, dedup * compress / copies = 1.80

- Bill

[1] To forestall responses of the form "you're nuts for putting a slog on an X25-M", which is off-topic for this thread and being discussed elsewhere: yes, I'm aware of the write cache issues on power failure on the X25-M. For my purposes, it's a better robustness/performance tradeoff than either zil-on-spinning-rust or a disabled ZIL, because:

a) for many potential failure cases on whitebox hardware running bleeding-edge OpenSolaris bits, the X25-M will not lose power and thus the write cache will stay intact across a crash.

b) even if it loses power and loses some writes-in-flight, it's not likely to lose *everything* since the last txg sync.

It's good enough for my personal use. Your mileage will vary. As always, system design involves tradeoffs.
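The last line of that zdb output multiplies out as claimed; the savings factors compose multiplicatively:

```shell
# combined space factor = dedup ratio * compression ratio / copies
combined=$(awk 'BEGIN { printf "%.2f", 1.25 * 1.44 / 1.00 }')
echo "dedup * compress / copies = ${combined}x"
```

i.e. the pool holds 1.8x as much logical data as the raw space it consumes.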
Re: [zfs-discuss] Dedup... still in beta status
On 6/15/2010 11:53 AM, Fco Javier Garcia wrote:
>> or as a member of the ZFS team (which I'm not).
>
> Then you have to be brutally good with Java

Thanks, but I do get it wrong every so often (hopefully, rarely). More importantly, I don't know anything about the internal goings-on of the ZFS team, so I have nothing extra to say about schedules, plans, timing, etc. that everyone else doesn't know. I can only speculate based on what's been publicly said on those topics. E.g. I wish I knew when certain bugs would be fixed, but I don't have any more visibility into that than the public.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Re: [zfs-discuss] Dedup... still in beta status
On 6/15/2010 11:49 AM, Geoff Nordli wrote:
>> From: Fco Javier Garcia
>> Sent: Tuesday, June 15, 2010 11:21 AM
>>
>>> Realistically, I think people are overly enamored with dedup as a
>>> feature - I would generally only consider it worth-while in cases where
>>> you get significant savings. And by significant, I'm talking an order
>>> of magnitude space savings. A 2x savings isn't really enough to
>>> counteract the down sides. Especially when even enterprise disk space
>>> is (relatively) cheap.
>>
>> I think dedup may have its greatest appeal in VDI environments (think
>> about an environment where 85% of the data that the virtual machine
>> needs is in ARC or L2ARC... it is like a dream... almost instantaneous
>> response... and you can boot a new machine in a few seconds)...
>
> Does dedup benefit in the ARC/L2ARC space? For some reason, I have it in
> my head that each time it requests the block from storage it will copy it
> into cache; therefore if I had 10 VMs requesting the same dedup'd block,
> there would be 10 copies of the same block in ARC/L2ARC.
>
> Geoff

No, that's not correct. It's the *same* block, regardless of where it was referenced from. The cached block has no idea where it was referenced from (that's in the metadata). So even if I have 10 VMs requesting access to 10 different files, if those files have been dedup'd, then any "common" (i.e. deduped) blocks will be stored only once in the ARC/L2ARC.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Re: [zfs-discuss] Dedup... still in beta status
> or as a member of the ZFS team (which I'm not).

Then you have to be brutally good with Java
Re: [zfs-discuss] Dedup... still in beta status
> From: Fco Javier Garcia
> Sent: Tuesday, June 15, 2010 11:21 AM
>
>> Realistically, I think people are overly enamored with dedup as a
>> feature - I would generally only consider it worth-while in cases
>> where you get significant savings. And by significant, I'm talking an
>> order of magnitude space savings. A 2x savings isn't really enough to
>> counteract the down sides. Especially when even enterprise disk space
>> is (relatively) cheap.
>
> I think dedup may have its greatest appeal in VDI environments (think
> about an environment where 85% of the data that the virtual machine needs
> is in ARC or L2ARC... it is like a dream... almost instantaneous
> response... and you can boot a new machine in a few seconds)...

Does dedup benefit in the ARC/L2ARC space? For some reason, I have it in my head that each time it requests the block from storage it will copy it into cache; therefore if I had 10 VMs requesting the same dedup'd block, there would be 10 copies of the same block in ARC/L2ARC.

Geoff
Re: [zfs-discuss] Dedup... still in beta status
On 6/15/2010 10:52 AM, Erik Trimble wrote:
> Frankly, dedup isn't practical for anything but enterprise-class machines.
> It's certainly not practical for desktops or anything remotely low-end.
> This isn't just a ZFS issue - all implementations I've seen so far require
> enterprise-class solutions.
>
> Realistically, I think people are overly enamored with dedup as a feature
> - I would generally only consider it worth-while in cases where you get
> significant savings. And by significant, I'm talking an order of magnitude
> space savings. A 2x savings isn't really enough to counteract the down
> sides. Especially when even enterprise disk space is (relatively) cheap.
>
> That all said, ZFS dedup is still definitely beta. There are known severe
> bugs and performance issues which will take time to fix, as not all of
> them have obvious solutions. Given current schedules, I predict that it
> should be production-ready some time in 2011. *When* in 2011, I couldn't
> hazard... Maybe in time to make Solaris 10 Update 12 or so?

One thing here - I forgot to say: this is my opinion based on my observations/conversations on this list, and I in no way speak for Oracle officially, or as a member of the ZFS team (which I'm not).

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Re: [zfs-discuss] Dedup... still in beta status
> Realistically, I think people are overly enamored with dedup as a
> feature - I would generally only consider it worth-while in cases where
> you get significant savings. And by significant, I'm talking an order of
> magnitude space savings. A 2x savings isn't really enough to counteract
> the down sides. Especially when even enterprise disk space is
> (relatively) cheap.

I think dedup may have its greatest appeal in VDI environments (think about an environment where 85% of the data that the virtual machine needs is in ARC or L2ARC... it is like a dream... almost instantaneous response... and you can boot a new machine in a few seconds)...

> That all said, ZFS dedup is still definitely beta. There are known severe
> bugs and performance issues which will take time to fix, as not all of
> them have obvious solutions. Given current schedules, I predict that it
> should be production-ready some time in 2011. *When* in 2011, I couldn't
> hazard...
>
> Maybe in time to make Solaris 10 Update 12 or so?

Yes... so you can start patching Solaris on Monday... and perhaps... it will be finished on Tuesday (but next week)
Re: [zfs-discuss] Dedup... still in beta status
On 6/15/2010 9:03 AM, Fco Javier Garcia wrote:
> Data: 90% of current computers have less than 9 GB of RAM, and fewer
> than 5% have SSDs. Take a "standard" storage box with a capacity of
> 4 TB... dedup on, a dataset with 32 KB blocks, 2 TB of data in use...
> you need 16 GB of memory just for the DDT... but you won't see this
> until it's too late...
[...]

Frankly, dedup isn't practical for anything but enterprise-class machines. It's certainly not practical for desktops or anything remotely low-end. This isn't just a ZFS issue - all implementations I've seen so far require enterprise-class solutions.

Realistically, I think people are overly enamored with dedup as a feature - I would generally only consider it worthwhile in cases where you get significant savings. And by significant, I'm talking an order of magnitude space savings. A 2x savings isn't really enough to counteract the downsides. Especially when even enterprise disk space is (relatively) cheap.

That all said, ZFS dedup is still definitely beta. There are known severe bugs and performance issues which will take time to fix, as not all of them have obvious solutions. Given current schedules, I predict that it should be production-ready some time in 2011. *When* in 2011, I couldn't hazard...

Maybe time to make Solaris 10 Update 12 or so?

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
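The memory arithmetic in the quoted post checks out. As a rough sketch (the ~256 bytes of core per DDT entry is an assumption for illustration; figures of roughly 250-320 bytes per in-core entry were commonly cited on this list at the time):

```python
# Rough in-core DDT size for the scenario in the quoted post:
# one DDT entry per unique block, times an assumed bytes-per-entry.

def ddt_ram_bytes(data_bytes, block_size, bytes_per_entry=256):
    """Estimated RAM needed to hold the dedup table entirely in core."""
    n_entries = data_bytes // block_size   # one entry per unique block
    return n_entries * bytes_per_entry

TIB = 2 ** 40
used = 2 * TIB            # 2 TB of deduped data
block = 32 * 1024         # 32 KB blocks

print(ddt_ram_bytes(used, block) / 2 ** 30)   # -> 16.0 (GiB)
```

With a bigger recordsize (128 KB) the same 2 TB needs only a quarter of the entries, which is why block size matters so much for whether the DDT fits in RAM plus L2ARC.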
[zfs-discuss] Dedup... still in beta status
Data: 90% of current computers have less than 9 GB of RAM, and fewer than 5% have SSDs. Take a "standard" storage box with a capacity of 4 TB... dedup on, a dataset with 32 KB blocks, 2 TB of data in use... you need 16 GB of memory just for the DDT... but you won't see this until it's too late... i.e., we will work with the system... performance will be good... little by little we will see that write performance is dropping... then we will see that the system hangs randomly (when deleting automatic snapshots)... and finally we will see that disabling dedup doesn't fix it.

It may be that dedup simply has steep requirements... that is true, but what is also true is that even on systems with large amounts of RAM (by the usual standards), routine operations such as deleting files or destroying datasets/snapshots cause a drop in performance... or even a total system hang... and that is not acceptable... so maybe it would be desirable to put dedup on hold (beta or development status) until there is a stable version, making whatever changes are needed in the core of ZFS to allow its use without compromising the integrity of the whole system (e.g. freeing blocks across multiple threads).

And what can we do if we have a system already "contaminated" with dedup?

1. Disable snapshots.
2. Create a new dataset without dedup and copy the data to the new dataset.
3. After copying the data, delete the snapshots... smallest first; if a snapshot is large (more than 10 GB), do a progressive rollback to it (so the snapshot occupies 0 bytes) and then delete it.
4. When there are no snapshots left in the dataset... remove all files slowly (in batches).
5. Finally, when there are no files left... destroy the dataset.

If we skip any of these steps (and directly try to delete a snapshot of 95 GB), the system will hang... and if we try to destroy the dataset and the system hangs, rebooting the machine will hang it again (since the destroy resumes and keeps trying to free the blocks).

My test system: AMD Athlon X2 5400, 8 GB RAM, RAIDZ 3 TB, dataset 1.7 TB, snapshot: 87 GB... tested with OSOL b134, EON 0.6, Nexenta Core 3.02 and NexentaStor Enterprise 3.02... all of them froze when trying to delete snapshots... with rollbacks I could finally delete all the snapshots... but when trying to destroy the dataset... the system is still processing the request (after 20 hours...).
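Step 4 of the recovery procedure above ("remove all files slowly, in batches") can be sketched as a small script: unlink files in bounded batches with a pause in between, so each transaction group only has to resolve a limited number of DDT references instead of one enormous free. The batch size and pause are illustrative guesses, not tuning advice:

```python
# Hedged sketch of batched deletion on a dedup-"contaminated" dataset.
# Deleting in small batches with pauses spreads the DDT-reference
# resolution out over time instead of issuing one huge free.
import os
import time

def delete_in_batches(root, batch_size=100, pause_s=5.0):
    """Walk `root` bottom-up, unlinking files `batch_size` at a time."""
    deleted_in_batch = 0
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            os.unlink(os.path.join(dirpath, name))
            deleted_in_batch += 1
            if deleted_in_batch >= batch_size:
                time.sleep(pause_s)   # let the pending frees settle
                deleted_in_batch = 0
        for name in dirnames:
            os.rmdir(os.path.join(dirpath, name))  # dir is empty by now
```

Once the dataset is empty, the final `zfs destroy` has almost nothing left to free, which is the whole point of the procedure.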