Re: [zfs-discuss] Dedup... still in beta status

2010-06-16 Thread Fco Javier Garcia
I had the same experience.

Finally I was able to remove the deduped dataset (1.7 TB)... I was wrong, it wasn't 30 
hours... it was "only" 21 (the reason for the mistake: first I tried to delete it with 
the NexentaStor Enterprise trial 3.02... but when I saw that there was a new version of 
NexentaStor Community (3.03... with several ZFS fixes)... I installed it). So the total 
time to delete the dataset: 21 hours... and the system, of course, stalled (so... there 
is a looong way to go before dedup can be considered "stable"... perhaps dedup itself 
is stable, but ZFS + dedup is NOT).


Re: [zfs-discuss] Dedup... still in beta status

2010-06-16 Thread Brandon High
On Wed, Jun 16, 2010 at 3:39 AM, Fco Javier Garcia  wrote:
> The main problem is not performance (for a home server it is not a problem)...
> but what really is a BIG PROBLEM is when you try to delete a snapshot that is
> a little big... (try it yourself: create a big random file with 90 GB of
> data... then

This is reportedly fixed in builds past snv_134. I believe there was a
single-threaded code path that reduced throughput dramatically.

I was really excited to play with dedup and started using it around
b131. Even with 8 GB RAM and a 30 GB L2ARC, it took about a day to destroy
some snapshots. The regular expiration by zfs-auto-snapshot would
stall the system for a few hours. Writes to deduped volumes were
painfully slow, around 10 KB/s. I suspect that my DDT was larger than my
L2ARC - I had a lot of data with dedup enabled.

I've since done a send to another system and back to re-duplicate
everything, which has restored performance at the cost of twice the
space.
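
A minimal sketch of that kind of send/receive round trip, assuming a local pool
named tank and a scratch box reachable as backuphost (both names hypothetical,
not the actual setup above); a plain zfs send without -R or -p carries no
properties, so once dedup is switched off the re-received data comes back fully
duplicated:

    # Push the deduped dataset to a scratch system.
    zfs snapshot tank/data@migrate
    zfs send tank/data@migrate | ssh backuphost zfs receive scratch/data

    # Turn dedup off so the re-received dataset inherits it disabled,
    # drop the local copy, then pull the data back.
    zfs set dedup=off tank
    zfs destroy -r tank/data
    ssh backuphost zfs send scratch/data@migrate | zfs receive tank/data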

-B

-- 
Brandon High : bh...@freaks.com


Re: [zfs-discuss] Dedup... still in beta status

2010-06-16 Thread Ross Walker
On Jun 16, 2010, at 9:02 AM, Carlos Varela   
wrote:




>>> Does the machine respond to ping?
>>
>> Yes
>>
>>> If there is a gui does the mouse pointer move?
>>
>> There is no GUI (NexentaStor)
>>
>>> Does the keyboard numlock key respond at all?
>>
>> Yes
>>
>>> I just find it very hard to believe that such a situation could exist as I
>>> have done some *abusive* tests on a SunFire X4100 with Sun 6120 fibre
>>> arrays ( in HA config ) and I could not get it to become a warm brick like
>>> you describe.
>>>
>>> How many processors does your machine have?
>>
>> Full data:
>>
>> Motherboard: Asus M2N68-CM
>> Initial memory: 3 GB DDR2 ECC
>> Current memory: 8 GB DDR2 800
>> CPU: Athlon X2 5200
>> HD: 2 Seagate, 1 WD (1.5 TB each)
>> Pools: 1 RAIDZ pool
>> Datasets: 5 (ftp: 30 GB, varios: 170 GB, multimedia: 1.7 TB, segur: 80 GB,
>> prueba: 50 MB)
>> ZFS ver: 22
>>
>> The pool was created with EON NAS 0.6... dedup on,
>
> Similar situation here, but with OpenSolaris b133. I can ping the machine, but
> it has been frozen for about 24 hours. I was deleting 25 GB of deduped data.
> If I move 1-2 GB of data, the machine stops responding for an hour but comes
> back after that. I have Munin installed; the graphs stop updating during that
> time and you cannot use ssh. I agree that memory seems not to be enough, as I
> see a lot of 20 KB reads before it stops responding (reading DDT entries, I
> guess). Maybe dedup has to be redesigned for low-memory machines (a batch
> process instead of inline?)
> This is my home machine, so I can wait, but businesses would not be so happy
> if the machine becomes so unresponsive that you cannot access your data.


The unresponsiveness people report when deleting large deduped ZFS
objects is due to ARC memory pressure and long service times accessing
other ZFS objects while the pool is busy resolving the deleted object's
dedup references.


Set a maximum size the ARC can grow to, saving room for system services,
get an SSD to act as an L2ARC, run a scrub first to prime the L2ARC
(actually, it is probably better to run something targeting just the
datasets in question), then delete the deduped objects, smallest to
largest - roughly as sketched below.
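
A minimal sketch of that sequence on an OpenSolaris/Nexenta-style box, assuming
a pool named tank, an SSD at c2t0d0, and a 4 GB ARC cap (all three values are
hypothetical and should be sized to the machine):

    # Cap the ARC, leaving headroom for system services (takes effect at reboot).
    echo "set zfs:zfs_arc_max=0x100000000" >> /etc/system

    # Add the SSD as an L2ARC cache device, then scrub to warm it up.
    zpool add tank cache c2t0d0
    zpool scrub tank

    # Finally, remove the deduped datasets, smallest first.
    zfs destroy tank/dedup-small
    zfs destroy tank/dedup-large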


-Ross



Re: [zfs-discuss] Dedup... still in beta status

2010-06-16 Thread Eric Schrock

On Jun 16, 2010, at 6:46 AM, Dennis Clarke wrote:
> 
> I have been lurking in this thread for a while for various reasons and
> only now does a thought cross my mind worth posting : Are you saying that
> a reasonably fast computer with 8GB of memory is entirely non-responsive
> due to a ZFS related function?
> 

The problem is that ZFS ends up trying to do too much work in syncing context.  
Because of the way the ZFS transaction model works, it's important that the txg 
sync phase remain constant (ZFS currently shoots for 3 seconds, though this was 
recently changed to 1).  If this phase takes too long, then all other ZFS work 
is blocked.  If this happens to your root pool, it has the appearance of a hard 
hang.

These problems have generally been due to one of two root causes:

1. Destroying snapshots, where the deadlist must be processed entirely within 
one txg.

2. Freeing blocks in a deduped dataset, which requires updating the DDT in 
syncing context.

Most of the pathological aspects of these problems have been fixed (or will 
soon be fixed) in the latest source:

6922161 zio_ddt_free is single threaded with performance impact
6938089 dedup-induced latency causes FC initiator logouts/FC port resets
6948890 snapshot deletion can induce pathologically long spa_sync() times
6948911 snapshot deletion can induce unsatisfiable allocations in txg sync
6949730 spurious arc_free() can significantly exacerbate 6948890
6957289 ARC metadata limit can have serious adverse effect on dedup performance
6958873 lack of accounting for DDT in dedup'd frees can oversubscribe txg
6960374 need auxiliary mechanism for adjustment of write throttle

There are still some extreme cases that can result in long sync times when 
using dedup, but nothing pathological (i.e. 30 seconds, not 30 hours).  Expect 
to see fixes for these remaining issues in the near future.

- Eric

--
Eric Schrock, Fishworks            http://blogs.sun.com/eschrock



Re: [zfs-discuss] Dedup... still in beta status

2010-06-16 Thread Carlos Varela
> > 
> > Does the machine respond to ping?
> 
> Yes
> 
> > 
> > If there is a gui does the mouse pointer move?
> > 
> 
> There is no GUI (nexentastor)
> 
> > Does the keyboard numlock key respond at all ?
> 
> Yes
> 
> > 
> > I just find it very hard to believe that such a
> > situation could exist as I
> > have done some *abusive* tests on a SunFire X4100
> > with Sun 6120 fibre
> > arrays ( in HA config ) and I could not get it to
> > become a warm brick like
> > you describe.
> > 
> > How many processors does your machine have ?
> 
> Full data:
> 
> Motherboard: Asus M2N68-CM
> Initial memory: 3 GB DDR2 ECC
> Current memory: 8 GB DDR2 800
> CPU: Athlon X2 5200
> HD: 2 Seagate, 1 WD (1.5 TB each)
> Pools: 1 RAIDZ pool
> Datasets: 5 (ftp: 30 GB, varios: 170 GB, multimedia:
> 1.7 TB, segur: 80 GB, prueba: 50 MB)
> ZFS ver: 22
> 
> The pool was created with EON-NAS 0.6 ... dedupe on,

Similar situation here, but with OpenSolaris b133. I can ping the machine, but it has 
been frozen for about 24 hours. I was deleting 25 GB of deduped data. If I move 1-2 GB 
of data, the machine stops responding for an hour but comes back after that. I have 
Munin installed; the graphs stop updating during that time and you cannot use ssh. I 
agree that memory seems not to be enough, as I see a lot of 20 KB reads before it 
stops responding (reading DDT entries, I guess). Maybe dedup has to be redesigned for 
low-memory machines (a batch process instead of inline?)
This is my home machine, so I can wait, but businesses would not be so happy if the 
machine becomes so unresponsive that you cannot access your data.

NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
data    5.44T  4.76T   691G    87%  1.18x  ONLINE  -
rpool    111G  11.3G  99.7G    10%  1.00x  ONLINE  -

DDT-sha256-zap-duplicate: 2390516 entries, size 503 on disk, 386 in core
DDT-sha256-zap-unique: 13224217 entries, size 374 on disk, 190 in core  

DDT histogram (aggregated over all DDTs):   

bucket             allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    12.6M   1.53T   1.49T   1.49T    12.6M   1.53T   1.49T   1.49T
     2    2.12M    241G    228G    228G    4.70M    534G    504G    503G
     4     161K   14.8G   12.2G   12.2G     727K   66.2G   54.4G   54.4G
     8    6.05K    419M    293M    294M    56.1K   3.69G   2.49G   2.50G
    16      603   9.72M   5.45M   5.59M    12.4K    198M    111M    114M
    32      351   18.5M   14.5M   14.6M    15.0K    861M    678M    680M
    64       66   1.90M    734K    750K    5.60K    169M   64.0M   65.4M
   128       25   1.51M    616K    622K    4.02K    224M   80.1M   80.9M
   256        3   1.50K   1.50K   2.24K      912    456K    456K    682K
   512        4    134K   6.50K   7.48K    2.89K   81.2M   5.77M   6.47M
    1K        3    129K   1.50K   2.24K    4.19K    160M   2.09M   3.13M
    8K        1    128K     512     766    9.22K   1.15G   4.61M   6.89M
 Total    14.9M   1.78T   1.73T   1.73T    18.1M   2.12T   2.04T   2.04T

car...@quad:~$ ping 192.168.1.87
PING 192.168.1.87 (192.168.1.87) 56(84) bytes of data.  
64 bytes from 192.168.1.87: icmp_seq=1 ttl=255 time=0.193 ms
64 bytes from 192.168.1.87: icmp_seq=2 ttl=255 time=0.187 ms
64 bytes from 192.168.1.87: icmp_seq=3 ttl=255 time=0.189 ms
64 bytes from 192.168.1.87: icmp_seq=4 ttl=255 time=0.160 ms
64 bytes from 192.168.1.87: icmp_seq=5 ttl=255 time=0.189 ms
64 bytes from 192.168.1.87: icmp_seq=6 ttl=255 time=0.184 ms
64 bytes from 192.168.1.87: icmp_seq=7 ttl=255 time=0.193 ms

--- 192.168.1.87 ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 5998ms  
rtt min/avg/max/mdev = 0.160/0.185/0.193/0.010 ms   

System Specs:

Memory: 8GB DDR3
CPU: Core i7-860 2.8GHz (4 cores / 8 threads)
HD: 4 x 1.5TB Seagate 7200.11 Raidz


Re: [zfs-discuss] Dedup... still in beta status

2010-06-16 Thread Fco Javier Garcia
> 
> Does the machine respond to ping?

Yes

> 
> If there is a gui does the mouse pointer move?
> 

There is no GUI (nexentastor)

> Does the keyboard numlock key respond at all ?

Yes

> 
> I just find it very hard to believe that such a
> situation could exist as I
> have done some *abusive* tests on a SunFire X4100
> with Sun 6120 fibre
> arrays ( in HA config ) and I could not get it to
> become a warm brick like
> you describe.
> 
> How many processors does your machine have ?

Full data:

Motherboard: Asus M2N68-CM
Initial memory: 3 GB DDR2 ECC
Current memory: 8 GB DDR2 800
CPU: Athlon X2 5200
HD: 2 Seagate, 1 WD (1.5 TB each)
Pools: 1 RAIDZ pool
Datasets: 5 (ftp: 30 GB, varios: 170 GB, multimedia: 1.7 TB, segur: 80 GB,
prueba: 50 MB)
ZFS ver: 22

The pool was created with EON NAS 0.6... dedup on, compression off.

Initial CIFS write performance: 35 MB/s... final write performance: 10 MB/s.
Then the pool was imported into its final OS, OSOL b134 (the last public development 
version)... everything was OK (Time Slider snapshots were enabled only on the "segur" 
dataset and it was small)... but then we deleted some files (85 GB of video files from 
the multimedia dataset)... I forgot that there was one snapshot... so we needed to 
delete that snapshot (now 85 GB in size)...

1. Trying with OSOL: it starts deleting... after a while CIFS goes down... and finally 
the system freezes.
2. Trying with EON: after some hours it hangs (not enough memory).
3. Trying with Nexenta Core 3.0 RC: same as OSOL... it starts deleting... after a 
while it freezes.

Finally... I rolled back to the snapshot, so its size became 0 bytes... and I could 
delete it. But then, instead of deleting files (ten at a time)... I destroyed the 
dataset (1.7 TB)... this was a mistake... the system started deleting... but froze...
Then memory was added (now 8 GB) and NexentaStor 3.03 was installed (which 
theoretically fixes several ZFS bugs).

Current situation: when I try to import the pool into the machine, the system freezes 
(in Nexenta the freeze is immediate...) but internally it is working (I can ping the 
system, the keyboard is alive... and I can hear the hard disks working..., but ssh is 
down; CIFS, Apache, NFS... everything is down).

You can see more examples: 
http://www.nexentastor.org/boards/1/topics/440

P.S.: 30 hours and the system still does not answer (but I have lots of patience).

> 
> -- 
> Dennis Clarke
> dcla...@opensolaris.ca  <- Email related to the open
> source Solaris
> dcla...@blastwave.org   <- Email related to open
> source for Solaris


Re: [zfs-discuss] Dedup... still in beta status

2010-06-16 Thread Dennis Clarke

>>
>> I think, with current bits, it's not a simple matter
>> of "ok for
>> enterprise, not ok for desktops".  with an ssd for
>> either main storage
>> or l2arc, and/or enough memory, and/or a not very
>> demanding workload, it
>> seems to be ok.
>
>
> The main problem is not performance (for a home server it is not a
> problem)... but what really is a BIG PROBLEM is when you try to delete a
> snapshot that is a little big... (try it yourself: create a big random file
> with 90 GB of data... then snapshot... then delete the file and delete the
> snapshot... you will see)... and better... try removing the SSD disk. Just
> out of curiosity... my test system (8 GB RAM)... takes over 30 hours to
> delete a dataset of 1.7 TB (still not finished...)... and the system does
> not respond (it is working but does not respond... not even a simple "ls"
> command)

Hold on a sec.

I have been lurking in this thread for a while for various reasons and
only now does a thought cross my mind worth posting : Are you saying that
a reasonably fast computer with 8GB of memory is entirely non-responsive
due to a ZFS related function?

Does the machine respond to ping?

If there is a gui does the mouse pointer move?

Does the keyboard numlock key respond at all ?

I just find it very hard to believe that such a situation could exist as I
have done some *abusive* tests on a SunFire X4100 with Sun 6120 fibre
arrays ( in HA config ) and I could not get it to become a warm brick like
you describe.

How many processors does your machine have ?

-- 
Dennis Clarke
dcla...@opensolaris.ca  <- Email related to the open source Solaris
dcla...@blastwave.org   <- Email related to open source for Solaris




Re: [zfs-discuss] Dedup... still in beta status

2010-06-16 Thread Fco Javier Garcia
> 
> I think, with current bits, it's not a simple matter
> of "ok for 
> enterprise, not ok for desktops".  with an ssd for
> either main storage 
> or l2arc, and/or enough memory, and/or a not very
> demanding workload, it 
> seems to be ok.


The main problem is not performance (for a home server it is not a problem)... but 
what really is a BIG PROBLEM is when you try to delete a snapshot that is a little 
big... (try it yourself: create a big random file with 90 GB of data... then 
snapshot... then delete the file and delete the snapshot... you will see)... and 
better... try removing the SSD disk. Just out of curiosity... my test system 
(8 GB RAM)... takes over 30 hours to delete a dataset of 1.7 TB (still not 
finished...)... and the system does not respond (it is working but does not 
respond... not even a simple "ls" command).


Re: [zfs-discuss] Dedup... still in beta status

2010-06-16 Thread Darren J Moffat

On 16/06/2010 11:30, Fco Javier Garcia wrote:

>> This may also be accomplished by using snapshots and clones of data sets.
>> At least for OS images: user profiles and documents could be something else
>> entirely.
>
> Yes... but that would need a manager with access to ZFS itself... whereas with
> dedup you can use a userland manager (much more flexible)


If you delegate the snapshot/mount ZFS allow permissions to the user the 
management software runs as, then you can create/destroy/rename snapshots 
over NFS/CIFS/FTP/SCP/HTTP by doing mkdir/rmdir/mv in the dataset's 
.zfs/snapshot directory.
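
A minimal sketch of that delegation, assuming a dataset tank/home and a
management user webadmin (both names hypothetical):

    # Delegate snapshot management to the user the management software runs as.
    zfs allow webadmin snapshot,destroy,rename,mount tank/home

    # That user (locally or over NFS/CIFS) can then drive snapshots through
    # the .zfs/snapshot control directory:
    mkdir /tank/home/.zfs/snapshot/before-upgrade     # create a snapshot
    mv /tank/home/.zfs/snapshot/before-upgrade \
       /tank/home/.zfs/snapshot/keep-this             # rename it
    rmdir /tank/home/.zfs/snapshot/keep-this          # destroy it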


Unfortunately there isn't a way I know of to create clones using the .zfs 
directory.


--
Darren J Moffat


Re: [zfs-discuss] Dedup... still in beta status

2010-06-16 Thread Fco Javier Garcia
> This may also be accomplished by using snapshots and
> clones of data  
> sets. At least for OS images: user profiles and
> documents could be  
> something else entirely.

Yes... but that would need a manager with access to ZFS itself... whereas with 
dedup you can use a userland manager (much more flexible)


> 
> Another situation that comes to mind is perhaps as
> the back-end to a  
> mail store: if you send out a message(s) with an
> attachment(s) to a  
> lot of people, the attachment blocks could be deduped
> (and perhaps  
> compressed as well, since base-64 adds 1/3 overhead).


Re: [zfs-discuss] Dedup... still in beta status

2010-06-15 Thread Mike Gerdts
On Tue, Jun 15, 2010 at 7:28 PM, David Magda  wrote:
> On Jun 15, 2010, at 14:20, Fco Javier Garcia wrote:
>
>> I think dedup may have its greatest appeal in VDI environments (think
>> about a environment with 85% if the data that the virtual machine needs is
>> into ARC or L2ARC... is like a dream...almost instantaneous response... and
>> you can boot a new machine in a few seconds)...
>
> This may also be accomplished by using snapshots and clones of data sets. At
> least for OS images: user profiles and documents could be something else
> entirely.

It all depends on the nature of the VDI environment.  If the VMs are
regenerated on each login, the snapshot + clone mechanism is
sufficient.  Deduplication is not needed.  However, if VMs have a long
life and get periodic patches and other software updates,
deduplication will be required if you want to remain at somewhat
constant storage utilization.

It probably makes a lot of sense to be sure that swap or page files
are on a non-dedup dataset.  Executables and shared libraries
shouldn't be getting paged out to it and the likelihood that multiple
VMs page the same thing to swap or a page file is very small.
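
A minimal sketch of keeping swap out of the deduped data, assuming a root pool
named rpool and an arbitrary 4 GB volume size (both assumptions, not part of
the setup discussed above):

    # Create a swap zvol with dedup (and compression) explicitly off,
    # then add it as a swap device.
    zfs create -V 4G -o dedup=off -o compression=off rpool/swapvol
    swap -a /dev/zvol/dsk/rpool/swapvol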

> Another situation that comes to mind is perhaps as the back-end to a mail
> store: if you send out a message(s) with an attachment(s) to a lot of
> people, the attachment blocks could be deduped (and perhaps compressed as
> well, since base-64 adds 1/3 overhead).

It all depends on how this is stored.  If the attachments are stored
like they were in 1990 as part of an mbox format, you will be very
unlikely to get the proper block alignment.  Even storing the message
body (including headers) in the same file as the attachment may not
align the attachments because the mail headers may be different (e.g.
different recipients messages took different paths, some were
forwarded, etc.).  If the attachments are stored in separate files or
a database format is used that stores attachments separate from the
message (with matching database + zfs block size) things may work out
favorably.

However, a system that detaches messages and stores them separately
may just as well store each attachment in a file named after its SHA-256
hash, assuming that file doesn't already exist.  If it does exist, the system
can just increment a reference count.  In other words, an intelligent mail
system should already dedup.  Or at least that is how I would have
written it for the last decade or so...
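
A rough shell sketch of that idea, content-addressing each attachment by its
SHA-256 hash and letting the filesystem's hard-link count serve as the
reference count (the paths and the helper script itself are made up for
illustration; on Solaris you would use digest -a sha256 instead of sha256sum):

    #!/bin/sh
    # store_attachment.sh <attachment-file> <message-dir>
    att="$1"; msgdir="$2"
    store=/var/mailstore/objects

    hash=$(sha256sum "$att" | awk '{print $1}')
    obj="$store/$hash"

    # First copy becomes the canonical object; later messages just hard-link
    # to it, so identical attachments are stored once.
    [ -e "$obj" ] || cp "$att" "$obj"
    ln "$obj" "$msgdir/$(basename "$att")"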

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] Dedup... still in beta status

2010-06-15 Thread David Magda

On Jun 15, 2010, at 14:20, Fco Javier Garcia wrote:

> I think dedup may have its greatest appeal in VDI environments (think about
> an environment where 85% of the data that the virtual machines need is in
> ARC or L2ARC... it is like a dream... almost instantaneous response... and
> you can boot a new machine in a few seconds)...


This may also be accomplished by using snapshots and clones of data  
sets. At least for OS images: user profiles and documents could be  
something else entirely.


Another situation that comes to mind is perhaps as the back-end to a  
mail store: if you send out a message(s) with an attachment(s) to a  
lot of people, the attachment blocks could be deduped (and perhaps  
compressed as well, since base-64 adds 1/3 overhead).




Re: [zfs-discuss] Dedup... still in beta status

2010-06-15 Thread Bill Sommerfeld

On 06/15/10 10:52, Erik Trimble wrote:

> Frankly, dedup isn't practical for anything but enterprise-class
> machines. It's certainly not practical for desktops or anything remotely
> low-end.


We're certainly learning a lot about how zfs dedup behaves in practice. 
 I've enabled dedup on two desktops and a home server and so far 
haven't regretted it on those three systems.


However, they each have more than typical amounts of memory (4 GB and up), 
a data pool on two or more large-capacity SATA drives, plus an X25-M SSD 
sliced into a root pool as well as L2ARC and slog slices for the data 
pool (see below [1]).


I tried enabling dedup on a smaller system (with only 1G memory and a 
single very slow disk), observed serious performance problems, and 
turned it off pretty quickly.


I think, with current bits, it's not a simple matter of "ok for 
enterprise, not ok for desktops".  with an ssd for either main storage 
or l2arc, and/or enough memory, and/or a not very demanding workload, it 
seems to be ok.


For one such system, I'm seeing:

# zpool list z
NAME   SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
z  464G   258G   206G55%  1.25x  ONLINE  -
# zdb -D z
DDT-sha256-zap-duplicate: 432759 entries, size 304 on disk, 156 in core
DDT-sha256-zap-unique: 1094244 entries, size 298 on disk, 151 in core

dedup = 1.25, compress = 1.44, copies = 1.00, dedup * compress / copies 
= 1.80

- Bill

[1] To forestall responses of the form "you're nuts for putting a slog 
on an X25-M", which is off-topic for this thread and being discussed 
elsewhere:


Yes, I'm aware of the write cache issues on power fail on the x25-m. 
For my purposes, it's a better robustness/performance tradeoff than 
either zil-on-spinning-rust or zil disabled, because:
 a) for many potential failure cases on whitebox hardware running 
bleeding edge opensolaris bits, the x25-m will not lose power and thus 
the write cache will stay intact across a crash.
 b) even if it loses power and loses some writes-in-flight, it's not 
likely to lose *everything* since the last txg sync.


It's good enough for my personal use.  Your mileage will vary.  As 
always, system design involves tradeoffs.



Re: [zfs-discuss] Dedup... still in beta status

2010-06-15 Thread Erik Trimble

On 6/15/2010 11:53 AM, Fco Javier Garcia wrote:

>> or as a member of the ZFS team (which I'm not).
>
> Then you have to be brutally good with Java


Thanks, but I do get it wrong every so often (hopefully, rarely).  More 
importantly, I don't know anything about the internal goings-on of the 
ZFS team, so I have nothing extra to say about schedules, plans, timing, 
etc. that everyone else doesn't know. I can only speculate based on 
what's been publicly said on those topics.  E.g. I wish I knew when 
certain bugs would be fixed, but I don't have any more visibility to 
that than the public.



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] Dedup... still in beta status

2010-06-15 Thread Erik Trimble

On 6/15/2010 11:49 AM, Geoff Nordli wrote:

>> From: Fco Javier Garcia
>> Sent: Tuesday, June 15, 2010 11:21 AM
>>
>>> Realistically, I think people are overtly-enamored with dedup as a
>>> feature - I would generally only consider it worth-while in cases
>>> where you get significant savings. And by significant, I'm talking an
>>> order of magnitude space savings.  A 2x savings isn't really enough to
>>> counteract the down sides.  Especially when even enterprise disk space
>>> is (relatively) cheap.
>>
>> I think dedup may have its greatest appeal in VDI environments (think about
>> an environment where 85% of the data that the virtual machines need is in
>> ARC or L2ARC... it is like a dream... almost instantaneous response... and
>> you can boot a new machine in a few seconds)...
>
> Does dedup benefit in the ARC/L2ARC space?
>
> For some reason, I have it in my head that each time a VM requests the
> block from storage it will be copied into cache; therefore, if I had 10 VMs
> requesting the same dedup'd block, there would be 10 copies of the same block
> in ARC/L2ARC.
>
> Geoff


No, that's not correct. It's the *same* block, regardless of where it 
was referenced from. The cached block has no idea where it was 
referenced from (that's in the metadata).  So, even if I have 10 VMs 
requesting access to 10 different files, if those files have been 
dedup-ed, then any "common" (i.e. deduped) blocks will be stored only 
once in the ARC/L2ARC.


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] Dedup... still in beta status

2010-06-15 Thread Fco Javier Garcia
> or as a member of the ZFS team (which I'm not).
>

Then you have to be brutally good with Java







> -- 
> Erik Trimble
> Java System Support
> Mailstop:  usca22-123
> Phone:  x17195
> Santa Clara, CA
> 


Re: [zfs-discuss] Dedup... still in beta status

2010-06-15 Thread Geoff Nordli
>From: Fco Javier Garcia
>Sent: Tuesday, June 15, 2010 11:21 AM
>
>> Realistically, I think people are overtly-enamored with dedup as a
>> feature - I would generally only consider it worth-while in cases
>> where you get significant savings. And by significant, I'm talking an
>> order of magnitude space savings.  A 2x savings isn't really enough to
>> counteract the down sides.  Especially when even enterprise disk space
>> is
>> (relatively) cheap.
>>
>
>
>I think dedup may have its greatest appeal in VDI environments (think about
>an environment where 85% of the data that the virtual machines need is in ARC
>or L2ARC... it is like a dream... almost instantaneous response... and you can
>boot a new machine in a few seconds)...
>

Does dedup benefit in the ARC/L2ARC space?  

For some reason, I have it in my head that each time a VM requests the
block from storage it will be copied into cache; therefore, if I had 10 VMs
requesting the same dedup'd block, there would be 10 copies of the same block
in ARC/L2ARC.

Geoff 




Re: [zfs-discuss] Dedup... still in beta status

2010-06-15 Thread Erik Trimble

On 6/15/2010 10:52 AM, Erik Trimble wrote:


> Frankly, dedup isn't practical for anything but enterprise-class
> machines. It's certainly not practical for desktops or anything
> remotely low-end.
>
> This isn't just a ZFS issue - all implementations I've seen so far
> require enterprise-class solutions.
>
> Realistically, I think people are overtly-enamored with dedup as a
> feature - I would generally only consider it worth-while in cases
> where you get significant savings. And by significant, I'm talking an
> order of magnitude space savings.  A 2x savings isn't really enough to
> counteract the down sides.  Especially when even enterprise disk space
> is (relatively) cheap.
>
> That all said, ZFS dedup is still definitely beta. There are known
> severe bugs and performance issues which will take time to fix, as not
> all of them have obvious solutions.  Given current schedules, I
> predict that it should be production-ready some time in 2011. *When*
> in 2011, I couldn't hazard...
>
> Maybe time to make Solaris 10 Update 12 or so?



One thing here - I forgot to say, this is my opinion based on my 
observations/conversations on this list, and I in no way speak for 
Oracle officially, or as a member of the ZFS team (which I'm not).


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] Dedup... still in beta status

2010-06-15 Thread Fco Javier Garcia
> Realistically, I think people are overtly-enamored
> with dedup as a 
> feature - I would generally only consider it
> worth-while in cases where 
> you get significant savings. And by significant, I'm
> talking an order of 
> magnitude space savings.  A 2x savings isn't really
> enough to counteract 
> the down sides.  Especially when even enterprise disk
> space is 
> (relatively) cheap.
> 


I think dedup may have its greatest appeal in VDI environments (think about an 
environment where 85% of the data that the virtual machines need is in ARC or 
L2ARC... it is like a dream... almost instantaneous response... and you can boot a 
new machine in a few seconds)...

> 
> That all said, ZFS dedup is still definitely beta.
> There are known 
> severe bugs and performance issues which will take
> time to fix, as not 
> all of them have obvious solutions  Given current
> schedules, I predict 
> that it should be production-ready some time in 2011.
> *When* in 2011, I 
> couldn't hazard...
> 
> Maybe time to make Solaris 10 Update 12 or so?


Yes... so you can start patching Solaris on Monday... and perhaps... it will be 
finished on Tuesday (of next week)



> 
> -- 
> Erik Trimble
> Java System Support
> Mailstop:  usca22-123
> Phone:  x17195
> Santa Clara, CA
> 


Re: [zfs-discuss] Dedup... still in beta status

2010-06-15 Thread Erik Trimble

On 6/15/2010 9:03 AM, Fco Javier Garcia wrote:

> Data point: 90% of current computers have less than 9 GB of RAM, and less than
> 5% have SSDs.
> Take a "standard" storage box with a capacity of 4 TB... dedup on, a dataset
> with 32 KB blocks, 2 TB of data in use... you need about 16 GB of memory just
> for the DDT... but you will not see this until it's too late... i.e., we work
> with the system... performance is good... little by little we see that write
> performance is dropping... then we see that the system crashes randomly (when
> deleting automatic snapshots)... and finally we see that disabling dedup does
> not solve it.
>
> It may indicate that dedup has some requirements... that is true, but what is
> also true is that on systems with large amounts of RAM (by the usual
> standards), ordinary operations such as deleting files or destroying
> datasets/snapshots give us a drop in performance... even a total system
> hang... and that is not acceptable... so maybe it would be desirable to put
> dedup on hold (back to beta or development status) until we can get a stable
> version, so we can make any necessary changes in the core of ZFS that allow
> its use without compromising the stability of the entire system (e.g., making
> the freeing of blocks multithreaded).
>
> And what can we do if we have a system already "contaminated" with dedup?
> 1. Disable snapshots.
> 2. Create a new dataset without dedup and copy the data to the new dataset.
> 3. After copying the data, delete the snapshots, smallest first; if there is a
> larger snapshot (more than 10 GB), do a progressive rollback to it (so the
> snapshot ends up using 0 bytes) and then delete it.
> 4. When there are no snapshots left in the dataset, remove all the files
> slowly (in batches).
> 5. Finally, when there are no files left, destroy the dataset.
>
> If we skip any of these steps (and directly try to delete a 95 GB snapshot),
> the system will hang... if we try to destroy the dataset and the system hangs,
> then restarting the computer will hang it again (since the operation will keep
> trying to free the blocks after import).
>
> My test system: AMD Athlon X2 5400, 8 GB RAM, 3 TB RAIDZ, a 1.7 TB dataset and
> an 87 GB snapshot... tested with OSOL b134, EON 0.6, Nexenta Core 3.02 and
> NexentaStor Enterprise 3.02... all of them freeze when trying to delete
> snapshots... in the end, with rollback, I could delete all the snapshots...
> but when trying to destroy the dataset... the system is still processing the
> operation... (after 20 hours...)


Frankly, dedup isn't practical for anything but enterprise-class 
machines. It's certainly not practical for desktops or anything remotely 
low-end.


This isn't just a ZFS issue - all implementations I've seen so far 
require enterprise-class solutions.


Realistically, I think people are overtly-enamored with dedup as a 
feature - I would generally only consider it worth-while in cases where 
you get significant savings. And by significant, I'm talking an order of 
magnitude space savings.  A 2x savings isn't really enough to counteract 
the down sides.  Especially when even enterprise disk space is 
(relatively) cheap.



That all said, ZFS dedup is still definitely beta. There are known 
severe bugs and performance issues which will take time to fix, as not 
all of them have obvious solutions.  Given current schedules, I predict 
that it should be production-ready some time in 2011. *When* in 2011, I 
couldn't hazard...


Maybe time to make Solaris 10 Update 12 or so? 

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



[zfs-discuss] Dedup... still in beta status

2010-06-15 Thread Fco Javier Garcia
Data point: 90% of current computers have less than 9 GB of RAM, and less than 5% 
have SSDs.
Take a "standard" storage box with a capacity of 4 TB... dedup on, a dataset with 
32 KB blocks, 2 TB of data in use... you need about 16 GB of memory just for the 
DDT... but you will not see this until it's too late... i.e., we work with the 
system... performance is good... little by little we see that write performance is 
dropping... then we see that the system crashes randomly (when deleting automatic 
snapshots)... and finally we see that disabling dedup does not solve it.
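
The arithmetic behind that figure, plus the zdb commands that report the real
numbers (the pool name tank and the ~250-byte in-core entry size are
assumptions; the actual per-entry size is printed by zdb, as in the outputs
quoted elsewhere in this thread):

    2 TB of data / 32 KB blocks      ~= 62.5 million unique blocks
    62.5M entries x ~250 B in core   ~= 16 GB of RAM for the DDT alone

    # Simulate dedup on an existing pool and print the would-be DDT histogram:
    zdb -S tank
    # On a pool that already uses dedup, show entry counts and per-entry sizes:
    zdb -DD tank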


It may indicate that dedup has some requirements... that is true, but what is also 
true is that on systems with large amounts of RAM (by the usual standards), ordinary 
operations such as deleting files or destroying datasets/snapshots give us a drop in 
performance... even a total system hang... and that is not acceptable... so maybe it 
would be desirable to put dedup on hold (back to beta or development status) until we 
can get a stable version, so we can make any necessary changes in the core of ZFS 
that allow its use without compromising the stability of the entire system (e.g., 
making the freeing of blocks multithreaded).

And what can we do if we have a system already "contaminated" with dedup?
1. Disable snapshots.
2. Create a new dataset without dedup and copy the data to the new dataset.
3. After copying the data, delete the snapshots, smallest first; if there is a larger 
snapshot (more than 10 GB), do a progressive rollback to it (so the snapshot ends up 
using 0 bytes) and then delete it.
4. When there are no snapshots left in the dataset, remove all the files slowly (in 
batches).
5. Finally, when there are no files left, destroy the dataset (a rough sketch of 
steps 2-5 follows below).
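
A rough sketch of steps 2-5 in ZFS commands, assuming a pool named tank, a
deduped dataset tank/olddata being migrated to tank/newdata, and placeholder
snapshot names (all of these names are hypothetical):

    # 2. New dataset without dedup; copy the data across.
    zfs create -o dedup=off tank/newdata
    rsync -a /tank/olddata/ /tank/newdata/

    # 3. Delete the small snapshots first; for the big one, roll back to it so
    #    its unique size drops to 0 bytes, then delete it too.
    zfs destroy tank/olddata@small-auto-snap
    zfs rollback -r tank/olddata@big-auto-snap
    zfs destroy tank/olddata@big-auto-snap

    # 4. Remove the remaining files in small batches rather than all at once.
    # 5. Only when the dataset is empty, destroy it.
    zfs destroy tank/olddata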

If we skip any of these steps (and directly try to delete a 95 GB snapshot), the 
system will hang... if we try to destroy the dataset and the system hangs, then 
restarting the computer will hang it again (since the operation will keep trying to 
free the blocks after import).

My test system: AMD Athlon X2 5400, 8 GB RAM, 3 TB RAIDZ, a 1.7 TB dataset and an 
87 GB snapshot... tested with OSOL b134, EON 0.6, Nexenta Core 3.02 and NexentaStor 
Enterprise 3.02... all of them freeze when trying to delete snapshots... in the end, 
with rollback, I could delete all the snapshots... but when trying to destroy the 
dataset... the system is still processing the operation... (after 20 hours...)