Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-22 Thread Peter Tribble
On 11/22/08, Jens Elkner [EMAIL PROTECTED] wrote:
 On Fri, Nov 21, 2008 at 03:42:17PM -0800, David Pacheco wrote:
   Pawel Tecza wrote:
But I still don't understand why `zfs list` doesn't display snapshots
by default. I have seen it on the Net many times in examples of zfs usage.
  
   This was PSARC/2008/469 - excluding snapshot info from 'zfs list'
  
   http://opensolaris.org/os/community/on/flag-days/pages/2008091003/

  The incomplete one - where is the '-t all' option? It's really annoying,
  error-prone, and time-consuming to type stories on the command line ...
  Does anybody remember the 'keep it small and simple' thing?

Hm. I thought the '-t all' worked with the revised zfs list. The problem I
have with that is that you need to type different commands to get the
same output depending on which machine you're on, as '-t all' doesn't
work on older systems.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/


Re: [zfs-discuss] RC1 Zfs writes a lot slower when running X

2008-11-22 Thread zerk
Hi,

thanks for the reply

I thought that was it too, so I wrote a C program that allocated 1 GB of RAM
and did nothing with it. So the system was left with only 1 GB for ZFS and I
saw absolutely no performance hit.

I tried the same thing for the CPU by running a loop that took 100% of one of
the 2 cores I have. Same thing... no hit at all.
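
A minimal sketch (hypothetical, not the actual test program) of that kind of
memory "ballast", and of why it showed nothing:

    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* Allocate 1 GB but never touch it.  This only takes a swap
           reservation; no physical pages are consumed, so the ARC is
           left undisturbed (see Casper's follow-up below), which is
           why no performance hit was seen. */
        void *p = malloc(1UL << 30);
        if (p == NULL)
            return 1;
        pause();   /* hold the untouched allocation while the I/O test reruns */
        return 0;
    }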

I think it has something to do with system calls, the kernel, or something like
that, but I don't know enough about that area to be able to diagnose such a
problem...

The video card I have in there is an old Matrox G200 PCI, because that is all I
have left and because I didn't need anything better than that on a server. I
disabled the console X server and I only connect via an Xvnc server, so I would
be really surprised if Xvnc is doing anything with that card, but you never
know...

Thanks

Zerk


Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-22 Thread Simon Breden
Hi Pawel,

Yes, it did change in the last few months.
On older versions of Solaris the default for 'zfs list' was to show all
filesystems AND snapshots.
This got to be a real pain when you had lots of snapshots, as you couldn't
easily see what was what, so it was changed so that the default for 'zfs list'
is to show just the filesystems, which is much preferable in my opinion.

As others here have said, just issue 'zfs list -t snapshot' if you want to see
only the snapshots, or 'zfs list -t all' to see both filesystems and snapshots.

Cheers,
Simon

Blog: http://breden.org.uk
ZFS articles: http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/


Re: [zfs-discuss] RC1 Zfs writes a lot slower when running X

2008-11-22 Thread Casper . Dik

Hi,

thanks for the reply

I thought that was it too, so I wrote a C program that allocated 1 GB of
RAM and did nothing with it. So the system was left with only 1 GB for ZFS
and I saw absolutely no performance hit.

Lock it in memory and then try again; if you allocate the memory but you
don't use it, you only have a swap reservation, nothing more.

But if you allocate and then run mlockall(MCL_CURRENT), you take 1GB
off the table.
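
For reference, a minimal sketch of what Casper describes (sizes and error
handling assumed; not his or the original poster's actual code):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t sz = 1UL << 30;            /* 1 GB */
        void *p = malloc(sz);             /* on its own: just a swap reservation */
        if (p == NULL) {
            perror("malloc");
            return 1;
        }
        /* Lock every currently mapped page into physical memory, so the
           1 GB is really taken away from the ARC and everything else. */
        if (mlockall(MCL_CURRENT) != 0) {
            perror("mlockall");           /* may fail without enough privilege */
            return 1;
        }
        pause();                          /* hold the memory while the test reruns */
        return 0;
    }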

Casper



Re: [zfs-discuss] RC1 Zfs writes a lot slower when running X

2008-11-22 Thread zerk
Great, it worked!

mlockall returned -1, probably because the system wasn't able to allocate blocks
of 512M contiguously... but using memset on each block committed the memory,
and I saw the same zfs perf problem as with X and VBox.
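
For reference, a sketch of that workaround (block size and count assumed; not
the actual program): allocate in smaller chunks and touch each one with
memset() so the pages are actually committed:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t blk  = 512UL << 20;   /* 512 MB per block */
        const int    nblk = 2;             /* 2 x 512 MB = 1 GB total */

        for (int i = 0; i < nblk; i++) {
            char *p = malloc(blk);
            if (p == NULL) {
                perror("malloc");
                return 1;
            }
            memset(p, 1, blk);             /* touching the pages commits real memory */
        }
        pause();                           /* keep the memory resident during the test */
        return 0;
    }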

Thanks a lot for the hint :)

Now I guess I'll have to buy more RAM :)

zerk


[zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Chris Greer
So to give a little background on this: we have been benchmarking Oracle RAC on
Linux vs. Oracle on Solaris.  In the Solaris test we are using vxvm and vxfs.
We noticed that the same Oracle TPC benchmark, at roughly the same transaction
rate, was causing twice as many disk I/Os to the backend DMX4-1500.

So we concluded that either Oracle is very different in RAC, or our filesystems
may be the culprit.  This testing is wrapping up (it all gets dismantled
Monday), so we took the time to run a simulated disk I/O test with an 8K I/O
size.


vxvm with vxfs we achieved 2387 IOPS
vxvm with ufs we achieved 4447 IOPS
ufs on disk devices we achieved 4540 IOPS
zfs we achieved 1232 IOPS

The only zfs tuning we have done is adding 'set zfs:zfs_nocache=1' to
/etc/system and changing the recordsize to 8K to match the test.

I think the files we are using in the test were created before we changed the
recordsize, so I deleted and recreated them and have started the other
test... but does anyone have any other ideas?

This is my first experience with ZFS on a commercial RAID array, and so far
it's not that great.

For those interested, we are using the iorate command from EMC for the
benchmark.  For the different tests, we have 13 LUNs presented.  Each one is
its own volume and filesystem, with a single file on each filesystem.  We are
running 13 iorate processes in parallel (there is no CPU bottleneck in this
either).

For zfs, we put all those LUNs in one pool with no redundancy, created 13
filesystems, and are still running 13 iorate processes.

We are running Solaris 10U6.


Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Chris Greer
That should be 'set zfs:zfs_nocacheflush=1' in the post above... that was a
typo on my part.


Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Chris Greer
With the datafiles recreated after the recordsize change, zfs achieved 3079
IOPS, so now we are at least in the ballpark.


Re: [zfs-discuss] Supermicro AOC-USAS-L8i

2008-11-22 Thread Asa Durkee
My Supermicro H8DA3-2's onboard 1068E SAS chip isn't recognized by OpenSolaris,
and I'd like to keep this particular system all Supermicro, so the L8i it is.
I know there have been issues with Supermicro-branded 1068E controllers, so I
just wanted to verify that the stock mpt driver supports it.


Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Dale Ghent

Are you putting your archive and redo logs on a separate zpool (not
just a different zfs filesystem within the same pool as your data files)?

Are you using direct I/O at all in any of the config scenarios you
listed?

/dale



Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Todd Stansell

This doesn't seem like an apples-to-apples comparison, unless I'm
misunderstanding.  If you put all of those luns in a single pool for zfs,
you should similarly put all of them in a single volume for vxvm.

Todd


Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Chris Greer
Right now we are not using Oracle... we are using iorate, so we don't have
separate logs.  When the testing was with Oracle the logs were separate.  This
test represents the 13 data LUNs that we had during those tests.

The reason it wasn't striped with vxvm is that the original comparison was
vxvm + vxfs versus Oracle RAC on Linux with OCFS.  On the Linux side we don't
have a volume manager, so the database has to do the striping across the
separate datafiles.  The only way I could mimic that with zfs would be to
create 13 separate zpools, and that sounded pretty painful.

Again, the thing that led us down this path was that the Oracle RAC on Linux
accomplished slightly more transactions but required only half the I/Os to the
array to do so.  The Sun test actually bottlenecked on the backend disks and
had plenty of CPU left on the host.  So if the I/O bottleneck is actually the
vxfs filesystem causing more I/O to the backend, and we can fix that with a
different filesystem, then the Sun box may beat the Linux RAC.  But our
initial testing has shown that vxfs is not all it's cracked up to be with
respect to databases (yes, we tried the database edition too, and the
performance actually got slightly worse).


Re: [zfs-discuss] RC1 Zfs writes a lot slower when running X

2008-11-22 Thread Bob Friesenhahn
On Fri, 21 Nov 2008, zerk wrote:

 I have OpenSolaris on an AMD64 Asus-A8NE with 2 GB of RAM and 4x320 GB SATA
 drives in raidz1.

 With dd, I can write at quasi disk-maximum speed of 80 MB/s each, for a total
 of 250 MB/s, if I have no X session at all (only a console tty).

 But as soon as I have an X session running, the write speed drops to
 about 120 MB/s.  It's even worse if I have a VBoxHeadless running with
 an idle win2k3 inside.  It drops to 30 MB/s.

I believe that the OpenSolaris kernel is now extended such that it
reports file change events to Gnome for files in the user's home
directory.  When Gnome hears about a change, it goes and reads the
file so that searching is fast and there is a nice pre-generated
thumbnail.  This means that there is more than simple memory
consumption going on.

Try writing into the same pool but outside of your home directory and
see if the I/O rate improves.  If it does, then go complain on the
Desktop list.  I already complained in advance on the Desktop list but
there was little response (as usual), so I have since unsubscribed.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/



Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Bob Friesenhahn
On Sat, 22 Nov 2008, Chris Greer wrote:

 zfs with the datafiles recreated after the recordsize change was 3079 IOPS
 So now we are at least in the ballpark.

ZFS is optimized for fast bulk data storage and data integrity and not 
so much for transactions.  It seems that adding a non-volatile 
hardware cache device can help quite a lot, but you may need to use 
OpenSolaris to fully take advantage of it.

It is important to consider how fast things will be a month or two 
from now so it may be necessary to run the benchmark for quite some 
time in order to see how performance degrades.

The 3079 IOPS is probably the limit of what your current hardware can 
do with ZFS.  I see a bit over 3100 here for random synchronous 
writers using 12 disks (arranged as six mirror pairs) and 8 writers.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/



Re: [zfs-discuss] ZFS fragmentation with MySQL databases

2008-11-22 Thread Tamer Embaby
Kees Nuyt wrote:
 My explanation would be: Whenever a block within a file
 changes, zfs has to write it at another location (copy on
 write), so the previous version isn't immediately lost.

 Zfs will try to keep the new version of the block close to
 the original one, but after several changes on the same
 database page, things get pretty messed up and logical
 sequential I/O becomes pretty much physically random indeed.

 The original blocks will eventually be added to the freelist
 and reused, so proximity can be restored, but it will never
 be 100% sequential again.
 The effect is larger when many snapshots are kept, because
 older block versions are not freed, or when the same block
 is changed very often and freelist updating has to be
 postponed.

 That is the trade-off between 'always consistent' and 'fast'.
Well, does that mean ZFS is not best suited as an underlying filesystem for
database engines?  With databases it will always be fragmented, hence slow
performance?

By that reasoning it would be best to use it for large file servers whose
contents don't change frequently.

Thanks,
Tamer


Re: [zfs-discuss] ZFS fragmentation with MySQL databases

2008-11-22 Thread Luke Lonergan
ZFS works marvelously well for data warehouse and analytic DBs.  For lots of 
small updates scattered across the breadth of the persistent working set, it's 
not going to work well IMO.

Note that we're using ZFS to host databases as large as 10,000 TB - that's 10PB 
(!!).  Solaris 10 U5 on X4540.  That said - it's on 96 servers running 
Greenplum DB.

With SSD, the randomness won't matter much I expect, though the filesystem 
won't be helping by virtue of this fragmentation effect of COW.

- Luke



Re: [zfs-discuss] ZFS fragmentation with MySQL databases

2008-11-22 Thread Bob Friesenhahn

Assuming that the filesystem block size matches the database block size,
there is not so much of an issue with fragmentation, because databases
are generally fragmented (almost by definition) due to their nature of
random access.  Only a freshly written database from carefully ordered
insert statements might be in a linear order, and only for accesses in
the same linear order.  Database indexes could be negatively impacted,
but they are likely to be cached in RAM anyway.  I understand that zfs
uses a slab allocator, so that file data is reserved in larger slabs
(e.g. 1MB) and the blocks are carved out of that.  This tends to
keep more of the file data together and reduces allocation overhead.

Fragmentation is more of an impact for large files which should 
usually be accessed sequentially.

Zfs's COW algorithm and ordered writes will always be slower than for 
filesystems which simply overwrite existing blocks, but there is a 
better chance that the database will be immediately usable if someone 
pulls the power plug, and without needing to rely on special 
battery-backed hardware.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/



Re: [zfs-discuss] ZFS fragmentation with MySQL databases

2008-11-22 Thread Richard Elling
Luke Lonergan wrote:
 ZFS works marvelously well for data warehouse and analytic DBs.  For lots of 
 small updates scattered across the breadth of the persistent working set, 
 it's not going to work well IMO.
   

Actually, it does seem to work quite well when you use a read-optimized
SSD for the L2ARC.  In that case, random read workloads have very
fast access once the cache is warm.
 -- richard



Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Richard Elling

You can't use ZFS directly for Oracle RAC, so perhaps you should test
those things which might work for your application?
 -- richard



Re: [zfs-discuss] ZFS fragmentation with MySQL databases

2008-11-22 Thread Luke Lonergan

 Actually, it does seem to work quite
 well when you use a read optimized
 SSD for the L2ARC.  In that case,
 random read workloads have very
 fast access, once the cache is warm.

One would expect so, yes.  But the usefulness of this is limited to the cases 
where the entire working set will fit into an SSD cache.

In other words, for random access across a working set larger (by say X%) than 
the SSD-backed L2 ARC, the cache is useless.  This should asymptotically 
approach truth as X grows and experience shows that X=200% is where it's about 
99% true.
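
(A rough illustration only, assuming uniform random access across the whole
working set, which is a pessimistic simplification: with cache size C and
working set size S = (1+X)C, the steady-state hit ratio is roughly

    h \approx \frac{C}{S} = \frac{1}{1+X}

which tends to zero as X grows; how quickly it becomes negligible in practice
depends on how skewed the real access pattern is.)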

As time passes and SSDs get larger while many OLTP random workloads remain 
somewhat constrained in size, this becomes less important.

Modern DB workloads are becoming hybridized, though.  A 'mixed workload' 
scenario is now common where there are a mix of updated working sets and 
indexed access alongside heavy analytical 'update rarely if ever' kind of 
workloads.

- Luke



Re: [zfs-discuss] ZFS fragmentation with MySQL databases

2008-11-22 Thread Bob Netherton

 In other words, for random access across a working set larger (by say X%) 
 than the SSD-backed L2 ARC, the cache is useless.  This should asymptotically 
 approach truth as X grows and experience shows that X=200% is where it's 
 about 99% true.
   
Ummm, before we throw around phrases like useless, how about a little
testing?  I like a good academic argument just like the next guy, but before
I dismiss something completely out of hand I'd like to see some data.

Bob