Re: [zfs-discuss] `zfs list` doesn't show my snapshot
On 11/22/08, Jens Elkner [EMAIL PROTECTED] wrote:
> On Fri, Nov 21, 2008 at 03:42:17PM -0800, David Pacheco wrote:
>> Pawel Tecza wrote:
>>> But I still don't understand why `zfs list` doesn't display snapshots
>>> by default. I saw it on the Net many times in examples of zfs usage.
>>
>> This was PSARC/2008/469 - excluding snapshot info from 'zfs list'
>> http://opensolaris.org/os/community/on/flag-days/pages/2008091003/
>
> The incomplete one - where is the '-t all' option? It's really annoying,
> error-prone and time-consuming to type stories on the command line ...
> Does anybody remember the "keep it small and simple" thing?

Hm. I thought the '-t all' option worked with the revised zfs list. The
problem I have with that is that you need to type different commands to get
the same output depending on which machine you're on, as '-t all' doesn't
work on older systems.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
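For reference, the relevant invocations on a current build are roughly as
follows (the pool name 'tank' is only an example; on older releases the plain
'zfs list' already includes snapshots, so there is no '-t all' to fall back to):

    zfs list                        # filesystems and volumes only (the new default)
    zfs list -t snapshot            # snapshots only
    zfs list -t all                 # filesystems, volumes and snapshots
    zfs list -r -t snapshot tank    # snapshots under one pool or dataset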
Re: [zfs-discuss] RC1 Zfs writes a lot slower when running X
Hi, thanks for the reply.

I thought that was it too, so I wrote a C program that allocated 1 gig of RAM
and did nothing with it. The system was left with only 1 gig for ZFS and I saw
absolutely no performance hit. I tried the same thing for the CPU by running a
loop that took 100% of one of the 2 cores I have. Same thing... no hit at all.

I think it has something to do with system calls, the kernel or something like
that, but I don't know enough about that area to be able to diagnose such a
problem...

The video card in there is an old Matrox G200 PCI, because that is all I had
left and I didn't need anything better on a server. I disabled the console X
server and I only connect via the Xvnc server, so I would be really surprised
if Xvnc were doing something with that card, but you never know...

Thanks

Zerk
Re: [zfs-discuss] `zfs list` doesn't show my snapshot
Hi Pawel,

Yes, it did change in the last few months. On older versions of Solaris the
default for 'zfs list' was to show all filesystems AND snapshots. This got to
be a real pain when you had lots of snapshots, as you couldn't easily see what
was what, so it was changed so that the default for 'zfs list' is just to show
the filesystems, which is much preferable in my opinion.

As others here have said, just issue 'zfs list -t snapshot' if you only want
to see the snapshots, or 'zfs list -t all' to see both filesystems and
snapshots.

Cheers,
Simon

Blog: http://breden.org.uk
ZFS articles: http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
Re: [zfs-discuss] RC1 Zfs writes a lot slower when running X
> I thought that was it too, so I wrote a C program that allocated 1 gig of
> RAM and did nothing with it. So the system was left with only 1 gig for ZFS
> and I saw absolutely no performance hit.

Lock it in memory and then try again; if you allocate the memory but you don't
use it, you only have a swap reservation, nothing more. But if you allocate
and then run mlockall(MCL_CURRENT), you take 1GB off the table.

Casper
Re: [zfs-discuss] RC1 Zfs writes a lot slower when running X
Great, it worked. mlockall returned -1, probably because the system wasn't
able to allocate the 512M blocks contiguously... but using memset on each
block committed the memory, and I saw the same ZFS perf problem as with X and
VBox.

Thanks a lot for the hint :) Now I guess I'll have to buy more RAM :)

zerk
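A minimal sketch of such a test program (names and the two-times-512 MB layout
are illustrative, not the original code). mlockall() can fail without the
proper privilege or resource limits, in which case touching every page with
memset() still commits the memory rather than leaving it as a mere swap
reservation:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #define CHUNK   (512UL * 1024 * 1024)   /* 512 MB per allocation */
    #define NCHUNKS 2                       /* 2 x 512 MB = ~1 GB total */

    int main(void)
    {
        void *chunks[NCHUNKS];
        int i;

        for (i = 0; i < NCHUNKS; i++) {
            chunks[i] = malloc(CHUNK);
            if (chunks[i] == NULL) {
                perror("malloc");
                return 1;
            }
        }

        /* Try to lock everything into RAM; this needs the right privilege
         * and resource limits, so it may fail and return -1. */
        if (mlockall(MCL_CURRENT) != 0) {
            perror("mlockall");
            /* Fall back to touching every page, which commits the memory
             * instead of leaving it as a swap reservation only. */
            for (i = 0; i < NCHUNKS; i++)
                memset(chunks[i], 0xa5, CHUNK);
        }

        printf("holding ~1 GB of RAM; press Ctrl-C to release it\n");
        pause();    /* keep the memory held while the ZFS write test runs */
        return 0;
    }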
[zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
So to give a little background on this, we have been benchmarking Oracle RAC
on Linux vs. Oracle on Solaris. In the Solaris test, we are using vxvm and
vxfs. We noticed that the same Oracle TPC benchmark at roughly the same
transaction rate was causing twice as many disk I/Os to the backend DMX4-1500.
So we concluded that either Oracle behaves very differently in RAC, or our
filesystems may be the culprit.

This testing is wrapping up (it all gets dismantled Monday), so we took the
time to run a simulated disk I/O test with an 8K I/O size:

vxvm with vxfs:       2387 IOPS
vxvm with ufs:        4447 IOPS
ufs on disk devices:  4540 IOPS
zfs:                  1232 IOPS

The only ZFS tunings we have done are setting "set zfs:zfs_nocache=1" in
/etc/system and changing the recordsize to 8K to match the test. I think the
files we are using in the test were created before we changed the recordsize,
so I deleted them, recreated them and have started the other test... but does
anyone have any other ideas? This is my first experience with ZFS on a
commercial RAID array and so far it's not that great.

For those interested, we are using the iorate command from EMC for the
benchmark. For the different tests, we have 13 luns presented. Each one is its
own volume and filesystem with a single file on those filesystems. We are
running 13 iorate processes in parallel (there is no CPU bottleneck in this
either). For ZFS, we put all those luns in a pool with no redundancy, created
13 filesystems, and still ran 13 iorate processes.

We are running Solaris 10U6.
Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
That should be "set zfs:zfs_nocacheflush=1" in the post above... that was my
typo in the post.
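For anyone following along, a sketch of the two tunings under discussion (the
dataset name is illustrative; recordsize only affects blocks written after the
change, which is why the data files had to be recreated):

    # /etc/system (takes effect after a reboot)
    set zfs:zfs_nocacheflush=1

    # match the record size to the 8K I/O size, then recreate the data files
    zfs set recordsize=8k tank/oradata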
Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
ZFS with the datafiles recreated after the recordsize change came in at 3079
IOPS, so now we are at least in the ballpark.
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
My Supermicro H8DA3-2's onboard 1068E SAS chip isn't recognized in
OpenSolaris, and I'd like to keep this particular system all Supermicro, so
the L8i it is. I know there have been issues with Supermicro-branded 1068E
controllers, so I just wanted to verify that the stock mpt driver supports it.
Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
Are you putting your archive and redo logs on a separate zpool (not just a
different ZFS filesystem within the same pool as your data files)?

Are you using direct I/O at all in any of the config scenarios you listed?

/dale

On Nov 22, 2008, at 12:41 PM, Chris Greer wrote:
> [original post quoted in full - snipped]
Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
> For those interested, we are using the iorate command from EMC for the
> benchmark. For the different tests, we have 13 luns presented. Each one is
> its own volume and filesystem with a single file on those filesystems. We
> are running 13 iorate processes in parallel (there is no CPU bottleneck in
> this either). For ZFS, we put all those luns in a pool with no redundancy,
> created 13 filesystems, and still ran 13 iorate processes.

This doesn't seem like an apples-to-apples comparison, unless I'm
misunderstanding. If you put all of those luns in a single pool for zfs, you
should similarly put all of them in a single volume for vxvm.

Todd
Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
Right now we are not using Oracle... we are using iorate, so we don't have
separate logs. When the testing was with Oracle the logs were separate. This
test represents the 13 data luns that we had during those tests.

The reason it wasn't striped with vxvm is that the original comparison test
was vxvm + vxfs against Oracle RAC on Linux with OCFS. On the Linux side we
don't have a volume manager, so the database has to do the striping across the
separate datafiles. The only way I could mimic that with ZFS would be to
create 13 separate zpools, and that sounded pretty painful.

Again, the thing that led us down this path was that the Oracle RAC on Linux
accomplished slightly more transactions but only required half the I/Os to the
array to do so. The Sun test actually bottlenecked on the backend disk and had
plenty of CPU left on the host. So if the I/O bottleneck is actually the vxfs
filesystem causing more I/O to the backend, and we can fix that with a
different filesystem, then the Sun box may beat the Linux RAC. But our initial
testing has shown that vxfs is not all it's cracked up to be with respect to
databases (yes, we tried the database edition too and the performance actually
got slightly worse).
Re: [zfs-discuss] RC1 Zfs writes a lot slower when running X
On Fri, 21 Nov 2008, zerk wrote:
> I have OpenSolaris on an AMD64 Asus A8NE with 2 gigs of RAM and 4x320 gig
> SATA drives in raidz1. With dd, I can write at quasi disk maximum speed of
> 80 meg each for a total of 250 meg/s if I have no X session at all (only a
> console tty). But as soon as I have an X session running, the write speed
> drops to about 120MB/s. It's even worse if I have a VBoxHeadless running
> with an idle win2k3 inside. It drops to 30 MB/s.

I believe that the OpenSolaris kernel is now extended such that it reports
file change events to Gnome for files in the user's home directory. When Gnome
hears about a change, it goes and reads the file so that searching is fast and
there is a nice pre-generated thumbnail. This means that there is more going
on than simple memory consumption.

Try writing into the same pool but outside of your home directory and see if
the I/O rate improves. If it does, then go complain on the Desktop list. I
already complained in advance on the Desktop list, but there was little
response (as usual), so I have since unsubscribed.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
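A quick way to compare, along the lines suggested above (the paths and sizes
are only examples, and /tank/scratch is assumed to be a dataset in the same
pool but outside the home directory):

    # write inside the home directory
    dd if=/dev/zero of=$HOME/ddtest.dat bs=1024k count=4096

    # write to the same pool, but outside the home directory
    dd if=/dev/zero of=/tank/scratch/ddtest.dat bs=1024k count=4096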
Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
On Sat, 22 Nov 2008, Chris Greer wrote:
> ZFS with the datafiles recreated after the recordsize change came in at
> 3079 IOPS, so now we are at least in the ballpark.

ZFS is optimized for fast bulk data storage and data integrity, and not so
much for transactions. It seems that adding a non-volatile hardware cache
device can help quite a lot, but you may need to use OpenSolaris to fully take
advantage of it.

It is important to consider how fast things will be a month or two from now,
so it may be necessary to run the benchmark for quite some time in order to
see how performance degrades.

The 3079 IOPS is probably the limit of what your current hardware can do with
ZFS. I see a bit over 3100 here for random synchronous writers using 12 disks
(arranged as six mirror pairs) and 8 writers.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
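For what it's worth, adding such devices looks roughly like the following
(pool and device names are illustrative; whether log and cache vdevs are
accepted depends on the pool version your release supports, and cache/L2ARC
devices in particular may require a recent OpenSolaris build):

    zpool add tank log c3t0d0     # dedicated intent log (slog) device
    zpool add tank cache c3t1d0   # L2ARC read cache device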
Re: [zfs-discuss] ZFS fragmentation with MySQL databases
Kees Nuyt wrote:
> My explanation would be: whenever a block within a file changes, zfs has to
> write it at another location (copy on write), so the previous version isn't
> immediately lost. Zfs will try to keep the new version of the block close
> to the original one, but after several changes to the same database page,
> things get pretty messed up and logically sequential I/O becomes pretty
> much physically random indeed. The original blocks will eventually be added
> to the freelist and reused, so proximity can be restored, but it will never
> be 100% sequential again. The effect is larger when many snapshots are
> kept, because older block versions are not freed, or when the same block is
> changed very often and freelist updating has to be postponed. That is the
> trade-off between always consistent and fast.

Well, does that mean ZFS is not best suited as the underlying filesystem for
database engines? With databases it will always be fragmented, hence slow
performance? If so, it would be best used for large file servers whose
contents don't change frequently.

Thanks,

Tamer
Re: [zfs-discuss] ZFS fragmentation with MySQL databases
ZFS works marvelously well for data warehouse and analytic DBs. For lots of
small updates scattered across the breadth of the persistent working set, it's
not going to work well IMO.

Note that we're using ZFS to host databases as large as 10,000 TB - that's
10PB (!!). Solaris 10 U5 on X4540. That said, it's on 96 servers running
Greenplum DB.

With SSD, the randomness won't matter much I expect, though the filesystem
won't be helping, by virtue of this fragmentation effect of COW.

- Luke
Re: [zfs-discuss] ZFS fragmentation with MySQL databases
On Sun, 23 Nov 2008, Tamer Embaby wrote:
>> That is the trade-off between always consistent and fast.
>
> Well, does that mean ZFS is not best suited as the underlying filesystem
> for database engines? With databases it will always be fragmented, hence
> slow performance?

Assuming that the filesystem block size matches the database block size, there
is not so much of an issue with fragmentation, because databases are generally
fragmented (almost by definition) due to their random-access nature. Only a
freshly written database from carefully ordered insert statements might be in
a linear order, and only for accesses in the same linear order. Database
indexes could be negatively impacted, but they are likely to be cached in RAM
anyway.

I understand that ZFS uses a slab allocator, so file data is reserved in
larger slabs (e.g. 1MB) and the blocks are carved out of that. This tends to
keep more of the file data together and reduces allocation overhead.

Fragmentation is more of an impact for large files which should usually be
accessed sequentially. ZFS's COW algorithm and ordered writes will always be
slower than filesystems which simply overwrite existing blocks, but there is a
better chance that the database will be immediately usable if someone pulls
the power plug, and without needing to rely on special battery-backed
hardware.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] ZFS fragmentation with MySQL databases
Luke Lonergan wrote:
> ZFS works marvelously well for data warehouse and analytic DBs. For lots of
> small updates scattered across the breadth of the persistent working set,
> it's not going to work well IMO.

Actually, it does seem to work quite well when you use a read-optimized SSD
for the L2ARC. In that case, random read workloads have very fast access once
the cache is warm.
-- richard
Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
Chris Greer wrote:
> Right now we are not using Oracle... we are using iorate, so we don't have
> separate logs. When the testing was with Oracle the logs were separate.
> This test represents the 13 data luns that we had during those tests. The
> reason it wasn't striped with vxvm is that the original comparison test was
> vxvm + vxfs against Oracle RAC on Linux with OCFS.

You can't use ZFS directly for Oracle RAC, so perhaps you should test those
things which might work for your application?
-- richard
Re: [zfs-discuss] ZFS fragmentation with MySQL databases
> Actually, it does seem to work quite well when you use a read-optimized SSD
> for the L2ARC. In that case, random read workloads have very fast access
> once the cache is warm.

One would expect so, yes. But the usefulness of this is limited to the cases
where the entire working set will fit into an SSD cache.

In other words, for random access across a working set larger (by say X%) than
the SSD-backed L2ARC, the cache is useless. This should asymptotically
approach truth as X grows, and experience shows that X=200% is where it's
about 99% true.

As time passes and SSDs get larger while many OLTP random workloads remain
somewhat constrained in size, this becomes less important. Modern DB workloads
are becoming hybridized, though. A 'mixed workload' scenario is now common,
where a mix of updated working sets and indexed access runs alongside heavy
analytical 'update rarely if ever' kinds of workloads.

- Luke
Re: [zfs-discuss] ZFS fragmentation with MySQL databases
> In other words, for random access across a working set larger (by say X%)
> than the SSD-backed L2ARC, the cache is useless. This should asymptotically
> approach truth as X grows, and experience shows that X=200% is where it's
> about 99% true.

Ummm, before we throw around phrases like "useless", how about a little
testing? I like a good academic argument just like the next guy, but before I
dismiss something completely out of hand, I'd like to see some data.

Bob