Re: [zfs-discuss] Improving zfs send performance
Thomas, for long latency fat links, it should be quite beneficial to set the socket buffer on the receive side (instead of having users tune tcp_recv_hiwat). Throughput of a TCP connection is gated by receive socket buffer / round trip time. Could that be Ross' problem?

-r

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Improving zfs send performance
Roch schrieb:
> Thomas, for long latency fat links, it should be quite beneficial to set the socket buffer on the receive side (instead of having users tune tcp_recv_hiwat). Throughput of a TCP connection is gated by receive socket buffer / round trip time. Could that be Ross' problem? -r

Hmm, I'm not a TCP expert, but that sounds absolutely possible, if Solaris 10 isn't tuning the TCP buffer automatically. The default receive buffer seems to be 48k (at least on a V240 running 118833-33). So if the block size is something like 128k it would absolutely make sense to tune the receive buffer to lower the round trip time...

Ross: Would you like a patch to test if this is the case? Which version of mbuffer are you currently using?

- Thomas
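Roch's rule of thumb (throughput is gated by receive socket buffer / round trip time) is easy to sanity-check numerically. A minimal sketch: the 48k buffer matches Thomas's observation above, but the 10 ms round trip time is an assumed value for illustration, not a measurement from this thread.

```shell
# Estimate the TCP throughput ceiling from buffer size and RTT.
# RECV_BUF is the ~48k default receive buffer mentioned above;
# RTT_MS=10 is an assumed round trip time, not a measured one.
RECV_BUF=49152                               # bytes
RTT_MS=10                                    # milliseconds
CEILING=$(( RECV_BUF * 1000 / RTT_MS ))      # bytes per second
echo "ceiling: $CEILING B/s (~$(( CEILING / 1048576 )) MB/s)"
```

With those numbers the ceiling comes out under 5 MB/s, the right order of magnitude for the slowdowns discussed in this thread; a larger receive buffer or a shorter RTT raises it proportionally.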
Re: [zfs-discuss] Improving zfs send performance
Richard Elling writes:
> Keep in mind that this is for Solaris 10 not OpenSolaris.

> Keep in mind that any changes required for Solaris 10 will first be available in OpenSolaris, including any changes which may have already been implemented.

Indeed. For example, less than a week ago a fix for the following two CRs (along with some others) was put back into Solaris Nevada:

6333409 traversal code should be able to issue multiple reads in parallel
6418042 want traversal in depth-first pre-order for quicker 'zfs send'

This should have a positive impact on 'zfs send' performance.

Wbr, victor
Re: [zfs-discuss] Improving zfs send performance
On Mon, Oct 20, 2008 at 1:52 AM, Victor Latushkin [EMAIL PROTECTED] wrote:
> Indeed. For example, less than a week ago a fix for the following two CRs (along with some others) was put back into Solaris Nevada:
> 6333409 traversal code should be able to issue multiple reads in parallel
> 6418042 want traversal in depth-first pre-order for quicker 'zfs send'

That is helpful Victor. Does anyone have a full list of CRs that I can provide to Sun support? I have tried searching the bugs database, but I didn't even find those two on my own.
Re: [zfs-discuss] Improving zfs send performance
Hi

Miles Nordin wrote:
> r == Ross [EMAIL PROTECTED] writes:
> r> figures so close to 10MB/s. All three servers are running
> r> full duplex gigabit though
>
> there is one tricky way 100Mbit/s could still bite you, but it's probably not happening to you. It mostly affects home users with unmanaged switches:
> http://www.smallnetbuilder.com/content/view/30212/54/
> http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html
> because the big switch vendors all use pause frames safely:
> http://www.networkworld.com/netresources/0913flow2.html -- pause frames as interpreted by netgear are harmful

That rings a bell. Ross, are you using NFS via UDP or TCP? Might it be that your network has different performance levels for different transport types? For our network we have disabled pause frames completely and rely only on TCP's internal mechanisms to prevent flooding/blocking.

Carsten

PS: the job with 25k files adding up to 800 GB is now done - zfs send took only 52 hrs and the speed was ~4.5 MB/s :(
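Carsten's PS numbers are internally consistent, which a one-liner confirms (800 GB taken as 800 * 1024 MB):

```shell
# 800 GB moved in 52 hours works out to roughly the ~4.5 MB/s reported.
awk 'BEGIN { printf "%.1f MB/s\n", 800 * 1024 / (52 * 3600) }'
```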
Re: [zfs-discuss] Improving zfs send performance
Ok, just did some more testing on this machine to try to find where my bottlenecks are. Something very odd is going on here. As best I can tell there are two separate problems now:

- something is throttling network output to 10MB/s
- something is throttling zfs send to around 20MB/s

The network throughput I've verified with mbuffer:

1. A quick mbuffer test from /dev/zero to /dev/null gave me 565MB/s.
2. On a test server, mbuffer sending from /dev/zero on one machine to /dev/null on another gave me 37MB/s.
3. On the live server, mbuffer sending from /dev/zero to the same receiving machine gave me just under 10MB/s.

This looks very much like mbuffer is throttled on this machine, but I know NFS can give me 60-80MB/s. Can anybody give me a clue as to what could be causing this?

And the disk performance is just as confusing. Again I used a test server to provide a comparison, and this time used a zfs scrub with iostat to check the performance possible on the disks.

Live server: 5 sets of 3 way mirrors
Test server: 5 disk raid-z2

1. On the Live server, zfs send to /dev/null via mbuffer reports a speed of 21MB/s:
   # zfs send [EMAIL PROTECTED] | mbuffer -s 128k -m 512M > /dev/null
2. On the Test server, zfs send to /dev/null via mbuffer reports a speed of 35MB/s.
3. On the Live server, zpool scrub and iostat report a peak of 3k iops, and 283MB/s throughput.
4. On the Test server, zpool scrub and iostat report a peak of 472 iops, and 53MB/s throughput.

Surely the send and scrub operations should give similar results? Why is zpool scrub running 10-15x faster than zfs send on the live server? The iostat figures on the live server are particularly telling.
During a scrub (30s intervals):

                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
----------    -----  -----  -----  -----  -----  -----
rc-pool        734G  1.55T  2.94K     41   189M   788K
  mirror       144G   320G    578      6  39.2M   166K
    c1t1d0        -      -    379      5  39.9M   166K
    c1t2d0        -      -    379      5  39.9M   166K
    c2t1d0        -      -    385      5  40.1M   166K
  mirror       147G   317G    633      2  37.8M   170K
    c1t3d0        -      -    389      2  38.7M   171K
    c2t2d0        -      -    393      2  38.9M   171K
    c2t0d0        -      -    384      2  38.9M   171K
  mirror       147G   317G    619      6  37.3M  57.5K
    c2t3d0        -      -    377      2  38.3M  57.9K
    c1t5d0        -      -    377      2  38.3M  57.9K
    c1t4d0        -      -    373      3  38.2M  57.9K
  mirror       148G   316G    638     10  37.6M  64.0K
    c2t4d0        -      -    375      4  38.5M  64.4K
    c2t5d0        -      -    386      6  38.2M  64.4K
    c1t6d0        -      -    384      6  38.2M  64.4K
  mirror       149G   315G    540      6  37.4M   164K
    c1t7d0        -      -    356      4  38.1M   164K
    c2t6d0        -      -    362      5  38.2M   164K
    c2t7d0        -      -    361      5  38.2M   164K
  c3d1p0        12K   504M      0      8      0   166K
----------    -----  -----  -----  -----  -----  -----

During a send (30s intervals):

                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
----------    -----  -----  -----  -----  -----  -----
rc-pool        734G  1.55T    148     55  18.6M  1.71M
  mirror       144G   320G     25      6  3.15M   235K
    c1t1d0        -      -      8      3  1.02M   235K
    c1t2d0        -      -      7      3   954K   235K
    c2t1d0        -      -      9      3  1.19M   235K
  mirror       147G   317G     27      3  3.40M   203K
    c1t3d0        -      -      8      2  1.03M   203K
    c2t2d0        -      -      9      3  1.25M   203K
    c2t0d0        -      -      8      2  1.11M   203K
  mirror       147G   317G     32      2  4.12M   205K
    c2t3d0        -      -     11      1  1.45M   205K
    c1t5d0        -      -     10      1  1.34M   205K
    c1t4d0        -      -     10      1  1.34M   205K
  mirror       148G   316G     32      2  4.02M   201K
    c2t4d0        -      -     10      1  1.37M   201K
    c2t5d0        -      -      9      1  1.23M   201K
    c1t6d0        -      -     11      1  1.43M   201K
  mirror       149G   315G     31      6  3.89M   180K
    c1t7d0        -      -     11      2  1.45M   180K
    c2t6d0        -      -      8      2  1.10M   180K
    c2t7d0        -      -     10      2  1.35M   180K
  c3d1p0        12K   504M      0     34      0   727K
----------    -----  -----  -----  -----  -----  -----

Can anybody explain why zfs send could be so slow on one server? Is anybody else able to compare their iostat results for a zfs send and zpool scrub to see if they also have such a huge difference between the figures?

thanks,

Ross
Re: [zfs-discuss] Improving zfs send performance
Hi Ross,

On Fri, Oct 17, 2008 at 1:35 PM, Ross [EMAIL PROTECTED] wrote:
> Ok, just did some more testing on this machine to try to find where my bottlenecks are. Something very odd is going on here. As best I can tell there are two separate problems now:
> - something is throttling network output to 10MB/s

I'll try to help you with this problem.

> The network throughput I've verified with mbuffer:
> 1. A quick mbuffer test from /dev/zero to /dev/null gave me 565MB/s.
> 2. On a test server, mbuffer sending from /dev/zero on one machine to /dev/null on another gave me 37MB/s
> 3. On the live server, mbuffer sending from /dev/zero to the same receiving machine gave me just under 10MB/s.
> This looks very much like mbuffer is throttled on this machine, but I know NFS can give me 60-80MB/s. Can anybody give me a clue as to what could be causing this?

Does your NFS mount go over a separate network? If not, just ignore this advice. :)

When first testing out ZFS over NFS performance, I ran into a similar problem. I had very nice graphs, all plateauing at 10MB/s, and was getting frustrated at performance being so slow. It turned out that one of my links was 100Mbit. I took a moment to breathe, learned from my mistake (check the network links BEFORE running performance tests), and ran my tests again.

Check your network links, make sure that it's Gigabit all the way through, and that you're negotiating full duplex. A 100Mbit link will give you just about 10MB/s throughput on network transfers.

- Dimitri
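Dimitri's 100Mbit suspicion is simple arithmetic, worth spelling out:

```shell
# A 100 Mbit/s link tops out at ~12 MB/s raw; Ethernet/IP/TCP framing
# overhead shaves that down to roughly 10-11 MB/s on real transfers,
# suspiciously close to the observed 10 MB/s.
LINK_BPS=100000000            # 100 Mbit/s
RAW=$(( LINK_BPS / 8 ))       # bytes per second
echo "raw ceiling: $(( RAW / 1000000 )) MB/s"
```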
Re: [zfs-discuss] Improving zfs send performance
Yup, that's one of the first things I checked when it came out with figures so close to 10MB/s. All three servers are running full duplex gigabit though, as reported by both Solaris and the switch. And both the NFS at 60+MB/s, and the zfs send / receive are all going over the same network link, in some cases to the same servers.
Re: [zfs-discuss] Improving zfs send performance
Hi All,

I have opened a ticket with Sun support (#66104157) regarding zfs send / receive and will let you know what I find out. Keep in mind that this is for Solaris 10, not OpenSolaris.
Re: [zfs-discuss] Improving zfs send performance
> r == Ross [EMAIL PROTECTED] writes:
> r> figures so close to 10MB/s. All three servers are running
> r> full duplex gigabit though

there is one tricky way 100Mbit/s could still bite you, but it's probably not happening to you. It mostly affects home users with unmanaged switches:

http://www.smallnetbuilder.com/content/view/30212/54/
http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html

because the big switch vendors all use pause frames safely:

http://www.networkworld.com/netresources/0913flow2.html -- pause frames as interpreted by netgear are harmful
Re: [zfs-discuss] Improving zfs send performance
Scott Williamson wrote:
> Hi All,
> I have opened a ticket with sun support #66104157 regarding zfs send / receive and will let you know what I find out.

Thanks.

> Keep in mind that this is for Solaris 10 not opensolaris.

Keep in mind that any changes required for Solaris 10 will first be available in OpenSolaris, including any changes which may have already been implemented.

-- richard
Re: [zfs-discuss] Improving zfs send performance
On Fri, Oct 17, 2008 at 2:48 PM, Richard Elling [EMAIL PROTECTED] wrote:
> Keep in mind that any changes required for Solaris 10 will first be available in OpenSolaris, including any changes which may have already been implemented.

For me (who uses SOL10) it is the only way I can get information about what bugs and changes have been identified, and it helps me get stuff from OpenSolaris into SOL10. The last support ticket resulted in a Solaris iSCSI target to Windows initiator patch for Solaris 10 that made iSCSI targets on ZFS actually work for us.
Re: [zfs-discuss] Improving zfs send performance
Ok, I'm not entirely sure this is the same problem, but it does sound fairly similar. Apologies for hijacking the thread if this does turn out to be something else.

After following the advice here to get mbuffer working with zfs send / receive, I found I was only getting around 10MB/s throughput. Thinking it was a network problem I started the below thread in the OpenSolaris help forum:

http://www.opensolaris.org/jive/thread.jspa?messageID=294846

Now though I don't think it's the network at all. The end result from that thread is that we can't see any errors in the network setup, and using nicstat and NFS I can show that the server is capable of 50-60MB/s over the gigabit link. Nicstat also shows clearly that both zfs send / receive and mbuffer are only sending 1/5 of that amount of data over the network.

I've completely run out of ideas of my own (but I do half expect there's a simple explanation I haven't thought of). Can anybody think of a reason why both zfs send / receive and mbuffer would be so slow?
Re: [zfs-discuss] Improving zfs send performance
Oh dear god. Sorry folks, it looks like the new hotmail really doesn't play well with the list. Trying again in plain text:

> Try to separate the two things:
> (1) Try /dev/zero -> mbuffer --- network --- mbuffer -> /dev/null
> That should give you wirespeed

I tried that already. It still gets just 10-11MB/s from this server. I can get zfs send / receive and mbuffer working at 30MB/s though from a couple of test servers (with much lower specs).

> (2) Try zfs send | mbuffer > /dev/null
> That should give you an idea how fast zfs send really is locally.

Hmm, that's better than 10MB/s, but the average is still only around 20MB/s:

summary: 942 MByte in 47.4 sec - average of 19.9 MB/s

I think that points to another problem though, as the send mbuffer is 100% full. Certainly the pool itself doesn't appear under any strain at all while this is going on:

                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
----------    -----  -----  -----  -----  -----  -----
rc-pool        732G  1.55T    171     85  21.3M  1.01M
  mirror       144G   320G     38      0  4.78M      0
    c1t1d0        -      -      6      0   779K      0
    c1t2d0        -      -     17      0  2.17M      0
    c2t1d0        -      -     14      0  1.85M      0
  mirror       146G   318G     39      0  4.89M      0
    c1t3d0        -      -     20      0  2.50M      0
    c2t2d0        -      -     13      0  1.63M      0
    c2t0d0        -      -      6      0   779K      0
  mirror       146G   318G     34      0  4.35M      0
    c2t3d0        -      -     19      0  2.39M      0
    c1t5d0        -      -      7      0  1002K      0
    c1t4d0        -      -      7      0  1002K      0
  mirror       148G   316G     23      0  2.93M      0
    c2t4d0        -      -      8      0  1.09M      0
    c2t5d0        -      -      6      0   890K      0
    c1t6d0        -      -      7      0  1002K      0
  mirror       148G   316G     35      0  4.35M      0
    c1t7d0        -      -      6      0   779K      0
    c2t6d0        -      -     12      0  1.52M      0
    c2t7d0        -      -     17      0  2.07M      0
  c3d1p0        12K   504M      0     85      0  1.01M
----------    -----  -----  -----  -----  -----  -----

Especially when compared to the zfs send stats on my backup server which managed 30MB/s via mbuffer (being received on a single virtual SATA disk):

                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
----------    -----  -----  -----  -----  -----  -----
rpool         5.12G  42.6G      0      5      0  27.1K
  c4t0d0s0    5.12G  42.6G      0      5      0  27.1K
----------    -----  -----  -----  -----  -----  -----
zfspool        431G  4.11T    261      0  31.4M      0
  raidz2       431G  4.11T    261      0  31.4M      0
    c4t1d0        -      -    155      0  6.28M      0
    c4t2d0        -      -    155      0  6.27M      0
    c4t3d0        -      -    155      0  6.27M      0
    c4t4d0        -      -    155      0  6.27M      0
    c4t5d0        -      -    155      0  6.27M      0
----------    -----  -----  -----  -----  -----  -----

The really ironic thing is that the 30MB/s send / receive was sending to a virtual SATA disk which is stored (via sync NFS) on the server I'm having problems with...

Ross
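Carsten's test (1) can also be rehearsed entirely locally before involving the network at all, which separates pipe/CPU throughput from link throughput. A minimal sketch with mbuffer swapped for wc so it runs anywhere; on the real servers you would keep the mbuffer pipeline from the messages above:

```shell
# Push zeros through a local pipe in 128k blocks and verify the byte
# count.  If even a local pipe were slow, the network would not be
# the bottleneck; time this pipeline to get a local baseline.
BYTES=$(dd if=/dev/zero bs=128k count=1024 2>/dev/null | wc -c)
echo "moved $BYTES bytes through the pipe"   # 1024 x 128 KiB = 128 MiB
```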
Re: [zfs-discuss] Improving zfs send performance
Hi Scott,

Scott Williamson wrote:
> You seem to be using dd for write testing. In my testing I noted that there was a large difference in write speed between using dd to write from /dev/zero and using other files. Writing from /dev/zero always seemed to be fast, reaching the maximum of ~200MB/s, while cp would perform more poorly the fewer the vdevs.

You are right, the write benchmarks were done with dd just to have some bulk figures, since usually zeros can be generated fast enough.

> This also impacted the zfs send speed, as with fewer vdevs in RaidZ2 the disks seemed to spend most of their time seeking during the send.

That seems a bit too simplistic to me. If you compare raidz with raidz2 it seems that raidz2 is not too bad with fewer vdevs. I wish there was a way for zfs send to avoid so many seeks. The 1 TB file system is still being zfs sent, now close to 48 hours.

Cheers

Carsten

PS: We still have a spare thumper sitting around, maybe I'll give it a try with 5 vdevs
Re: [zfs-discuss] Improving zfs send performance
Hi Ross

Ross wrote:
> Now though I don't think it's the network at all. The end result from that thread is that we can't see any errors in the network setup, and using nicstat and NFS I can show that the server is capable of 50-60MB/s over the gigabit link. Nicstat also shows clearly that both zfs send / receive and mbuffer are only sending 1/5 of that amount of data over the network.
> I've completely run out of ideas of my own (but I do half expect there's a simple explanation I haven't thought of). Can anybody think of a reason why both zfs send / receive and mbuffer would be so slow?

Try to separate the two things:

(1) Try /dev/zero -> mbuffer --- network --- mbuffer -> /dev/null
That should give you wire speed.

(2) Try zfs send | mbuffer > /dev/null
That should give you an idea how fast zfs send really is locally.

Carsten
Re: [zfs-discuss] Improving zfs send performance
Hi Carsten,

You seem to be using dd for write testing. In my testing I noted that there was a large difference in write speed between using dd to write from /dev/zero and using other files. Writing from /dev/zero always seemed to be fast, reaching the maximum of ~200MB/s, while cp would perform more poorly the fewer the vdevs. This also impacted the zfs send speed, as with fewer vdevs in RaidZ2 the disks seemed to spend most of their time seeking during the send.

On Thu, Oct 16, 2008 at 1:27 AM, Carsten Aulbert [EMAIL PROTECTED] wrote:
> Some time ago I made some tests to find this:
> (1) create a new zpool
> (2) copy a user's home to it (always the same ~25 GB IIRC)
> (3) zfs send to /dev/null
> (4) evaluate, continue loop
>
> I did this for fully mirrored setups, raidz as well as raidz2; the results were mixed:
> https://n0.aei.uni-hannover.de/cgi-bin/twiki/view/ATLAS/ZFSBenchmarkTest#ZFS_send_performance_relevant_fo
>
> The culprit here might be that in retrospect this seemed like a good home filesystem, i.e. one which was quite fast. If you don't want to bother with the table: the mirrored setup never exceeded 58 MB/s and was getting faster the more small mirrors you used. RaidZ had its sweet spot with a configuration of '6 6 6 6 6 6 5 5', i.e. 6 or 5 disks per RaidZ and 8 vdevs. RaidZ2 finally was best at '10 9 9 9 9', i.e. 5 vdevs, but not much worse with only 3, i.e. what we are currently using to get more storage space (gains us about 2 TB/box).
>
> Cheers
> Carsten
Re: [zfs-discuss] Improving zfs send performance
I'm using 2008-05-07 (latest stable), am I right in assuming that one is ok?

> Date: Wed, 15 Oct 2008 13:52:42 +0200
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]; zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] Improving zfs send performance
>
> WARNING!!! Sorry people!!! The latest version of mbuffer has a regression that can CORRUPT output if stdout is used. Please fall back to the previous version. A fix is on the way...
>
> - Thomas
Re: [zfs-discuss] Improving zfs send performance
Hi, I'm just doing my first proper send/receive over the network and I'm getting just 9.4MB/s over a gigabit link. Would you be able to provide an example of how to use mbuffer / socat with ZFS for a Solaris beginner?

thanks,

Ross
Re: [zfs-discuss] Improving zfs send performance
Hi all,

Carsten Aulbert wrote:
> More later.

OK, I'm completely puzzled right now (and sorry for this lengthy email). My first (and currently only) idea was that the size of the files is related to this effect, but that does not seem to be the case:

(1) A 185 GB zfs file system was transferred yesterday with a speed of about 60 MB/s to two different servers. The histogram of files looks like:

2822 files were investigated, total size is: 185.82 Gbyte
Summary of file sizes [bytes]:
zero:               2
1 - 2:              0
2 - 4:              1
4 - 8:              3
8 - 16:            26
16 - 32:            8
32 - 64:            6
64 - 128:          29
128 - 256:         11
256 - 512:         13
512 - 1024:        17
1024 - 2k:         33
2k - 4k:           45
4k - 8k:         9044
8k - 16k:          60
16k - 32k:         41
32k - 64k:         19
64k - 128k:        22
128k - 256k:       12
256k - 512k:        5
512k - 1024k:    1218
1024k - 2M:     16004
2M - 4M:        46202
4M - 8M:            0
8M - 16M:           0
16M - 32M:          0
32M - 64M:          0
64M - 128M:         0
128M - 256M:        0
256M - 512M:        0
512M - 1024M:       0
1024M - 2G:         0
2G - 4G:            0
4G - 8G:            0
8G - 16G:           1

(2) Currently a much larger file system is being transferred; the same script (even the same incarnation, i.e. process) has now been running for close to 22 hours:

28549 files were investigated, total size is: 646.67 Gbyte
Summary of file sizes [bytes]:
zero:            4954
1 - 2:              0
2 - 4:              0
4 - 8:              1
8 - 16:             1
16 - 32:            0
32 - 64:            0
64 - 128:           1
128 - 256:          0
256 - 512:          9
512 - 1024:        71
1024 - 2k:          1
2k - 4k:         1095
4k - 8k:         8449
8k - 16k:        2217
16k - 32k:        503
32k - 64k:          1
64k - 128k:         1
128k - 256k:        1
256k - 512k:        0
512k - 1024k:       0
1024k - 2M:         0
2M - 4M:            0
4M - 8M:           16
8M - 16M:           0
16M - 32M:          0
32M - 64M:      11218
64M - 128M:         0
128M - 256M:        0
256M - 512M:        0
512M - 1024M:       0
1024M - 2G:         0
2G - 4G:            5
4G - 8G:            1
8G - 16G:           3
16G - 32G:          1

When watching zpool iostat I get this (30 second average, NOT the first output):

                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
----------    -----  -----  -----  -----  -----  -----
atlashome     3.54T  17.3T    137      0  4.28M      0
  raidz2       833G  6.00T      1      0  30.8K      0
    c0t0d0        -      -      1      0  2.38K      0
    c1t0d0        -      -      1      0  2.18K      0
    c4t0d0        -      -      0      0  1.91K      0
    c6t0d0        -      -      0      0  1.76K      0
    c7t0d0        -      -      0      0  1.77K      0
    c0t1d0        -      -      0      0  1.79K      0
    c1t1d0        -      -      0      0  1.86K      0
    c4t1d0        -      -      0      0  1.97K      0
    c5t1d0        -      -      0      0  2.04K      0
    c6t1d0        -      -      1      0  2.25K      0
    c7t1d0        -      -      1      0  2.31K      0
    c0t2d0        -      -      1      0  2.21K      0
    c1t2d0        -      -      0      0  1.99K      0
    c4t2d0        -      -      0      0  1.99K      0
    c5t2d0        -      -      1      0  2.38K      0
  raidz2      1.29T  5.52T     67      0  2.09M      0
    c6t2d0        -      -     58      0   143K      0
    c7t2d0        -      -     58      0   141K      0
    c0t3d0        -      -     53      0   131K      0
    c1t3d0        -      -     53      0   130K      0
    c4t3d0        -      -     58      0   143K      0
    c5t3d0        -      -     58      0   145K      0
    c6t3d0        -      -     59      0   147K      0
    c7t3d0        -      -     59      0   146K      0
    c0t4d0        -      -     59      0   145K      0
    c1t4d0        -      -     58      0   145K      0
    c4t4d0        -      -     58      0   145K      0
    c6t4d0        -      -     58      0   143K      0
    c7t4d0        -      -     58      0   143K      0
    c0t5d0        -      -     58      0   145K      0
    c1t5d0        -      -     58      0   144K      0
  raidz2      1.43T  5.82T     69      0  2.16M      0
    c4t5d0        -      -     62      0   141K      0
    c5t5d0        -      -     60      0   138K      0
    c6t5d0        -      -     59      0   135K      0
    c7t5d0        -      -     60      0   138K      0
    c0t6d0        -      -     62
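Carsten's histogram script itself isn't shown in the thread. A rough stand-in that produces the same kind of power-of-two size buckets, assuming GNU find's -printf (available on the Linux side; Solaris find would need a different size extractor) and using DIR as a placeholder for the file system to scan:

```shell
# Bucket file sizes by power of two, roughly matching the summaries
# quoted above.  Bucket b holds files with 2^(b-1) < size <= 2^b bytes.
HIST='
  $1 == 0 { zero++; next }
  { b = 0; for (s = $1; s > 1; s /= 2) b++; n[b]++ }
  END {
    printf "zero: %d\n", zero + 0
    for (b = 0; b < 35; b++) if (b in n) printf "up to 2^%d: %d\n", b, n[b]
  }'
DIR=${DIR:-.}
find "$DIR" -type f -printf '%s\n' | awk "$HIST"
```

The awk stage is independent of find, so any newline-separated list of byte counts can be piped into it.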
Re: [zfs-discuss] Improving zfs send performance
Thomas Maier-Komor schrieb:
> BTW: I released a new version of mbuffer today.

WARNING!!! Sorry people!!! The latest version of mbuffer has a regression that can CORRUPT output if stdout is used. Please fall back to the previous version. A fix is on the way...

- Thomas
Re: [zfs-discuss] Improving zfs send performance
Ross Smith schrieb:
> I'm using 2008-05-07 (latest stable), am I right in assuming that one is ok?

Yes, this one is OK. The regression appeared in 20081014.

- Thomas
Re: [zfs-discuss] Improving zfs send performance
Thanks, that got it working. I'm still only getting 10MB/s, so it's not solved my problem - I've still got a bottleneck somewhere, but mbuffer is a huge improvement over standard zfs send / receive. It makes such a difference when you can actually see what's going on.

> Date: Wed, 15 Oct 2008 12:08:14 +0200
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]; zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] Improving zfs send performance
>
> receiver> mbuffer -I sender:1 -s 128k -m 512M | zfs receive
> sender> zfs send mypool/[EMAIL PROTECTED] | mbuffer -s 128k -m 512M -O receiver:1
>
> BTW: I released a new version of mbuffer today.
>
> HTH, Thomas
Re: [zfs-discuss] Improving zfs send performance
Hi Ross

Ross Smith wrote:
> Thanks, that got it working. I'm still only getting 10MB/s, so it's not solved my problem - I've still got a bottleneck somewhere, but mbuffer is a huge improvement over standard zfs send / receive. It makes such a difference when you can actually see what's going on.

I'm currently trying to investigate this a bit. One of our users' home directories is extremely slow to 'zfs send'. It started yesterday afternoon at about 1600+0200, is still running, and has only copied less than 50% of the whole tree. On the receiving side zfs get tells me:

atlashome/BACKUP/XXX  used           193G   -
atlashome/BACKUP/XXX  available      17.2T  -
atlashome/BACKUP/XXX  referenced     193G   -
atlashome/BACKUP/XXX  compressratio  1.81x  -

So close to 350 GB are transferred and about 500 GB to go. More later.

Carsten
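The "close to 350 GB" estimate follows directly from the zfs get output, since the send stream carries uncompressed data while the pool stores it compressed:

```shell
# 193 GB referenced at a 1.81x compressratio is roughly the amount of
# logical data that has crossed the wire so far.
awk 'BEGIN { printf "%.0f GB logical\n", 193 * 1.81 }'
```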
Re: [zfs-discuss] Improving zfs send performance
Hello all,

I think in SS 11 it should be -xarch=amd64.

Leal.
Re: [zfs-discuss] Improving zfs send performance
Ross schrieb:
Hi, I'm just doing my first proper send/receive over the network and I'm getting just 9.4MB/s over a gigabit link. Would you be able to provide an example of how to use mbuffer / socat with ZFS for a Solaris beginner? thanks, Ross

receiver$ mbuffer -I sender:1 -s 128k -m 512M | zfs receive
sender$ zfs send mypool/[EMAIL PROTECTED] | mbuffer -s 128k -m 512M -O receiver:1

BTW: I released a new version of mbuffer today.

HTH, Thomas
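What mbuffer buys you in this pipeline is a large in-memory FIFO that decouples the bursty reader (zfs send) from the network writer, so neither side stalls the other. A minimal sketch of that idea in Python - purely illustrative, the real mbuffer is a C program with watermarks, TCP listeners, and much more:

```python
import io
import queue
import threading

def buffered_copy(src, dst, blocksize=128 * 1024, nblocks=16):
    """Copy src to dst through a bounded in-memory FIFO, mbuffer-style:
    a reader thread keeps filling the buffer while the main thread
    drains it, so a slow consumer does not immediately stall the producer.
    Returns the number of bytes copied."""
    q = queue.Queue(maxsize=nblocks)  # bounded buffer, like mbuffer's -m/-s

    def reader():
        while True:
            chunk = src.read(blocksize)
            q.put(chunk)              # blocks only when the buffer is full
            if not chunk:             # empty chunk doubles as EOF sentinel
                break

    t = threading.Thread(target=reader)
    t.start()
    total = 0
    while True:
        chunk = q.get()
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    t.join()
    return total
```

Piping a 1 MiB in-memory stream through it returns 1048576 and the output matches the input byte for byte; the interesting case in practice is when the two ends run at different, varying speeds.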
Re: [zfs-discuss] Improving zfs send performance
comments below...

Carsten Aulbert wrote:

Hi all,

Carsten Aulbert wrote:
More later.

OK, I'm completely puzzled right now (and sorry for this lengthy email). My first (and currently only) idea was that the size of the files is related to this effect, but that does not seem to be the case:

(1) A 185 GB zfs file system was transferred yesterday with a speed of about 60 MB/s to two different servers. The histogram of file sizes looks like:

2822 files were investigated, total size is: 185.82 Gbyte
Summary of file sizes [bytes]:
zero:              2
1     - 2          0
2     - 4          1
4     - 8          3
8     - 16        26
16    - 32         8
32    - 64         6
64    - 128       29
128   - 256       11
256   - 512       13
512   - 1024      17
1024  - 2k        33
2k    - 4k        45
4k    - 8k      9044
8k    - 16k       60
16k   - 32k       41
32k   - 64k       19
64k   - 128k      22
128k  - 256k      12
256k  - 512k       5
512k  - 1024k   1218
1024k - 2M     16004
2M    - 4M     46202
4M    - 8M         0
8M    - 16M        0
16M   - 32M        0
32M   - 64M        0
64M   - 128M       0
128M  - 256M       0
256M  - 512M       0
512M  - 1024M      0
1024M - 2G         0
2G    - 4G         0
4G    - 8G         0
8G    - 16G        1

(2) Currently a much larger file system is being transferred; the same script (even the same incarnation, i.e. process) is now running close to 22 hours:

28549 files were investigated, total size is: 646.67 Gbyte
Summary of file sizes [bytes]:
zero:           4954
1     - 2          0
2     - 4          0
4     - 8          1
8     - 16         1
16    - 32         0
32    - 64         0
64    - 128        1
128   - 256        0
256   - 512        9
512   - 1024      71
1024  - 2k         1
2k    - 4k      1095
4k    - 8k      8449
8k    - 16k     2217
16k   - 32k      503
32k   - 64k        1
64k   - 128k       1
128k  - 256k       1
256k  - 512k       0
512k  - 1024k      0
1024k - 2M         0
2M    - 4M         0
4M    - 8M        16
8M    - 16M        0
16M   - 32M        0
32M   - 64M    11218
64M   - 128M       0
128M  - 256M       0
256M  - 512M       0
512M  - 1024M      0
1024M - 2G         0
2G    - 4G         5
4G    - 8G         1
8G    - 16G        3
16G   - 32G        1

When watching zpool iostat I get this (30 second average, NOT the first output):

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
atlashome   3.54T  17.3T    137      0  4.28M      0
  raidz2     833G  6.00T      1      0  30.8K      0
    c0t0d0      -      -      1      0  2.38K      0
    c1t0d0      -      -      1      0  2.18K      0
    c4t0d0      -      -      0      0  1.91K      0
    c6t0d0      -      -      0      0  1.76K      0
    c7t0d0      -      -      0      0  1.77K      0
    c0t1d0      -      -      0      0  1.79K      0
    c1t1d0      -      -      0      0  1.86K      0
    c4t1d0      -      -      0      0  1.97K      0
    c5t1d0      -      -      0      0  2.04K      0
    c6t1d0      -      -      1      0  2.25K      0
    c7t1d0      -      -      1      0  2.31K      0
    c0t2d0      -      -      1      0  2.21K      0
    c1t2d0      -      -      0      0  1.99K      0
    c4t2d0      -      -      0      0  1.99K      0
    c5t2d0      -      -      1      0  2.38K      0
  raidz2    1.29T  5.52T     67      0  2.09M      0
    c6t2d0      -      -     58      0   143K      0
    c7t2d0      -      -     58      0   141K      0
    c0t3d0      -      -     53      0   131K      0
    c1t3d0      -      -     53      0   130K      0
    c4t3d0      -      -     58      0   143K      0
    c5t3d0      -      -     58      0   145K      0
    c6t3d0      -      -     59      0   147K      0
    c7t3d0      -      -     59      0   146K      0
    c0t4d0      -      -     59      0   145K      0
    c1t4d0      -      -     58      0   145K      0
    c4t4d0      -      -     58      0   145K      0
    c6t4d0      -      -     58      0   143K      0
    c7t4d0      -      -     58      0   143K      0
    c0t5d0      -      -     58      0   145K      0
    c1t5d0      -      -     58      0   144K      0
  raidz2    1.43T  5.82T     69      0  2.16M      0
    c4t5d0      -      -     62      0   141K      0
    c5t5d0
Re: [zfs-discuss] Improving zfs send performance
Hi Richard,

Richard Elling wrote:
Since you are reading, it depends on where the data was written. Remember, ZFS dynamic striping != RAID-0. I would expect something like this if the pool was expanded at some point in time.

No, the RAID was set up in one go right after jumpstarting the box.

(2) The disks should be able to perform much much faster than they currently output data at, I believe it's 2008 and not 1995.

X4500? Those disks are good for about 75-80 random iops, which seems to be about what they are delivering. The dtrace tool, iopattern, will show the random/sequential nature of the workload.

I need to read about this a bit and will try to analyze it.

(3) The four cores of the X4500 are dying of boredom, i.e. idle 95% of the time. Has anyone a good idea where the bottleneck could be? I'm running out of ideas.

I would suspect the disks. 30 second samples are not very useful to try and debug such things - even 1 second samples can be too coarse. But you should take a look at 1 second samples to see if there is a consistent I/O workload.
-- richard

Without doing too much statistics (yet, if needed I can easily do that) it looks like this:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
atlashome   3.54T  17.3T    256      0  7.97M      0
  raidz2     833G  6.00T      0      0      0      0
    c0t0d0      -      -      0      0      0      0
    c1t0d0      -      -      0      0      0      0
    c4t0d0      -      -      0      0      0      0
    c6t0d0      -      -      0      0      0      0
    c7t0d0      -      -      0      0      0      0
    c0t1d0      -      -      0      0      0      0
    c1t1d0      -      -      0      0      0      0
    c4t1d0      -      -      0      0      0      0
    c5t1d0      -      -      0      0      0      0
    c6t1d0      -      -      0      0      0      0
    c7t1d0      -      -      0      0      0      0
    c0t2d0      -      -      0      0      0      0
    c1t2d0      -      -      0      0      0      0
    c4t2d0      -      -      0      0      0      0
    c5t2d0      -      -      0      0      0      0
  raidz2    1.29T  5.52T    133      0  4.14M      0
    c6t2d0      -      -    117      0   285K      0
    c7t2d0      -      -    114      0   279K      0
    c0t3d0      -      -    106      0   261K      0
    c1t3d0      -      -    114      0   282K      0
    c4t3d0      -      -    118      0   294K      0
    c5t3d0      -      -    125      0   308K      0
    c6t3d0      -      -    126      0   311K      0
    c7t3d0      -      -    118      0   293K      0
    c0t4d0      -      -    119      0   295K      0
    c1t4d0      -      -    120      0   298K      0
    c4t4d0      -      -    120      0   291K      0
    c6t4d0      -      -    106      0   257K      0
    c7t4d0      -      -     96      0   236K      0
    c0t5d0      -      -    109      0   267K      0
    c1t5d0      -      -    114      0   282K      0
  raidz2    1.43T  5.82T    123      0  3.83M      0
    c4t5d0      -      -    108      0   242K      0
    c5t5d0      -      -    104      0   236K      0
    c6t5d0      -      -    104      0   239K      0
    c7t5d0      -      -    107      0   245K      0
    c0t6d0      -      -    108      0   248K      0
    c1t6d0      -      -    106      0   245K      0
    c4t6d0      -      -    108      0   250K      0
    c5t6d0      -      -    112      0   258K      0
    c6t6d0      -      -    114      0   261K      0
    c7t6d0      -      -    110      0   253K      0
    c0t7d0      -      -    109      0   248K      0
    c1t7d0      -      -    109      0   246K      0
    c4t7d0      -      -    108      0   243K      0
    c5t7d0      -      -    108      0   244K      0
    c6t7d0      -      -    106      0   240K      0
    c7t7d0      -      -    109      0   244K      0
----------  -----  -----  -----  -----  -----  -----

The iops vary between about 70 - 140; the interesting bit is that the first raidz2 does not get any hits at all :(

Cheers Carsten
Re: [zfs-discuss] Improving zfs send performance
Hi All,

Just want to note that I had the same issue with zfs send + vdevs that had 11 drives in them on a X4500. Reducing the count of drives per vdev cleared this up. One vdev is IOPS-limited to the speed of one drive in that vdev, according to this post http://opensolaris.org/jive/thread.jspa?threadID=74033 (see comment from ptribble).

On Wed, Oct 15, 2008 at 3:07 PM, Carsten Aulbert [EMAIL PROTECTED] wrote:
[...]
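ptribble's point gives a quick back-of-the-envelope model: for random reads, each raidz/raidz2 vdev delivers roughly the IOPS of a single disk, so pool IOPS scale with the number of vdevs, not the number of drives. A hedged sketch of that arithmetic - the 75 IOPS/disk figure is Richard's estimate from earlier in the thread, not a measured value:

```python
def pool_random_iops(n_vdevs, per_disk_iops=75):
    """Rough random-read IOPS of a raidz/raidz2 pool: about one disk's
    worth of IOPS per vdev, regardless of vdev width."""
    return n_vdevs * per_disk_iops

# the 'more space' 3-vdev X4500 layout vs the 5-vdev raidz2 layout
three_wide = pool_random_iops(3)   # 225
five_wide = pool_random_iops(5)    # 375
```

So trading 5 vdevs down to 3 for ~2 TB of extra space also gives up roughly 40% of the pool's random-read IOPS under this model, which is consistent with the slow sends of small-file-heavy home directories seen above.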
Re: [zfs-discuss] Improving zfs send performance
On Wed, Oct 15, 2008 at 2:17 PM, Scott Williamson [EMAIL PROTECTED] wrote:
Hi All, Just want to note that I had the same issue with zfs send + vdevs that had 11 drives in them on a X4500. Reducing the count of drives per vdev cleared this up. One vdev is IOPS-limited to the speed of one drive in that vdev, according to this post (see comment from ptribble).

Scott,

Can you tell us the configuration that you're using that is working for you? Were you using RaidZ or RaidZ2? I'm wondering what the sweet spot is to get a good compromise between vdevs and usable space/performance.

Thanks!

-- Brent Jones
[EMAIL PROTECTED]
Re: [zfs-discuss] Improving zfs send performance
Hi again,

Brent Jones wrote:
Scott, Can you tell us the configuration that you're using that is working for you? Were you using RaidZ or RaidZ2? I'm wondering what the sweet spot is to get a good compromise between vdevs and usable space/performance.

Some time ago I ran some tests to find this:

(1) create a new zpool
(2) copy a user's home to it (always the same one, ~25 GB IIRC)
(3) zfs send to /dev/null
(4) evaluate, continue loop

I did this for fully mirrored setups as well as raidz and raidz2; the results were mixed:

https://n0.aei.uni-hannover.de/cgi-bin/twiki/view/ATLAS/ZFSBenchmarkTest#ZFS_send_performance_relevant_fo

One caveat: in retrospect the test filesystem seems to have been a "good" home filesystem, i.e. one which was quite fast to send, so the numbers may flatter all configurations.

If you don't want to bother with the table:

Mirrored setups never exceeded 58 MB/s and got faster the more small mirrors we used.

RaidZ had its sweet spot with a configuration of '6 6 6 6 6 6 5 5', i.e. 6 or 5 disks per RaidZ and 8 vdevs.

RaidZ2 was best at '10 9 9 9 9', i.e. 5 vdevs, but not much worse with only 3, which is what we are currently using to get more storage space (gains us about 2 TB/box).

Cheers Carsten
Re: [zfs-discuss] Improving zfs send performance
Carsten Aulbert schrieb:

Hi Thomas,

Thomas Maier-Komor wrote:
Carsten, the summary looks like you are using mbuffer. Can you elaborate on what options you are passing to mbuffer? Maybe changing the blocksize to be consistent with the recordsize of the zpool could improve performance. Is the buffer running full or is it empty most of the time? Are you sure that the network connection is 10Gb/s all the way through from machine to machine?

Well spotted :) Right now plain mbuffer with plenty of buffer (-m 2048M) on both ends, and I have not seen any buffer exceed the 10% watermark level. The network connections are via Neterion Xframe II Sun Fire NICs, then via CX4 cables to our core switch, where both boxes are directly connected (Woven Systems EFX 1000). netperf tells me that the TCP performance is close to 7.5 GBit/s duplex, and if I use

cat /dev/zero | mbuffer | socat --- socat | mbuffer > /dev/null

I easily see speeds of about 350-400 MB/s, so I think the network is fine.

Cheers Carsten

I don't know socat or what benefit it gives you, but have you tried using mbuffer to send and receive directly (options -I and -O)? Additionally, try to set the block size of mbuffer to the recordsize of zfs (usually 128k):

receiver$ mbuffer -I sender:1 -s 128k -m 2048M | zfs receive
sender$ zfs send blabla | mbuffer -s 128k -m 2048M -O receiver:1

As transmitting from /dev/zero to /dev/null runs at a rate of 350 MB/s, I guess you are really hitting the maximum speed of your zpool. From my understanding, I'd guess sending is always slower than receiving, because reads are random and writes are sequential. So it should be quite normal that mbuffer's buffer doesn't really see a lot of usage.

Cheers, Thomas
Re: [zfs-discuss] Improving zfs send performance
Hi again,

Thomas Maier-Komor wrote:
I don't know socat or what benefit it gives you, but have you tried using mbuffer to send and receive directly (options -I and -O)?

I thought we tried that in the past and with socat it seemed faster, but I just made a brief test and I got (/dev/zero -> remote /dev/null) 330 MB/s with mbuffer+socat and 430 MB/s with mbuffer alone.

Additionally, try to set the block size of mbuffer to the recordsize of zfs (usually 128k):
receiver$ mbuffer -I sender:1 -s 128k -m 2048M | zfs receive
sender$ zfs send blabla | mbuffer -s 128k -m 2048M -O receiver:1

We are using 32k since many of our users have tiny files (and then I need to reduce the buffer size because of this 'funny' error):

mbuffer: fatal: Cannot address so much memory (32768*65536=2147483648)

Does this qualify for a bug report?

Thanks for the hint of looking into this again!

Cheers Carsten
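That error looks like 32-bit overflow: 32768 blocks of 65536 bytes (or vice versa) come to exactly 2^31 bytes, one past the largest signed 32-bit value, which would explain why a 32-bit mbuffer build cannot address the buffer and why compiling in 64-bit mode helps. Checking the arithmetic:

```python
blocksize = 32768      # -s 32k
nblocks = 65536        # number of buffer blocks implied by the -m setting
total = blocksize * nblocks

# exactly 2 GiB, i.e. 2**31 bytes - one more than a signed 32-bit
# quantity can represent (max 2**31 - 1)
assert total == 2147483648
assert total == 2 ** 31
assert total > 2 ** 31 - 1
```

So any -s/-m combination whose product reaches 2 GiB would trip the same limit in a 32-bit build; staying just under it (or building 64-bit, as suggested below in the thread's own words) avoids the error.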
Re: [zfs-discuss] Improving zfs send performance
Carsten Aulbert schrieb:
[...]
We are using 32k since many of our users have tiny files (and then I need to reduce the buffer size because of this 'funny' error):
mbuffer: fatal: Cannot address so much memory (32768*65536=2147483648)
Does this qualify for a bug report?

Yes, this qualifies for a bug report. As a workaround for now, you can compile in 64-bit mode, i.e.:

$ ./configure CFLAGS="-g -O -m64"
$ make
$ make install

This works for Sun Studio 12 and gcc. For older versions of Sun Studio, you need to pass -xarch=v9 instead of -m64. I am planning to release an updated version of mbuffer this week. I'll include a patch for this issue.

Cheers, Thomas
Re: [zfs-discuss] Improving zfs send performance
Carsten Aulbert wrote:

Hi all, although I'm running all this on a Sol10u5 X4500, I hope I may ask this question here. If not, please let me know where to head to.

We are running several X4500s with only 3 raidz2 zpools since we want quite a bit of storage space[*], but the performance we get when using zfs send is sometimes really lousy. Of course this depends on what's in the file system, but when doing a few backups today I have seen the following:

receiving full stream of atlashome/[EMAIL PROTECTED] into atlashome/BACKUP/[EMAIL PROTECTED]
in @ 11.1 MB/s, out @ 11.1 MB/s, 14.9 GB total, buffer 0% full
summary: 14.9 GByte in 45 min 42.8 sec - average of 5708 kB/s

So, a mere 15 GB were transferred in 45 minutes; another user's home which is quite large (7 TB) took more than 42 hours to be transferred. Since all this is going over a 10 Gb/s network and the CPUs are all idle, I would really like to know why.

What are you using to transfer the data over the network?

-- Darren J Moffat
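The mbuffer summary line quoted above is internally consistent, which is worth checking before distrusting the tool. A quick verification, assuming binary units (1 GByte = 1024^2 kB) as mbuffer uses:

```python
total_kb = 14.9 * 1024 * 1024   # 14.9 GByte expressed in kB
seconds = 45 * 60 + 42.8        # 45 min 42.8 sec
rate = total_kb / seconds       # ~5697 kB/s

# agrees with the reported 5708 kB/s to well within the rounding
# of the "14.9 GB" total - i.e. the tool's arithmetic checks out,
# and the pool really is delivering only ~5.6 MB/s on this dataset
assert abs(rate - 5708) / 5708 < 0.01
```

That is under 1% of what a 10 Gb/s link can carry, so the bottleneck clearly is not the network for this transfer.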
Re: [zfs-discuss] Improving zfs send performance
Carsten Aulbert schrieb:

Hi all, although I'm running all this on a Sol10u5 X4500, I hope I may ask this question here. If not, please let me know where to head to.

We are running several X4500s with only 3 raidz2 zpools since we want quite a bit of storage space[*], but the performance we get when using zfs send is sometimes really lousy. Of course this depends on what's in the file system, but when doing a few backups today I have seen the following:

receiving full stream of atlashome/[EMAIL PROTECTED] into atlashome/BACKUP/[EMAIL PROTECTED]
in @ 11.1 MB/s, out @ 11.1 MB/s, 14.9 GB total, buffer 0% full
summary: 14.9 GByte in 45 min 42.8 sec - average of 5708 kB/s

So, a mere 15 GB were transferred in 45 minutes; another user's home which is quite large (7 TB) took more than 42 hours to be transferred. Since all this is going over a 10 Gb/s network and the CPUs are all idle, I would really like to know

* why zfs send is so slow, and
* how I can improve the speed?

Thanks a lot for any hint

Cheers Carsten

[*] we have run quite a few tests with more zpools but were not able to improve the speeds substantially. For this particular bad file system I still need to histogram the file sizes.

Carsten, the summary looks like you are using mbuffer. Can you elaborate on what options you are passing to mbuffer? Maybe changing the blocksize to be consistent with the recordsize of the zpool could improve performance. Is the buffer running full or is it empty most of the time? Are you sure that the network connection is 10Gb/s all the way through from machine to machine?

- Thomas
Re: [zfs-discuss] Improving zfs send performance
Hi,

Darren J Moffat wrote:
What are you using to transfer the data over the network?

Initially just plain ssh, which was way too slow; now we use mbuffer on both ends and push the data over a socket via socat. I know that mbuffer already allows this, but in a few tests socat seemed to be faster. Sorry for not writing this in the first email.

Cheers Carsten
Re: [zfs-discuss] Improving zfs send performance
Hi Thomas,

Thomas Maier-Komor wrote:
Carsten, the summary looks like you are using mbuffer. Can you elaborate on what options you are passing to mbuffer? Maybe changing the blocksize to be consistent with the recordsize of the zpool could improve performance. Is the buffer running full or is it empty most of the time? Are you sure that the network connection is 10Gb/s all the way through from machine to machine?

Well spotted :) Right now plain mbuffer with plenty of buffer (-m 2048M) on both ends, and I have not seen any buffer exceed the 10% watermark level. The network connections are via Neterion Xframe II Sun Fire NICs, then via CX4 cables to our core switch, where both boxes are directly connected (Woven Systems EFX 1000). netperf tells me that the TCP performance is close to 7.5 GBit/s duplex, and if I use

cat /dev/zero | mbuffer | socat --- socat | mbuffer > /dev/null

I easily see speeds of about 350-400 MB/s, so I think the network is fine.

Cheers Carsten
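Roch's observation earlier in the thread - that throughput of a TCP connection is gated by receive socket buffer / round-trip time - is easy to quantify. A sketch, assuming the ~48k default receive buffer mentioned in the thread and a hypothetical 5 ms RTT (the actual RTT between these boxes is not given anywhere above):

```python
def tcp_limit_bytes_per_s(rcvbuf_bytes, rtt_s):
    """Upper bound on single-stream TCP throughput: at most one
    receive window of data can be in flight per round trip."""
    return rcvbuf_bytes / rtt_s

# 48 KB receive buffer, assumed 5 ms RTT -> ~9.8 MB/s,
# the same ballpark as the ~9.4-10 MB/s Ross reported
limit = tcp_limit_bytes_per_s(48 * 1024, 0.005)

# on a sub-millisecond LAN the same buffer allows ~100 MB/s,
# which is why the back-to-back netperf numbers look fine
lan_limit = tcp_limit_bytes_per_s(48 * 1024, 0.0005)
```

So a small default receive buffer only becomes the bottleneck once the RTT grows; raising the receiver's socket buffer (or tcp_recv_hiwat) lifts the ceiling proportionally.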