Re: [zfs-discuss] Improving zfs send performance

2008-11-12 Thread Roch

Thomas, for long latency fat links, it should be quite
beneficial to set the socket buffer on the receive side
(instead of having users tune tcp_recv_hiwat).

Throughput of a TCP connection is gated by 
receive socket buffer / round-trip time.
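
(As a back-of-the-envelope check, taking the ~48k default receive
buffer mentioned later in this thread and assuming a 5 ms round-trip
time: 48 KB / 0.005 s ~ 9.6 MB/s - suspiciously close to the ~10 MB/s
Ross reports.)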

Could that be Ross's problem?

-r



Ross Smith writes:
  
  Thanks, that got it working.  I'm still only getting 10MB/s, so it hasn't 
  solved my problem - I've still got a bottleneck somewhere, but mbuffer is a 
  huge improvement over standard zfs send / receive.  It makes such a 
  difference when you can actually see what's going on.
  
  
  
   Date: Wed, 15 Oct 2008 12:08:14 +0200
   From: [EMAIL PROTECTED]
   To: [EMAIL PROTECTED]; zfs-discuss@opensolaris.org
   Subject: Re: [zfs-discuss] Improving zfs send performance
   
   Ross wrote:
   Hi,
   
   I'm just doing my first proper send/receive over the network and I'm 
   getting just 9.4MB/s over a gigabit link.  Would you be able to provide 
   an example of how to use mbuffer / socat with ZFS for a Solaris beginner?
   
   thanks,
   
   Ross
   
   receiver$ mbuffer -I sender:1 -s 128k -m 512M | zfs receive
   
   sender$ zfs send mypool/[EMAIL PROTECTED] | mbuffer -s 128k -m
   512M -O receiver:1
   
   BTW: I released a new version of mbuffer today.
   
   HTH,
   Thomas
  
  _
  Make a mini you and download it into Windows Live Messenger
  http://clk.atdmt.com/UKM/go/111354029/direct/01/
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-11-12 Thread Thomas Maier-Komor
Roch wrote:
 Thomas, for long latency fat links, it should be quite
 beneficial to set the socket buffer on the receive side
 (instead of having users tune tcp_recv_hiwat).
 
 Throughput of a TCP connection is gated by 
 receive socket buffer / round-trip time.
 
 Could that be Ross's problem?
 
 -r
 
 

Hmm, I'm not a TCP expert, but that sounds absolutely possible, if
Solaris 10 isn't tuning the TCP buffer automatically. The default
receive buffer seems to be 48k (at least on a V240 running 118833-33).
So if the block size is something like 128k, it would absolutely make
sense to enlarge the receive buffer to compensate for the round-trip time...
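
For reference, a minimal sketch of the system-wide knob (assuming a
Solaris 10-era stack; the per-socket SO_RCVBUF that Roch suggests
mbuffer set itself would be the cleaner fix):

# show the current default TCP receive window, in bytes
ndd -get /dev/tcp tcp_recv_hiwat
# raise the system-wide default to 1 MB for new connections
ndd -set /dev/tcp tcp_recv_hiwat 1048576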

Ross: Would you like a patch to test if this is the case? Which version
of mbuffer are you currently using?

- Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-20 Thread Victor Latushkin
Richard Elling wrote:
 Keep in mind that this is for Solaris 10, not OpenSolaris.
 
 Keep in mind that any changes required for Solaris 10 will first
 be available in OpenSolaris, including any changes which may
 have already been implemented.

Indeed. For example, less than a week ago a fix for the following two CRs 
(along with some others) was put back into Solaris Nevada:

6333409 traversal code should be able to issue multiple reads in parallel
6418042 want traversal in depth-first pre-order for quicker 'zfs send'

This should have a positive impact on 'zfs send' performance.

Wbr,
victor
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-20 Thread Scott Williamson
On Mon, Oct 20, 2008 at 1:52 AM, Victor Latushkin [EMAIL PROTECTED] wrote:

 Indeed. For example, less than a week ago fix for the following two CRs
 (along with some others) was put back into Solaris Nevada:

 6333409 traversal code should be able to issue multiple reads in parallel
 6418042 want traversal in depth-first pre-order for quicker 'zfs send'


That is helpful, Victor. Does anyone have a full list of CRs that I can
provide to Sun support? I have tried searching the bug database, but I
didn't even find those two on my own.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-18 Thread Carsten Aulbert
Hi

Miles Nordin wrote:
 r == Ross  [EMAIL PROTECTED] writes:
 
  r figures so close to 10MB/s.  All three servers are running
  r full duplex gigabit though
 
 there is one tricky way 100Mbit/s could still bite you, but it's
 probably not happening to you.  It mostly affects home users with
 unmanaged switches:
 
   http://www.smallnetbuilder.com/content/view/30212/54/
   http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html
 
 because the big switch vendors all use pause frames safely:
 
  http://www.networkworld.com/netresources/0913flow2.html -- pause frames as 
 interpreted by netgear are harmful

That rings a bell. Ross, are you using NFS via UDP or TCP? Could it be
that your network has different performance levels for different
transport types? For our network we have disabled pause frames completely
and rely only on TCP's internal mechanisms to prevent flooding/blocking.

Carsten

PS: the job with 25k files adding up to 800 GB is now done - zfs send
took only 52 hrs, at a speed of ~4.5 MB/s :(
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-17 Thread Ross
Ok, just did some more testing on this machine to try to find where my 
bottlenecks are.  Something very odd is going on here.  As best I can tell 
there are two separate problems now:

- something is throttling network output to 10MB/s
- something is throttling zfs send to around 20MB/s

The network throughput I've verified with mbuffer:

1.  A quick mbuffer test from /dev/zero to /dev/null gave me 565MB/s.
2.  On a test server, mbuffer sending from /dev/zero on one machine to 
/dev/null on another gave me 37MB/s
3.  On the live server, mbuffer sending from /dev/zero to the same receiving 
machine gave me just under 10MB/s.

This looks very much like mbuffer is throttled on this machine, but I know NFS 
can give me 60-80MB/s.  Can anybody give me a clue as to what could be causing 
this?
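
For reference, the local baseline in test 1 is essentially a pipe like
this (no disk or network involved):

cat /dev/zero | mbuffer -s 128k -m 512M > /dev/null

Tests 2 and 3 replace the output side with -O receiver:port and run a
listening mbuffer -I port > /dev/null on the far end; the port number
is arbitrary.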


And the disk performance is just as confusing.  Again I used a test server to 
provide a comparison, and this time used a zfs scrub with iostat to check the 
performance possible on the disks.

Live server:  5 sets of 3 way mirrors
Test server:  5 disk raid-z2

1.  On the Live server, zfs send to /dev/null via mbuffer reports a speed of 
21MB/s
 # zfs send [EMAIL PROTECTED] | mbuffer -s 128k -m 512M > /dev/null
2.  On the Test server, zfs send to /dev/null via mbuffer reports a speed of 
35MB/s
3.  On the Live server, zpool scrub and iostat report a peak of 3k iops, and 
283MB/s throughput.
4.  On the Test server, zpool scrub and iostat report a peak of 472 iops, and 
53MB/s throughput.

Surely the send and scrub operations should give similar results?  Why is zpool 
scrub running 10-15x faster than zfs send on the live server?

The iostat figures on the live server are particularly telling.

During a scrub (30s intervals):
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
--  -  -  -  -  -  -
rc-pool  734G  1.55T  2.94K 41   189M   788K
  mirror 144G   320G578  6  39.2M   166K
c1t1d0  -  -379  5  39.9M   166K
c1t2d0  -  -379  5  39.9M   166K
c2t1d0  -  -385  5  40.1M   166K
  mirror 147G   317G633  2  37.8M   170K
c1t3d0  -  -389  2  38.7M   171K
c2t2d0  -  -393  2  38.9M   171K
c2t0d0  -  -384  2  38.9M   171K
  mirror 147G   317G619  6  37.3M  57.5K
c2t3d0  -  -377  2  38.3M  57.9K
c1t5d0  -  -377  2  38.3M  57.9K
c1t4d0  -  -373  3  38.2M  57.9K
  mirror 148G   316G638 10  37.6M  64.0K
c2t4d0  -  -375  4  38.5M  64.4K
c2t5d0  -  -386  6  38.2M  64.4K
c1t6d0  -  -384  6  38.2M  64.4K
  mirror 149G   315G540  6  37.4M   164K
c1t7d0  -  -356  4  38.1M   164K
c2t6d0  -  -362  5  38.2M   164K
c2t7d0  -  -361  5  38.2M   164K
  c3d1p0  12K   504M  0  8  0   166K
--  -  -  -  -  -  -

During a send (30s intervals):
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
--  -  -  -  -  -  -
rc-pool  734G  1.55T148 55  18.6M  1.71M
  mirror 144G   320G 25  6  3.15M   235K
c1t1d0  -  -  8  3  1.02M   235K
c1t2d0  -  -  7  3   954K   235K
c2t1d0  -  -  9  3  1.19M   235K
  mirror 147G   317G 27  3  3.40M   203K
c1t3d0  -  -  8  2  1.03M   203K
c2t2d0  -  -  9  3  1.25M   203K
c2t0d0  -  -  8  2  1.11M   203K
  mirror 147G   317G 32  2  4.12M   205K
c2t3d0  -  - 11  1  1.45M   205K
c1t5d0  -  - 10  1  1.34M   205K
c1t4d0  -  - 10  1  1.34M   205K
  mirror 148G   316G 32  2  4.02M   201K
c2t4d0  -  - 10  1  1.37M   201K
c2t5d0  -  -  9  1  1.23M   201K
c1t6d0  -  - 11  1  1.43M   201K
  mirror 149G   315G 31  6  3.89M   180K
c1t7d0  -  - 11  2  1.45M   180K
c2t6d0  -  -  8  2  1.10M   180K
c2t7d0  -  - 10  2  1.35M   180K
  c3d1p0  12K   504M  0 34  0   727K
--  -  -  -  -  -  -

Can anybody explain why zfs send could be so slow on one server?  Is anybody 
else able to compare their iostat results for a zfs send and zpool scrub to see 
if they also have such a huge difference between the figures?

thanks,

Ross


Re: [zfs-discuss] Improving zfs send performance

2008-10-17 Thread Dimitri Aivaliotis
Hi Ross,

On Fri, Oct 17, 2008 at 1:35 PM, Ross [EMAIL PROTECTED] wrote:
 Ok, just did some more testing on this machine to try to find where my 
 bottlenecks are.  Something very odd is going on here.  As best I can tell 
 there are two separate problems now:

 - something is throttling network output to 10MB/s


I'll try to help you with this problem.


 The network throughput I've verified with mbuffer:

 1.  A quick mbuffer test from /dev/zero to /dev/null gave me 565MB/s.
 2.  On a test server, mbuffer sending from /dev/zero on one machine to 
 /dev/null on another gave me 37MB/s
 3.  On the live server, mbuffer sending from /dev/zero to the same receiving 
 machine gave me just under 10MB/s.

 This looks very much like mbuffer is throttled on this machine, but I know 
 NFS can give me 60-80MB/s.  Can anybody give me a clue as to what could be 
 causing this?


Does your NFS mount go over a separate network?  If not, just ignore
this advice. :)

When first testing out ZFS over NFS performance, I ran into a similar
problem.  I had very nice graphs, all plateauing at 10MB/s, and was
getting frustrated at performance being so slow.  It turned out that
one of my links was 100Mbit.  I took a moment to breathe, learn from
my mistake (check the network links BEFORE running performance tests),
and ran my tests again.

Check your network links, make sure that it's Gigabit all the way
through, and that you're negotiating full-duplex.  A 100Mbit link will
give you just about 10MB/s throughput on network transfers.
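
On Solaris, a quick sketch of that check (interface names vary by
driver; nicstat is the tool Ross already used elsewhere in the thread):

# link speed and duplex as negotiated, per interface
dladm show-dev
# live per-NIC throughput while the transfer runs
nicstat 5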

- Dimitri
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-17 Thread Ross
Yup, that's one of the first things I checked when it came out with
figures so close to 10MB/s.  All three servers are running full duplex
gigabit though, as reported by both Solaris and the switch.  And both
the NFS at 60+MB/s, and the zfs send / receive are all going over the
same network link, in some cases to the same servers.


Re: [zfs-discuss] Improving zfs send performance

2008-10-17 Thread Scott Williamson
Hi All,

I have opened a ticket with Sun support (#66104157) regarding zfs send /
receive and will let you know what I find out.

Keep in mind that this is for Solaris 10, not OpenSolaris.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-17 Thread Miles Nordin
 r == Ross  [EMAIL PROTECTED] writes:

 r figures so close to 10MB/s.  All three servers are running
 r full duplex gigabit though

there is one tricky way 100Mbit/s could still bite you, but it's
probably not happening to you.  It mostly affects home users with
unmanaged switches:

  http://www.smallnetbuilder.com/content/view/30212/54/
  http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html

because the big switch vendors all use pause frames safely:

 http://www.networkworld.com/netresources/0913flow2.html -- pause frames as 
interpreted by netgear are harmful





Re: [zfs-discuss] Improving zfs send performance

2008-10-17 Thread Richard Elling
Scott Williamson wrote:
 Hi All,
  
 I have opened a ticket with Sun support (#66104157) regarding zfs send / 
 receive and will let you know what I find out.

Thanks.

  
 Keep in mind that this is for Solaris 10, not OpenSolaris.

Keep in mind that any changes required for Solaris 10 will first
be available in OpenSolaris, including any changes which may
have already been implemented.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-17 Thread Scott Williamson
On Fri, Oct 17, 2008 at 2:48 PM, Richard Elling [EMAIL PROTECTED] wrote:

 Keep in mind that any changes required for Solaris 10 will first
 be available in OpenSolaris, including any changes which may
 have already been implemented.


For me (a Solaris 10 user) it is the only way I can get information about
what bugs and changes have been identified, and it helps me get fixes from
OpenSolaris into Solaris 10. The last support ticket resulted in a Solaris
10 patch for the iSCSI target (with a Windows initiator) that made iSCSI
targets on ZFS actually work for us.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-16 Thread Ross
Ok, I'm not entirely sure this is the same problem, but it does sound fairly 
similar.  Apologies for hijacking the thread if this does turn out to be 
something else.

After following the advice here to get mbuffer working with zfs send / receive, 
I found I was only getting around 10MB/s throughput.  Thinking it was a network 
problem I started the below thread in the OpenSolaris help forum:
http://www.opensolaris.org/jive/thread.jspa?messageID=294846

Now though I don't think it's network at all.  The end result from that thread 
is that we can't see any errors in the network setup, and using nicstat and NFS 
I can show that the server is capable of 50-60MB/s over the gigabit link.  
Nicstat also shows clearly that both zfs send / receive and mbuffer are only 
sending 1/5 of that amount of data over the network.

I've completely run out of ideas of my own (but I do half expect there's a 
simple explanation I haven't thought of).  Can anybody think of a reason why 
both zfs send / receive and mbuffer would be so slow?



Re: [zfs-discuss] Improving zfs send performance

2008-10-16 Thread Ross Smith


Oh dear god.  Sorry folks, it looks like the new hotmail really doesn't play 
well with the list.  Trying again in plain text:
 
 
 Try to separate the two things:
 
 (1) Try /dev/zero -> mbuffer ---> network ---> mbuffer > /dev/null
 That should give you wirespeed
 
I tried that already.  It still gets just 10-11MB/s from this server.
I can get zfs send / receive and mbuffer working at 30MB/s though from a couple 
of test servers (with much lower specs).
 
 (2) Try zfs send | mbuffer > /dev/null
 That should give you an idea how fast zfs send really is locally.
 
Hmm, that's better than 10MB/s, but the average is still only around 20MB/s:
summary:  942 MByte in 47.4 sec - average of 19.9 MB/s
 
I think that points to another problem though as the send mbuffer is 100% full. 
 Certainly the pool itself doesn't appear under any strain at all while this is 
going on:
 
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
--  -  -  -  -  -  -
rc-pool  732G  1.55T171 85  21.3M  1.01M
  mirror 144G   320G 38  0  4.78M  0
c1t1d0  -  -  6  0   779K  0
c1t2d0  -  - 17  0  2.17M  0
c2t1d0  -  - 14  0  1.85M  0
  mirror 146G   318G 39  0  4.89M  0
c1t3d0  -  - 20  0  2.50M  0
c2t2d0  -  - 13  0  1.63M  0
c2t0d0  -  -  6  0   779K  0
  mirror 146G   318G 34  0  4.35M  0
c2t3d0  -  - 19  0  2.39M  0
c1t5d0  -  -  7  0  1002K  0
c1t4d0  -  -  7  0  1002K  0
  mirror 148G   316G 23  0  2.93M  0
c2t4d0  -  -  8  0  1.09M  0
c2t5d0  -  -  6  0   890K  0
c1t6d0  -  -  7  0  1002K  0
  mirror 148G   316G 35  0  4.35M  0
c1t7d0  -  -  6  0   779K  0
c2t6d0  -  - 12  0  1.52M  0
c2t7d0  -  - 17  0  2.07M  0
  c3d1p0  12K   504M  0 85  0  1.01M
--  -  -  -  -  -  -
 
Especially when compared to the zfs send stats on my backup server, which 
managed 30MB/s via mbuffer (being received on a single virtual SATA disk):
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
--  -  -  -  -  -  -
rpool   5.12G  42.6G  0  5  0  27.1K
  c4t0d0s0  5.12G  42.6G  0  5  0  27.1K
--  -  -  -  -  -  -
zfspool  431G  4.11T261  0  31.4M  0
  raidz2 431G  4.11T261  0  31.4M  0
c4t1d0  -  -155  0  6.28M  0
c4t2d0  -  -155  0  6.27M  0
c4t3d0  -  -155  0  6.27M  0
c4t4d0  -  -155  0  6.27M  0
c4t5d0  -  -155  0  6.27M  0
--  -  -  -  -  -  -
The really ironic thing is that the 30MB/s send / receive was sending to a 
virtual SATA disk which is stored (via sync NFS) on the server I'm having 
problems with...
 
Ross

 

 Date: Thu, 16 Oct 2008 14:27:49 +0200
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 CC: zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] Improving zfs send performance
 
 Hi Ross
 
 Ross wrote:
 Now though I don't think it's network at all. The end result from that 
 thread is that we can't see any errors in the network setup, and using 
 nicstat and NFS I can show that the server is capable of 50-60MB/s over the 
 gigabit link. Nicstat also shows clearly that both zfs send / receive and 
 mbuffer are only sending 1/5 of that amount of data over the network.
 
 I've completely run out of ideas of my own (but I do half expect there's a 
 simple explanation I haven't thought of). Can anybody think of a reason why 
 both zfs send / receive and mbuffer would be so slow?
 
 Try to separate the two things:
 
 (1) Try /dev/zero -> mbuffer ---> network ---> mbuffer > /dev/null
 
 That should give you wirespeed
 
 (2) Try zfs send | mbuffer > /dev/null
 
 That should give you an idea how fast zfs send really is locally.
 
 Carsten
_
Get all your favourite content with the slick new MSN Toolbar - FREE
http://clk.atdmt.com/UKM/go/111354027/direct/01/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-16 Thread Carsten Aulbert
Hi Scott,

Scott Williamson wrote:
 You seem to be using dd for write testing. In my testing I noted that
 there was a large difference in write speed between using dd to write
 from /dev/zero and using other files. Writing from /dev/zero always
 seemed to be fast, reaching the maximum of ~200MB/s, while cp would
 perform more poorly the fewer the vdevs.

You are right, the write benchmarks were done with dd just to have some
bulk figures, since zeros can usually be generated fast enough.

 
 This also impacted the zfs send speed, as with fewer vdevs in RaidZ2 the
 disks seemed to spend most of their time seeking during the send.
 

That seems a bit too simplistic to me. If you compare raidz with raidz2,
it seems that raidz2 is not too bad with fewer vdevs. I wish there was a
way for zfs send to avoid so many seeks. The > 1 TB file system is
still being zfs sent, now close to 48 hours in.

Cheers

Carsten

PS: We still have a spare thumper sitting around, maybe I'll give it a try
with 5 vdevs
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-16 Thread Carsten Aulbert
Hi Ross

Ross wrote:
 Now though I don't think it's network at all.  The end result from that 
 thread is that we can't see any errors in the network setup, and using 
 nicstat and NFS I can show that the server is capable of 50-60MB/s over the 
 gigabit link.  Nicstat also shows clearly that both zfs send / receive and 
 mbuffer are only sending 1/5 of that amount of data over the network.
 
 I've completely run out of ideas of my own (but I do half expect there's a 
 simple explanation I haven't thought of).  Can anybody think of a reason why 
 both zfs send / receive and mbuffer would be so slow?

Try to separate the two things:

(1) Try /dev/zero -> mbuffer ---> network ---> mbuffer > /dev/null

That should give you wirespeed

(2) Try zfs send | mbuffer > /dev/null

That should give you an idea how fast zfs send really is locally.
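
Concretely, the two tests look something like this (the port number and
snapshot name are placeholders - substitute your own):

# (1) raw network path, no disks involved
receiver$ mbuffer -I 9090 -s 128k -m 512M > /dev/null
sender$   cat /dev/zero | mbuffer -s 128k -m 512M -O receiver:9090

# (2) local read path only
sender$   zfs send tank/home@snap | mbuffer -s 128k -m 512M > /dev/null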

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-16 Thread Scott Williamson
Hi Carsten,

You seem to be using dd for write testing. In my testing I noted that there
was a large difference in write speed between using dd to write from
/dev/zero and using other files. Writing from /dev/zero always seemed to be
fast, reaching the maximum of ~200MB/s, while cp would perform more poorly
the fewer the vdevs.

This also impacted the zfs send speed, as with fewer vdevs in RaidZ2 the
disks seemed to spend most of their time seeking during the send.
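
For comparison, the two kinds of write test are roughly (paths and
sizes hypothetical):

# sequential, trivially generated zeros - the flattering case
dd if=/dev/zero of=/tank/test/zeros bs=128k count=16384
# real files, timed - the case that exposes seek-bound vdevs
ptime cp -r /export/home/someuser /tank/test/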

On Thu, Oct 16, 2008 at 1:27 AM, Carsten Aulbert [EMAIL PROTECTED] wrote:

 Some time ago I made some tests to find this:

 (1) create a new zpool
 (2) Copy user's home to it (always the same ~ 25 GB IIRC)
 (3) zfs send to /dev/null
 (4) evaluate & continue loop

 I did this for fully mirrored setups, raidz as well as raidz2, the
 results were mixed:


 https://n0.aei.uni-hannover.de/cgi-bin/twiki/view/ATLAS/ZFSBenchmarkTest#ZFS_send_performance_relevant_fo

 The culprit here might be that in retrospect this seemed like a good
 home filesystem, i.e. one which was quite fast.

 If you don't want to bother with the table:

 Mirrored setup never exceeded 58 MB/s and was getting faster the more
 small mirrors you used.

 RaidZ had its sweetspot with a configuration of '6 6 6 6 6 6 5 5', i.e.
 6 or 5 disks per RaidZ and 8 vdevs

 RaidZ2 finally was best at '10 9 9 9 9', i.e. 5 vdevs but not much worse
 with only 3, i.e. what we are currently using to get more storage space
 (gains us about 2 TB/box).

 Cheers

 Carsten
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Ross Smith

I'm using 2008-05-07 (latest stable), am I right in assuming that one is ok?


 Date: Wed, 15 Oct 2008 13:52:42 +0200
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]; zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] Improving zfs send performance
 
 Thomas Maier-Komor schrieb:
  BTW: I released a new version of mbuffer today.
 
 WARNING!!!
 
 Sorry people!!!
 
  The latest version of mbuffer has a regression that can CORRUPT output
  if stdout is used. Please fall back to the previous version. A fix is on
  the way...
 
 - Thomas

_
Discover Bird's Eye View now with Multimap from Live Search
http://clk.atdmt.com/UKM/go/111354026/direct/01/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Ross
Hi,

I'm just doing my first proper send/receive over the network and I'm getting 
just 9.4MB/s over a gigabit link.  Would you be able to provide an example of 
how to use mbuffer / socat with ZFS for a Solaris beginner?

thanks,

Ross


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Carsten Aulbert
Hi all,

Carsten Aulbert wrote:
 More later.

OK, I'm completely puzzled right now (and sorry for this lengthy email).
My first (and currently only) idea was that the size of the files is
related to this effect, but that does not seem to be the case:

(1) A 185 GB zfs file system was transferred yesterday with a speed of
about 60 MB/s to two different servers. The histogram of files looks like:

72822 files were investigated, total size is: 185.82 Gbyte

Summary of file sizes [bytes]:
zero:               2
1 - 2               0
2 - 4               1
4 - 8               3
8 - 16             26
16 - 32             8
32 - 64             6
64 - 128           29
128 - 256          11
256 - 512          13
512 - 1024         17
1024 - 2k          33
2k - 4k            45
4k - 8k          9044
8k - 16k           60
16k - 32k          41
32k - 64k          19
64k - 128k         22
128k - 256k        12
256k - 512k         5
512k - 1024k     1218
1024k - 2M      16004
2M - 4M         46202
4M - 8M             0
8M - 16M            0
16M - 32M           0
32M - 64M           0
64M - 128M          0
128M - 256M         0
256M - 512M         0
512M - 1024M        0
1024M - 2G          0
2G - 4G             0
4G - 8G             0
8G - 16G            1

(2) Currently a much larger file system is being transferred, the same
script (even the same incarnation, i.e. process) is now running close to
22 hours:

28549 files were investigated, total size is: 646.67 Gbyte

Summary of file sizes [bytes]:
zero:            4954
1 - 2               0
2 - 4               0
4 - 8               1
8 - 16              1
16 - 32             0
32 - 64             0
64 - 128            1
128 - 256           0
256 - 512           9
512 - 1024         71
1024 - 2k           1
2k - 4k          1095
4k - 8k          8449
8k - 16k         2217
16k - 32k         503
32k - 64k           1
64k - 128k          1
128k - 256k         1
256k - 512k         0
512k - 1024k        0
1024k - 2M          0
2M - 4M             0
4M - 8M            16
8M - 16M            0
16M - 32M           0
32M - 64M       11218
64M - 128M          0
128M - 256M         0
256M - 512M         0
512M - 1024M        0
1024M - 2G          0
2G - 4G             5
4G - 8G             1
8G - 16G            3
16G - 32G           1


When watching zpool iostat I get this (30 second average, NOT the first
output):
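
(i.e. something along the lines of 'zpool iostat -v atlashome 30',
skipping the first sample, which only shows averages since boot)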

              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
--  -  -  -  -  -  -
atlashome   3.54T  17.3T137  0  4.28M  0
  raidz2 833G  6.00T  1  0  30.8K  0
c0t0d0  -  -  1  0  2.38K  0
c1t0d0  -  -  1  0  2.18K  0
c4t0d0  -  -  0  0  1.91K  0
c6t0d0  -  -  0  0  1.76K  0
c7t0d0  -  -  0  0  1.77K  0
c0t1d0  -  -  0  0  1.79K  0
c1t1d0  -  -  0  0  1.86K  0
c4t1d0  -  -  0  0  1.97K  0
c5t1d0  -  -  0  0  2.04K  0
c6t1d0  -  -  1  0  2.25K  0
c7t1d0  -  -  1  0  2.31K  0
c0t2d0  -  -  1  0  2.21K  0
c1t2d0  -  -  0  0  1.99K  0
c4t2d0  -  -  0  0  1.99K  0
c5t2d0  -  -  1  0  2.38K  0
  raidz21.29T  5.52T 67  0  2.09M  0
c6t2d0  -  - 58  0   143K  0
c7t2d0  -  - 58  0   141K  0
c0t3d0  -  - 53  0   131K  0
c1t3d0  -  - 53  0   130K  0
c4t3d0  -  - 58  0   143K  0
c5t3d0  -  - 58  0   145K  0
c6t3d0  -  - 59  0   147K  0
c7t3d0  -  - 59  0   146K  0
c0t4d0  -  - 59  0   145K  0
c1t4d0  -  - 58  0   145K  0
c4t4d0  -  - 58  0   145K  0
c6t4d0  -  - 58  0   143K  0
c7t4d0  -  - 58  0   143K  0
c0t5d0  -  - 58  0   145K  0
c1t5d0  -  - 58  0   144K  0
  raidz21.43T  5.82T 69  0  2.16M  0
c4t5d0  -  - 62  0   141K  0
c5t5d0  -  - 60  0   138K  0
c6t5d0  -  - 59  0   135K  0
c7t5d0  -  - 60  0   138K  0
c0t6d0  -  - 62

Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Thomas Maier-Komor
Thomas Maier-Komor wrote:
 BTW: I released a new version of mbuffer today.

WARNING!!!

Sorry people!!!

The latest version of mbuffer has a regression that can CORRUPT output
if stdout is used. Please fall back to the previous version. A fix is on the
way...

- Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Thomas Maier-Komor
Ross Smith wrote:
 I'm using 2008-05-07 (latest stable), am I right in assuming that one is ok?
 
 
 Date: Wed, 15 Oct 2008 13:52:42 +0200
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]; zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] Improving zfs send performance

 Thomas Maier-Komor wrote:
 BTW: I released a new version of mbuffer today.
 WARNING!!!

 Sorry people!!!

 The latest version of mbuffer has a regression that can CORRUPT output
 if stdout is used. Please fall back to the previous version. A fix is on
 the way...

 - Thomas
 
 _
 Discover Bird's Eye View now with Multimap from Live Search
 http://clk.atdmt.com/UKM/go/111354026/direct/01/

Yes, this one is OK. The regression appeared in version 20081014.

- Thomas

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Ross Smith

Thanks, that got it working.  I'm still only getting 10MB/s, so it hasn't solved 
my problem - I've still got a bottleneck somewhere, but mbuffer is a huge 
improvement over standard zfs send / receive.  It makes such a difference when 
you can actually see what's going on.



 Date: Wed, 15 Oct 2008 12:08:14 +0200
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]; zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] Improving zfs send performance
 
 Ross wrote:
 Hi,
 
 I'm just doing my first proper send/receive over the network and I'm getting 
 just 9.4MB/s over a gigabit link.  Would you be able to provide an example 
 of how to use mbuffer / socat with ZFS for a Solaris beginner?
 
 thanks,
 
 Ross
 
 receiver$ mbuffer -I sender:1 -s 128k -m 512M | zfs receive
 
 sender$ zfs send mypool/[EMAIL PROTECTED] | mbuffer -s 128k -m
 512M -O receiver:1
 
 BTW: I released a new version of mbuffer today.
 
 HTH,
 Thomas

_
Make a mini you and download it into Windows Live Messenger
http://clk.atdmt.com/UKM/go/111354029/direct/01/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Carsten Aulbert
Hi Ross

Ross Smith wrote:
 Thanks, that got it working.  I'm still only getting 10MB/s, so it hasn't 
 solved my problem - I've still got a bottleneck somewhere, but mbuffer is a 
 huge improvement over standard zfs send / receive.  It makes such a 
 difference when you can actually see what's going on.

I'm currently trying to investigate this a bit. One of our users' home
directories is extremely slow to 'zfs send'. It started yesterday
afternoon at about 1600+0200, is still running, and has copied less
than 50% of the whole tree:

On the receiving side zfs get tells me:

atlashome/BACKUP/XXX  used   193G   -
atlashome/BACKUP/XXX  available  17.2T  -
atlashome/BACKUP/XXX  referenced 193G   -
atlashome/BACKUP/XXX  compressratio  1.81x  -

So close to 350 GB (193G referenced at a 1.81x compressratio) have been
transferred, and about 500 GB are still to go.

More later.

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Marcelo Leal
Hello all,
 I think in Sun Studio 11 it should be -xarch=amd64.
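
 i.e., presumably (a sketch of the 64-bit x86 build):

 ./configure CFLAGS="-g -O -xarch=amd64"
 make && make install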

 Leal.


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Thomas Maier-Komor
Ross wrote:
 Hi,
 
 I'm just doing my first proper send/receive over the network and I'm getting 
 just 9.4MB/s over a gigabit link.  Would you be able to provide an example of 
 how to use mbuffer / socat with ZFS for a Solaris beginner?
 
 thanks,
 
 Ross

receiver$ mbuffer -I sender:1 -s 128k -m 512M | zfs receive

sender$ zfs send mypool/[EMAIL PROTECTED] | mbuffer -s 128k -m
512M -O receiver:1
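
Spelled out with hypothetical names (pool tank, snapshot snap1, port
9090 - substitute your own), the two halves of the pipeline look like:

# receiving host: listen on the port, buffer, and feed zfs receive
receiver$ mbuffer -I 9090 -s 128k -m 512M | zfs receive tank/backup

# sending host: stream the snapshot into mbuffer, which ships it over TCP
sender$ zfs send tank/home@snap1 | mbuffer -s 128k -m 512M -O receiver:9090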

BTW: I released a new version of mbuffer today.

HTH,
Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Richard Elling
comments below...

Carsten Aulbert wrote:
 Hi all,

 Carsten Aulbert wrote:
   
 More later.
 

 OK, I'm completely puzzled right now (and sorry for this lengthy email).
 My first (and currently only) idea was that the size of the files is
 related to this effect, but that does not seem to be the case:

 (1) A 185 GB zfs file system was transferred yesterday with a speed of
 about 60 MB/s to two different servers. The histogram of files looks like:

 72822 files were investigated, total size is: 185.82 Gbyte

 Summary of file sizes [bytes]:
 zero:               2
 1 - 2               0
 2 - 4               1
 4 - 8               3
 8 - 16             26
 16 - 32             8
 32 - 64             6
 64 - 128           29
 128 - 256          11
 256 - 512          13
 512 - 1024         17
 1024 - 2k          33
 2k - 4k            45
 4k - 8k          9044
 8k - 16k           60
 16k - 32k          41
 32k - 64k          19
 64k - 128k         22
 128k - 256k        12
 256k - 512k         5
 512k - 1024k     1218
 1024k - 2M      16004
 2M - 4M         46202
 4M - 8M             0
 8M - 16M            0
 16M - 32M           0
 32M - 64M           0
 64M - 128M          0
 128M - 256M         0
 256M - 512M         0
 512M - 1024M        0
 1024M - 2G          0
 2G - 4G             0
 4G - 8G             0
 8G - 16G            1
 (2) Currently a much larger file system is being transferred, the same
 script (even the same incarnation, i.e. process) is now running close to
 22 hours:

 28549 files were investigated, total size is: 646.67 Gbyte

 Summary of file sizes [bytes]:
 zero:            4954
 1 - 2               0
 2 - 4               0
 4 - 8               1
 8 - 16              1
 16 - 32             0
 32 - 64             0
 64 - 128            1
 128 - 256           0
 256 - 512           9
 512 - 1024         71
 1024 - 2k           1
 2k - 4k          1095
 4k - 8k          8449
 8k - 16k         2217
 16k - 32k         503
 32k - 64k           1
 64k - 128k          1
 128k - 256k         1
 256k - 512k         0
 512k - 1024k        0
 1024k - 2M          0
 2M - 4M             0
 4M - 8M            16
 8M - 16M            0
 16M - 32M           0
 32M - 64M       11218
 64M - 128M          0
 128M - 256M         0
 256M - 512M         0
 512M - 1024M        0
 1024M - 2G          0
 2G - 4G             5
 4G - 8G             1
 8G - 16G            3
 16G - 32G           1


 When watching zpool iostat I get this (30 second average, NOT the first
 output):

               capacity     operations    bandwidth
 pool        used  avail   read  write   read  write
 --  -  -  -  -  -  -
 atlashome   3.54T  17.3T137  0  4.28M  0
   raidz2 833G  6.00T  1  0  30.8K  0
 c0t0d0  -  -  1  0  2.38K  0
 c1t0d0  -  -  1  0  2.18K  0
 c4t0d0  -  -  0  0  1.91K  0
 c6t0d0  -  -  0  0  1.76K  0
 c7t0d0  -  -  0  0  1.77K  0
 c0t1d0  -  -  0  0  1.79K  0
 c1t1d0  -  -  0  0  1.86K  0
 c4t1d0  -  -  0  0  1.97K  0
 c5t1d0  -  -  0  0  2.04K  0
 c6t1d0  -  -  1  0  2.25K  0
 c7t1d0  -  -  1  0  2.31K  0
 c0t2d0  -  -  1  0  2.21K  0
 c1t2d0  -  -  0  0  1.99K  0
 c4t2d0  -  -  0  0  1.99K  0
 c5t2d0  -  -  1  0  2.38K  0
   raidz21.29T  5.52T 67  0  2.09M  0
 c6t2d0  -  - 58  0   143K  0
 c7t2d0  -  - 58  0   141K  0
 c0t3d0  -  - 53  0   131K  0
 c1t3d0  -  - 53  0   130K  0
 c4t3d0  -  - 58  0   143K  0
 c5t3d0  -  - 58  0   145K  0
 c6t3d0  -  - 59  0   147K  0
 c7t3d0  -  - 59  0   146K  0
 c0t4d0  -  - 59  0   145K  0
 c1t4d0  -  - 58  0   145K  0
 c4t4d0  -  - 58  0   145K  0
 c6t4d0  -  - 58  0   143K  0
 c7t4d0  -  - 58  0   143K  0
 c0t5d0  -  - 58  0   145K  0
 c1t5d0  -  - 58  0   144K  0
   raidz21.43T  5.82T 69  0  2.16M  0
 c4t5d0  -  - 62  0   141K  0
 c5t5d0

Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Carsten Aulbert
Hi Richard,

Richard Elling wrote:
 Since you are reading, it depends on where the data was written.
 Remember, ZFS dynamic striping != RAID-0.
 I would expect something like this if the pool was expanded at some
 point in time.

No, the RAID was set up in one go right after jumpstarting the box.

 (2) The disks should be able to perform much, much faster than they
 currently output data at; I believe it's 2008 and not 1995.
 
 X4500?  Those disks are good for about 75-80 random iops,
 which seems to be about what they are delivering.  The dtrace
 tool, iopattern, will show the random/sequential nature of the
 workload.


I need to read about this a bit and will try to analyze it.

 (3) The four cores of the X4500 are dying of boredom, i.e. idle 95% all
 the time.

 Has anyone a good idea, where the bottleneck could be? I'm running out
 of ideas.
   
 
 I would suspect the disks.  30 second samples are not very useful
 to try and debug such things -- even 1 second samples can be
 too coarse.  But you should take a look at 1 second samples
 to see if there is a consistent I/O workload.
 -- richard
 

Without doing too much statistics (yet, if needed I can easily do that)
it looks like these:


              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
--  -  -  -  -  -  -
atlashome   3.54T  17.3T256  0  7.97M  0
  raidz2 833G  6.00T  0  0  0  0
c0t0d0  -  -  0  0  0  0
c1t0d0  -  -  0  0  0  0
c4t0d0  -  -  0  0  0  0
c6t0d0  -  -  0  0  0  0
c7t0d0  -  -  0  0  0  0
c0t1d0  -  -  0  0  0  0
c1t1d0  -  -  0  0  0  0
c4t1d0  -  -  0  0  0  0
c5t1d0  -  -  0  0  0  0
c6t1d0  -  -  0  0  0  0
c7t1d0  -  -  0  0  0  0
c0t2d0  -  -  0  0  0  0
c1t2d0  -  -  0  0  0  0
c4t2d0  -  -  0  0  0  0
c5t2d0  -  -  0  0  0  0
  raidz21.29T  5.52T133  0  4.14M  0
c6t2d0  -  -117  0   285K  0
c7t2d0  -  -114  0   279K  0
c0t3d0  -  -106  0   261K  0
c1t3d0  -  -114  0   282K  0
c4t3d0  -  -118  0   294K  0
c5t3d0  -  -125  0   308K  0
c6t3d0  -  -126  0   311K  0
c7t3d0  -  -118  0   293K  0
c0t4d0  -  -119  0   295K  0
c1t4d0  -  -120  0   298K  0
c4t4d0  -  -120  0   291K  0
c6t4d0  -  -106  0   257K  0
c7t4d0  -  - 96  0   236K  0
c0t5d0  -  -109  0   267K  0
c1t5d0  -  -114  0   282K  0
  raidz21.43T  5.82T123  0  3.83M  0
c4t5d0  -  -108  0   242K  0
c5t5d0  -  -104  0   236K  0
c6t5d0  -  -104  0   239K  0
c7t5d0  -  -107  0   245K  0
c0t6d0  -  -108  0   248K  0
c1t6d0  -  -106  0   245K  0
c4t6d0  -  -108  0   250K  0
c5t6d0  -  -112  0   258K  0
c6t6d0  -  -114  0   261K  0
c7t6d0  -  -110  0   253K  0
c0t7d0  -  -109  0   248K  0
c1t7d0  -  -109  0   246K  0
c4t7d0  -  -108  0   243K  0
c5t7d0  -  -108  0   244K  0
c6t7d0  -  -106  0   240K  0
c7t7d0  -  -109  0   244K  0
--  -  -  -  -  -  -

the IOPS vary between about 70 and 140; the interesting bit is that the first
raidz2 does not get any hits at all :(

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Scott Williamson
Hi All,

Just want to note that I had the same issue with zfs send + vdevs that had
11 drives in them on a X4500. Reducing the count of drives per vdev cleared
this up.

One vdev is IOPS limited to the speed of one drive in that vdev, according
to this post http://opensolaris.org/jive/thread.jspa?threadID=74033 (see
comment from ptribble.)
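
(As rough arithmetic, assuming the ~75-80 random-IOPS-per-disk figure
Richard quotes elsewhere in this thread: three raidz2 vdevs give
roughly 3 x 80 = ~240 random read IOPS for the whole pool, however many
disks sit behind them - consistent with the 70-140 IOPS per vdev
Carsten measured.)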

On Wed, Oct 15, 2008 at 3:07 PM, Carsten Aulbert [EMAIL PROTECTED] wrote:

 Hi Richard,

 Richard Elling wrote:
  Since you are reading, it depends on where the data was written.
  Remember, ZFS dynamic striping != RAID-0.
  I would expect something like this if the pool was expanded at some
  point in time.

 No, the RAID was set-up in one go right after jumpstarting the box.

  (2) The disks should be able to perform much much faster than they
  currently output data at, I believe it;s 2008 and not 1995.
 
 
  X4500?  Those disks are good for about 75-80 random iops,
  which seems to be about what they are delivering.  The dtrace
  tool, iopattern, will show the random/sequential nature of the
  workload.
 

 I need to read about this a bit and will try to analyze it.

  (3) The four cores of the X4500 are dying of boredom, i.e. idle 95% all
  the time.
 
  Has anyone a good idea, where the bottleneck could be? I'm running out
  of ideas.
 
 
  I would suspect the disks.  30 second samples are not very useful
  to try and debug such things -- even 1 second samples can be
  too coarse.  But you should take a look at 1 second samples
  to see if there is a consistent I/O workload.
  -- richard
 

 Without doing too much statistics (yet, if needed I can easily do that)
 it looks like these:


               capacity     operations    bandwidth
 pool        used  avail   read  write   read  write
 --  -  -  -  -  -  -
 atlashome   3.54T  17.3T256  0  7.97M  0
  raidz2 833G  6.00T  0  0  0  0
c0t0d0  -  -  0  0  0  0
c1t0d0  -  -  0  0  0  0
c4t0d0  -  -  0  0  0  0
c6t0d0  -  -  0  0  0  0
c7t0d0  -  -  0  0  0  0
c0t1d0  -  -  0  0  0  0
c1t1d0  -  -  0  0  0  0
c4t1d0  -  -  0  0  0  0
c5t1d0  -  -  0  0  0  0
c6t1d0  -  -  0  0  0  0
c7t1d0  -  -  0  0  0  0
c0t2d0  -  -  0  0  0  0
c1t2d0  -  -  0  0  0  0
c4t2d0  -  -  0  0  0  0
c5t2d0  -  -  0  0  0  0
  raidz21.29T  5.52T133  0  4.14M  0
c6t2d0  -  -117  0   285K  0
c7t2d0  -  -114  0   279K  0
c0t3d0  -  -106  0   261K  0
c1t3d0  -  -114  0   282K  0
c4t3d0  -  -118  0   294K  0
c5t3d0  -  -125  0   308K  0
c6t3d0  -  -126  0   311K  0
c7t3d0  -  -118  0   293K  0
c0t4d0  -  -119  0   295K  0
c1t4d0  -  -120  0   298K  0
c4t4d0  -  -120  0   291K  0
c6t4d0  -  -106  0   257K  0
c7t4d0  -  - 96  0   236K  0
c0t5d0  -  -109  0   267K  0
c1t5d0  -  -114  0   282K  0
  raidz21.43T  5.82T123  0  3.83M  0
c4t5d0  -  -108  0   242K  0
c5t5d0  -  -104  0   236K  0
c6t5d0  -  -104  0   239K  0
c7t5d0  -  -107  0   245K  0
c0t6d0  -  -108  0   248K  0
c1t6d0  -  -106  0   245K  0
c4t6d0  -  -108  0   250K  0
c5t6d0  -  -112  0   258K  0
c6t6d0  -  -114  0   261K  0
c7t6d0  -  -110  0   253K  0
c0t7d0  -  -109  0   248K  0
c1t7d0  -  -109  0   246K  0
c4t7d0  -  -108  0   243K  0
c5t7d0  -  -108  0   244K  0
c6t7d0  -  -106  0   240K  0
c7t7d0  -  -109  0   244K  0
 --  -  -  -  -  -  -

 the IOPS vary between about 70 and 140; the interesting bit is that the first
 raidz2 does not get any hits at all :(

 Cheers

 Carsten
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Brent Jones
On Wed, Oct 15, 2008 at 2:17 PM, Scott Williamson
[EMAIL PROTECTED] wrote:
 Hi All,

 Just want to note that I had the same issue with zfs send + vdevs that had
 11 drives in them on a X4500. Reducing the count of drives per zvol cleared
 this up.

 One vdev is IOPS limited to the speed of one drive in that vdev, according
 to this post (see comment from ptribble.)


Scott,

Can you tell us the configuration that you're using that is working for you?
Were you using RaidZ or RaidZ2? I'm wondering what the sweet spot is
for a good compromise between vdevs and usable space/performance.

Thanks!

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Carsten Aulbert
Hi again

Brent Jones wrote:
 
 Scott,
 
 Can you tell us the configuration that you're using that is working for you?
 Were you using RaidZ or RaidZ2? I'm wondering what the sweet spot is
 for a good compromise between vdevs and usable space/performance.


Some time ago I made some tests to find this:

(1) create a new zpool
(2) Copy user's home to it (always the same ~ 25 GB IIRC)
(3) zfs send to /dev/null
(4) evaluate & continue loop

I did this for fully mirrored setups, raidz as well as raidz2, the
results were mixed:

https://n0.aei.uni-hannover.de/cgi-bin/twiki/view/ATLAS/ZFSBenchmarkTest#ZFS_send_performance_relevant_fo

The culprit here might be that in retrospect this seemed like a good
home filesystem, i.e. one which was quite fast.

If you don't want to bother with the table:

The mirrored setup never exceeded 58 MB/s and got faster the more
small mirrors you used.

RaidZ had its sweet spot with a configuration of '6 6 6 6 6 6 5 5', i.e.
6 or 5 disks per RaidZ and 8 vdevs.

RaidZ2 finally was best at '10 9 9 9 9', i.e. 5 vdevs, and not much worse
with only 3, which is what we are currently using to get more storage space
(gains us about 2 TB/box).
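
For concreteness, a many-small-vdev layout like the RaidZ sweet spot
would be created with something like this (disk names hypothetical; the
'6 6 6 6 6 6 5 5' layout continues the pattern for 8 vdevs in total):

zpool create tank \
    raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 \
    raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0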

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-14 Thread Thomas Maier-Komor
Carsten Aulbert wrote:
 Hi Thomas,
 
 Thomas Maier-Komor wrote:
 
 Carsten,

 the summary looks like you are using mbuffer. Can you elaborate on what
 options you are passing to mbuffer? Maybe changing the blocksize to be
 consistent with the recordsize of the zpool could improve performance.
 Is the buffer running full or is it empty most of the time? Are you sure
 that the network connection is 10Gb/s all the way through from machine
 to machine?
 
 Well spotted :)
 
 right now plain mbuffer with plenty of buffer (-m 2048M) on both ends,
 and I have not seen any buffer exceeding the 10% watermark level. The
 network connections are via Neterion XFrame II Sun Fire NICs, then via CX4
 cables to our core switch, where both boxes are directly connected
 (WovenSystems EFX1000). netperf tells me that the TCP performance is
 close to 7.5 GBit/s duplex, and if I use
 
 cat /dev/zero | mbuffer | socat ---> socat | mbuffer > /dev/null
 
 I easily see speeds of about 350-400 MB/s, so I think the network is fine.
 
 Cheers
 
 Carsten

I don't know socat or what benefit it gives you, but have you tried
using mbuffer to send and receive directly (options -I and -O)?
Additionally, try to set the block size of mbuffer to the recordsize of
zfs (usually 128k):
receiver$ mbuffer -I sender:1 -s 128k -m 2048M | zfs receive
sender$ zfs send blabla | mbuffer -s 128k -m 2048M -O receiver:1
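
To check which value to match (the dataset name is a placeholder):

$ zfs get recordsize tank/home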

As transmitting from /dev/zero to /dev/null runs at a rate of 350MB/s, I
guess you are really hitting the maximum speed of your zpool. From my
understanding, sending is always slower than receiving, because reads are
random and writes are sequential. So it should be quite normal that
mbuffer's buffer doesn't really see a lot of usage.

Cheers,
Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-14 Thread Carsten Aulbert
Hi again,

Thomas Maier-Komor wrote:
 Carsten Aulbert wrote:
 Hi Thomas,
 I don't know socat or what benefit it gives you, but have you tried
 using mbuffer to send and receive directly (options -I and -O)?

I thought we tried that in the past and with socat it seemed faster, but
I just made a brief test and got (/dev/zero -> remote /dev/null) 330
MB/s with mbuffer+socat and 430MB/s with mbuffer alone.

 Additionally, try to set the block size of mbuffer to the recordsize of
 zfs (usually 128k):
 receiver$ mbuffer -I sender:1 -s 128k -m 2048M | zfs receive
 sender$ zfs send blabla | mbuffer -s 128k -m 2048M -O receiver:1

We are using 32k since many of our users use tiny files (and then I need
to reduce the buffer size because of this 'funny' error):

mbuffer: fatal: Cannot address so much memory
(32768*65536=21474836481544040742911).

Does this qualify for a bug report?

Thanks for the hint of looking into this again!

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-14 Thread Thomas Maier-Komor
Carsten Aulbert wrote:
 Hi again,
 
 Thomas Maier-Komor wrote:
 Carsten Aulbert wrote:
 Hi Thomas,
 I don't know socat or what benefit it gives you, but have you tried
 using mbuffer to send and receive directly (options -I and -O)?
 
 I thought we tried that in the past and with socat it seemed faster, but
 I just made a brief test and got (/dev/zero -> remote /dev/null) 330
 MB/s with mbuffer+socat and 430MB/s with mbuffer alone.
 
 Additionally, try to set the block size of mbuffer to the recordsize of
 zfs (usually 128k):
 receiver$ mbuffer -I sender:1 -s 128k -m 2048M | zfs receive
 sender$ zfs send blabla | mbuffer -s 128k -m 2048M -O receiver:1
 
 We are using 32k since many of our users use tiny files (and then I need
 to reduce the buffer size because of this 'funny' error):
 
 mbuffer: fatal: Cannot address so much memory
 (32768*65536=21474836481544040742911).
 
 Does this qualify for a bug report?
 
 Thanks for the hint of looking into this again!
 
 Cheers
 
 Carsten
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Yes, this qualifies for a bug report. As a workaround for now, you can
compile in 64-bit mode, i.e.:
$ ./configure CFLAGS="-g -O -m64"
$ make && make install

This works for Sun Studio 12 and gcc. For older versions of Sun Studio,
you need to pass -xarch=v9 instead of -m64.

I am planning to release an updated version of mbuffer this week. I'll
include a patch for this issue.

Cheers,
Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-13 Thread Darren J Moffat
Carsten Aulbert wrote:
 Hi all,
 
 although I'm running all this on a Sol10u5 X4500, I hope I may ask this
 question here. If not, please let me know where to head to.
 
 We are running several X4500 with only 3 raidz2 vdevs since we want
 quite a bit of storage space[*], but the performance we get when using
 zfs send is sometimes really lousy. Of course this depends on what's in the
 file system, but when doing a few backups today I have seen the following:
 
 receiving full stream of atlashome/[EMAIL PROTECTED] into
 atlashome/BACKUP/[EMAIL PROTECTED]
 in @ 11.1 MB/s, out @ 11.1 MB/s, 14.9 GB total, buffer   0% full
 summary: 14.9 GByte in 45 min 42.8 sec - average of 5708 kB/s
 
 So, a mere 15 GB were transferred in 45 minutes, another user's home
 which is quite large (7TB) took more than 42 hours to be transferred.
 Since all this is going over a 10 Gb/s network and the CPUs are all idle, I
 would really like to know why

What are you using to transfer the data over the network?

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-13 Thread Thomas Maier-Komor
Carsten Aulbert wrote:
 Hi all,
 
 although I'm running all this on a Sol10u5 X4500, I hope I may ask this
 question here. If not, please let me know where to head to.
 
 We are running several X4500 with only 3 raidz2 vdevs since we want
 quite a bit of storage space[*], but the performance we get when using
 zfs send is sometimes really lousy. Of course this depends on what's in the
 file system, but when doing a few backups today I have seen the following:
 
 receiving full stream of atlashome/[EMAIL PROTECTED] into
 atlashome/BACKUP/[EMAIL PROTECTED]
 in @ 11.1 MB/s, out @ 11.1 MB/s, 14.9 GB total, buffer   0% full
 summary: 14.9 GByte in 45 min 42.8 sec - average of 5708 kB/s
 
 So, a mere 15 GB were transferred in 45 minutes, another user's home
 which is quite large (7TB) took more than 42 hours to be transferred.
 Since all this is going over a 10 Gb/s network and the CPUs are all idle, I
 would really like to know why
 
 * zfs send is so slow and
 * how can I improve the speed?
 
 Thanks a lot for any hint
 
 Cheers
 
 Carsten
 
 [*] we have done quite a few tests with more zpools but were not able to
 improve the speeds substantially. For this particular bad file system I
 still need to histogram the file sizes.
 


Carsten,

the summary looks like you are using mbuffer. Can you elaborate on what
options you are passing to mbuffer? Maybe changing the blocksize to be
consistent with the recordsize of the zpool could improve performance.
Is the buffer running full or is it empty most of the time? Are you sure
that the network connection is 10Gb/s all the way through from machine
to machine?

- Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-13 Thread Carsten Aulbert
Hi

Darren J Moffat wrote:

 
 What are you using to transfer the data over the network?
 

Initially just plain ssh, which was way too slow; now we use mbuffer on
both ends and transfer the data over a socket via socat - I know that
mbuffer already allows this, but in a few tests socat seemed to be faster.

Sorry for not writing this into the first email.

Cheers

Carsten


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-13 Thread Carsten Aulbert
Hi Thomas,

Thomas Maier-Komor wrote:

 
 Carsten,
 
 the summary looks like you are using mbuffer. Can you elaborate on what
 options you are passing to mbuffer? Maybe changing the blocksize to be
 consistent with the recordsize of the zpool could improve performance.
 Is the buffer running full or is it empty most of the time? Are you sure
 that the network connection is 10Gb/s all the way through from machine
 to machine?

Well spotted :)

right now plain mbuffer with plenty of buffer (-m 2048M) on both ends,
and I have not seen any buffer exceeding the 10% watermark level. The
network connections are via Neterion XFrame II Sun Fire NICs, then via CX4
cables to our core switch, where both boxes are directly connected
(WovenSystems EFX1000). netperf tells me that the TCP performance is
close to 7.5 GBit/s duplex, and if I use

cat /dev/zero | mbuffer | socat ---> socat | mbuffer > /dev/null

I easily see speeds of about 350-400 MB/s, so I think the network is fine.

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss