Re: [zfs-discuss] zfs send/receive - actual performance

2010-04-01 Thread Richard Elling
On Apr 1, 2010, at 12:43 AM, tomwaters wrote:

> "If you see the workload on the wire go through regular patterns of fast/slow 
> response
> then there are some additional tricks that can be applied to increase the 
> overall
> throughput and smooth the jaggies. But that is fodder for another post..."
> 
> Can you please elaborate on what can be done here, as I am seeing this.

There are many things that can be done, but no silver bullet.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 







Re: [zfs-discuss] zfs send/receive - actual performance

2010-04-01 Thread tomwaters
"If you see the workload on the wire go through regular patterns of fast/slow 
response
then there are some additional tricks that can be applied to increase the 
overall
throughput and smooth the jaggies. But that is fodder for another post..."

Can you please elaborate on what can be done here, as I am seeing this.


Re: [zfs-discuss] zfs send/receive - actual performance

2010-03-26 Thread Richard Elling
On Mar 26, 2010, at 2:34 AM, Bruno Sousa wrote:
> Hi,
> 
> The jumbo frames in my case give me a boost of around 2 MB/s, so it's not
> that much.

That is about right.  IIRC, the theoretical max is about a 4% improvement for
an MTU of 8 KB.
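A rough back-of-the-envelope check: with a 1500-byte MTU each frame carries
about 1460 bytes of TCP payload out of roughly 1538 bytes on the wire (~95%
efficiency), while an 8-9 KB MTU gets you to about 99%. So even a perfect
jumbo-frame setup only buys a few percent on a bulk transfer.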

> Now I will play with link aggregation and see how it goes, and of course I'm
> expecting that incremental replication will be slower... but since the amount
> of data will be much less, it will probably still deliver good performance.

It probably won't help at all because of the brain-dead way link aggregation
has to work.  See "Ordering of frames" at
http://en.wikipedia.org/wiki/Link_Aggregation_Control_Protocol#Link_Aggregation_Control_Protocol
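The short version is that the aggregation hashes each flow onto a single
physical link to preserve frame ordering, so one zfs send over one TCP
connection can never use more than one 1 Gb link. For reference, a LAG on
OpenSolaris is created with something along these lines (link names are
examples only):

   # dladm create-aggr -P L3,L4 -l e1000g0 -l e1000g1 aggr0
   # dladm show-aggr aggr0

Multiple concurrent streams can spread across the links, but a single
send/receive stream will not.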

If you see the workload on the wire go through regular patterns of fast/slow
response then there are some additional tricks that can be applied to increase
the overall throughput and smooth the jaggies. But that is fodder for another
post...
You can measure this with iostat using samples < 15 seconds or with tcpstat.
tcpstat is a handy DTrace script often located at /opt/DTT/Bin/tcpstat.d
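For example (the sample interval is just illustrative):

   # iostat -xnz 5            # per-device throughput and latency, 5 s samples
   # /opt/DTT/Bin/tcpstat.d   # TCP bytes in/out per interval (DTraceToolkit)

If the transfer is bursty you will see the rate swing between near wire speed
and near idle from one sample to the next.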
 -- richard

> And what a relief to know that I'm not alone when I say that storage
> management is part science, part art and part "voodoo magic" ;)
> 
> Cheers,
> Bruno
> 
> On 25-3-2010 23:22, Ian Collins wrote:
>> On 03/26/10 10:00 AM, Bruno Sousa wrote:
>> 
>> [Boy top-posting sure mucks up threads!]
>> 
>>> Hi,
>>> 
>>> Indeed the 3 disks per vdev (raidz2) seems a bad idea... but it's the
>>> system I have now.
>>> Regarding the performance... let's assume that a bonnie++ benchmark could
>>> go up to 200 MB/s. Is getting the same values (or close) in a zfs send /
>>> zfs receive just a matter of putting, let's say, a 10 GbE card between
>>> both systems?
>> 
>> Maybe, or a 2x1G LAG would be more cost effective (and easier to check!).
>> The only way to know for sure is to measure.  I managed to get slightly
>> better transfers by enabling jumbo frames.
>> 
>>> I have the impression that benchmarks are always synthetic, and therefore
>>> live/production environments behave quite differently.
>> 
>> Very true, especially in the black arts of storage management!
>> 
>>> Again, it might be just me, but being able to replicate 2 servers over a
>>> 1 Gb link at an average speed above 60 MB/s does seem quite good. However,
>>> like I said, I would like to know the results from other guys...
>>> 
>> As I said, the results are typical for a 1G link.  Don't forget you are
>> measuring full copies; incremental replications may well be significantly
>> slower.
>> 
>> -- 
>> Ian.
> 

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 







Re: [zfs-discuss] zfs send/receive - actual performance

2010-03-26 Thread Bruno Sousa
Hi,

The jumbo frames in my case give me a boost of around 2 MB/s, so it's
not that much.
Now I will play with link aggregation and see how it goes, and of course
I'm expecting that incremental replication will be slower... but since
the amount of data will be much less, it will probably still deliver
good performance.
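For reference, jumbo frames on OpenSolaris are normally enabled per link
with dladm, for example (the link name is illustrative, some drivers want
the interface unplumbed first, and the switch ports must allow the larger
MTU as well):

   # dladm set-linkprop -p mtu=9000 e1000g0
   # dladm show-linkprop -p mtu e1000g0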

And what a relief to know that I'm not alone when I say that storage
management is part science, part art and part "voodoo magic" ;)

Cheers,
Bruno

On 25-3-2010 23:22, Ian Collins wrote:
> On 03/26/10 10:00 AM, Bruno Sousa wrote:
>
> [Boy top-posting sure mucks up threads!]
>
>> Hi,
>>
>> Indeed the 3 disks per vdev (raidz2) seems a bad idea... but it's the
>> system I have now.
>> Regarding the performance... let's assume that a bonnie++ benchmark
>> could go up to 200 MB/s. Is getting the same values (or close) in a
>> zfs send / zfs receive just a matter of putting, let's say, a 10 GbE
>> card between both systems?
>
> Maybe, or a 2x1G LAG would be more cost effective (and easier to
> check!).  The only way to know for sure is to measure.  I managed to
> get slightly better transfers by enabling jumbo frames.
>
>> I have the impression that benchmarks are always synthetic, and
>> therefore live/production environments behave quite differently.
>
> Very true, especially in the black arts of storage management!
>
>> Again, it might be just me, but being able to replicate 2 servers
>> over a 1 Gb link at an average speed above 60 MB/s does seem quite
>> good. However, like I said, I would like to know the results from
>> other guys...
>>
> As I said, the results are typical for a 1G link.  Don't forget you
> are measuring full copies; incremental replications may well be
> significantly slower.
>
> -- 
> Ian.





Re: [zfs-discuss] zfs send/receive - actual performance

2010-03-26 Thread Bruno Sousa
Hi,

I think that in this case the CPU is not the bottleneck, since I'm not
using ssh.
However, my 1 Gb network link probably is the bottleneck.
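As a rough sanity check: 1 Gbit/s is about 125 MB/s raw, and after
Ethernet/IP/TCP framing roughly 110-118 MB/s of payload is the practical
ceiling for a single stream, so a sustained 60-80 MB/s is in the right
ballpark for a transfer that is close to wire-limited, with some protocol
and pipeline overhead on top.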

Bruno

On 26-3-2010 9:25, Erik Ableson wrote:
>
> On 25 March 2010, at 22:00, Bruno Sousa wrote:
>
>> Hi,
>>
>> Indeed the 3 disks per vdev (raidz2) seems a bad idea... but it's the
>> system I have now.
>> Regarding the performance... let's assume that a bonnie++ benchmark
>> could go up to 200 MB/s. Is getting the same values (or close) in a
>> zfs send / zfs receive just a matter of putting, let's say, a 10 GbE
>> card between both systems?
>> I have the impression that benchmarks are always synthetic, and
>> therefore live/production environments behave quite differently.
>> Again, it might be just me, but being able to replicate 2 servers
>> over a 1 Gb link at an average speed above 60 MB/s does seem quite
>> good. However, like I said, I would like to know the results from
>> other guys...
>
> Don't forget to factor in your transport mechanism. If you're using
> ssh to pipe the send/recv data, your overall speed may end up being CPU
> bound, since I think that ssh is single-threaded; so even on a
> multicore system you'll only be able to consume one core, and here raw
> clock speed will make a difference.
>
> Cheers,
>
> Erik
>






Re: [zfs-discuss] zfs send/receive - actual performance

2010-03-26 Thread Erik Ableson


On 25 March 2010, at 22:00, Bruno Sousa wrote:


> Hi,
>
> Indeed the 3 disks per vdev (raidz2) seems a bad idea... but it's the
> system I have now.
> Regarding the performance... let's assume that a bonnie++ benchmark
> could go up to 200 MB/s. Is getting the same values (or close) in a
> zfs send / zfs receive just a matter of putting, let's say, a 10 GbE
> card between both systems?
> I have the impression that benchmarks are always synthetic, and
> therefore live/production environments behave quite differently.
> Again, it might be just me, but being able to replicate 2 servers over
> a 1 Gb link at an average speed above 60 MB/s does seem quite good.
> However, like I said, I would like to know the results from other
> guys...


Don't forget to factor in your transport mechanism. If you're using ssh
to pipe the send/recv data, your overall speed may end up being CPU
bound, since I think that ssh is single-threaded; so even on a multicore
system you'll only be able to consume one core, and here raw clock speed
will make a difference.
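To make that concrete (pool, snapshot and host names here are made up),
the classic pipe is

   zfs send tank/data@snap1 | ssh storageB zfs receive -F backup/data

and the ssh process on each end will happily pin one core encrypting and
decrypting the stream. Choosing a cheaper cipher (e.g. -c arcfour, where
the installed ssh still offers it), or dropping ssh entirely in favour of
an unencrypted, buffered transport on a trusted network, moves the
bottleneck back to the disks or the wire.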


Cheers,

Erik


Re: [zfs-discuss] zfs send/receive - actual performance

2010-03-25 Thread Ian Collins

On 03/26/10 10:00 AM, Bruno Sousa wrote:

[Boy top-posting sure mucks up threads!]


> Hi,
>
> Indeed the 3 disks per vdev (raidz2) seems a bad idea... but it's the
> system I have now.
> Regarding the performance... let's assume that a bonnie++ benchmark
> could go up to 200 MB/s. Is getting the same values (or close) in a
> zfs send / zfs receive just a matter of putting, let's say, a 10 GbE
> card between both systems?


Maybe, or a 2x1G LAG would be more cost effective (and easier to
check!).  The only way to know for sure is to measure.  I managed to get
slightly better transfers by enabling jumbo frames.
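A quick way to measure the raw network path independently of ZFS, assuming
mbuffer is installed on both ends (the host name and port are examples):

   # on the receiver:
   mbuffer -I 9090 > /dev/null
   # on the sender:
   dd if=/dev/zero bs=1024k count=10240 | mbuffer -O storageB:9090

If that tops out near wire speed while zfs send/receive does not, the
pools rather than the link are the limit.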


> I have the impression that benchmarks are always synthetic, and
> therefore live/production environments behave quite differently.


Very true, especially in the black arts of storage management!

> Again, it might be just me, but being able to replicate 2 servers over
> a 1 Gb link at an average speed above 60 MB/s does seem quite good.
> However, like I said, I would like to know the results from other guys...


As I said, the results are typical for a 1G link.  Don't forget you are
measuring full copies; incremental replications may well be
significantly slower.


--
Ian.



Re: [zfs-discuss] zfs send/receive - actual performance

2010-03-25 Thread Bruno Sousa
Hi,

Indeed the 3 disks per vdev (raidz2) seems a bad idea... but it's the
system I have now.
Regarding the performance... let's assume that a bonnie++ benchmark could
go up to 200 MB/s. Is getting the same values (or close) in a zfs send /
zfs receive just a matter of putting, let's say, a 10 GbE card between
both systems?
I have the impression that benchmarks are always synthetic, and therefore
live/production environments behave quite differently.
Again, it might be just me, but being able to replicate 2 servers over a
1 Gb link at an average speed above 60 MB/s does seem quite good.
However, like I said, I would like to know the results from other guys...

Thanks for the time.
Bruno

On 25-3-2010 21:52, Ian Collins wrote:
> On 03/26/10 08:47 AM, Bruno Sousa wrote:
>> Hi all,
>>
>> The more reading I do about ZFS, and the more I experiment, the more I
>> like this stack of technologies.
>> Since we all like to see real figures from real environments, I might
>> as well share some of my numbers.
>> The replication was done with zfs send / zfs receive piped through
>> mbuffer (http://www.maier-komor.de/mbuffer.html), during business
>> hours, so it's a live environment and *not* a controlled test
>> environment.
>>
>> storageA
>>
>> OpenSolaris snv_133
>> 2 quad-core AMD CPUs
>> 28 GB RAM
>>
>> Seagate Barracuda SATA drives, 1.5 TB, 7,200 rpm (ST31500341AS) -
>> *non-enterprise class disks*
>> 1 pool of 6 RAIDZ2 vdevs with 3 disks each, connected to an LSI
>> non-RAID controller
>>
> As others have already said, raidz2 with 3 drives is Not A Good Idea!
>
>> storageB
>>
>> OpenSolaris snv_134
>> 2 Intel Xeon 2.0 GHz CPUs
>> 8 GB RAM
>>
>> Seagate Barracuda SATA drives, 1 TB, 7,200 rpm (ST31000640SS) -
>> *enterprise class disks*
>> 1 pool of 4 RAIDZ2 vdevs with 5 disks each, connected to an Adaptec
>> RAID controller (52445, 512 MB cache) with read and write cache
>> enabled. The Adaptec HBA is configured with 20 volumes, where one
>> volume = one drive, similar to a JBOD.
>>
>> Both systems are connected to a gigabit switch without VLANs (the
>> switch is a 3Com), and jumbo frames are disabled.
>>
>> And now the results :
>>
>> Dataset: around 26.5 GB in files bigger than 256 KB and smaller than 1 MB
>>
>> summary: 26.6 GByte in  6 min 20.6 sec - average of *71.7 MB/s*
>>
>> Dataset: around 160 GB of data with both small files (less than 20 KB)
>> and large files (bigger than 10 MB)
>>
>> summary:  164 GByte in 34 min 41.9 sec - average of *80.6 MB/s*
>>
>
> Those numbers look right for a 1 Gig link.  Try a tool such as
> bonnie++ to see what the block read and write numbers are for your
> pools, and if they are significantly better than these, try an
> aggregated link between the systems.
> -- 
> Ian.





Re: [zfs-discuss] zfs send/receive - actual performance

2010-03-25 Thread Ian Collins

On 03/26/10 08:47 AM, Bruno Sousa wrote:

> Hi all,
>
> The more reading I do about ZFS, and the more I experiment, the more I
> like this stack of technologies.
> Since we all like to see real figures from real environments, I might
> as well share some of my numbers.
> The replication was done with zfs send / zfs receive piped through
> mbuffer (http://www.maier-komor.de/mbuffer.html), during business
> hours, so it's a live environment and *not* a controlled test
> environment.
>
> storageA
>
> OpenSolaris snv_133
> 2 quad-core AMD CPUs
> 28 GB RAM
>
> Seagate Barracuda SATA drives, 1.5 TB, 7,200 rpm (ST31500341AS) -
> *non-enterprise class disks*
> 1 pool of 6 RAIDZ2 vdevs with 3 disks each, connected to an LSI
> non-RAID controller



As others have already said, raidz2 with 3 drives is Not A Good Idea!


> storageB
>
> OpenSolaris snv_134
> 2 Intel Xeon 2.0 GHz CPUs
> 8 GB RAM
>
> Seagate Barracuda SATA drives, 1 TB, 7,200 rpm (ST31000640SS) -
> *enterprise class disks*
> 1 pool of 4 RAIDZ2 vdevs with 5 disks each, connected to an Adaptec
> RAID controller (52445, 512 MB cache) with read and write cache
> enabled. The Adaptec HBA is configured with 20 volumes, where one
> volume = one drive, similar to a JBOD.
>
> Both systems are connected to a gigabit switch without VLANs (the
> switch is a 3Com), and jumbo frames are disabled.
>
> And now the results:
>
> Dataset: around 26.5 GB in files bigger than 256 KB and smaller than 1 MB
>
> summary: 26.6 GByte in  6 min 20.6 sec - average of *71.7 MB/s*
>
> Dataset: around 160 GB of data with both small files (less than 20 KB)
> and large files (bigger than 10 MB)
>
> summary:  164 GByte in 34 min 41.9 sec - average of *80.6 MB/s*



Those numbers look right for a 1 Gig link.  Try a tool such as bonnie++
to see what the block read and write numbers are for your pools, and if
they are significantly better than these, try an aggregated link between
the systems.
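Something along these lines gives comparable block read/write numbers
(the dataset path is an example, and the file size should be at least
twice RAM so the ARC can't cache the whole run):

   bonnie++ -d /tank/bench -s 64g -f -u root

-f skips the slow per-character tests, and -s 64g suits the 28 GB box;
use a smaller size on the 8 GB one.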


--
Ian.



Re: [zfs-discuss] zfs send/receive - actual performance

2010-03-25 Thread Bruno Sousa
Thanks for the tip... BTW, is there any advantage of JBOD over simple volumes?

Bruno
On 25-3-2010 21:08, Richard Jahnel wrote:
> BTW, if you download the Solaris drivers for the 52445 from Adaptec, you can
> use JBOD instead of simple volumes.
>   






Re: [zfs-discuss] zfs send/receive - actual performance

2010-03-25 Thread Richard Jahnel
BTW, if you download the Solaris drivers for the 52445 from Adaptec, you can
use JBOD instead of simple volumes.


[zfs-discuss] zfs send/receive - actual performance

2010-03-25 Thread Bruno Sousa
Hi all,

The more reading I do about ZFS, and the more I experiment, the more I
like this stack of technologies.
Since we all like to see real figures from real environments, I might as
well share some of my numbers.
The replication was done with zfs send / zfs receive piped through
mbuffer (http://www.maier-komor.de/mbuffer.html), during business hours,
so it's a live environment and *not* a controlled test environment.
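A pipeline of this shape looks roughly like the following (pool, snapshot
and host names, the port and the buffer sizes are illustrative, not the
exact invocation used here):

   # on storageB (receiver), started first:
   mbuffer -s 128k -m 1G -I 9090 | zfs receive -F tank/replica

   # on storageA (sender):
   zfs send poolA/data@2010-03-25 | mbuffer -s 128k -m 1G -O storageB:9090

mbuffer prints a running rate and an end-of-run summary line, which is
presumably where the "summary:" figures below come from.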

storageA

OpenSolaris snv_133
2 quad-core AMD CPUs
28 GB RAM

Seagate Barracuda SATA drives, 1.5 TB, 7,200 rpm (ST31500341AS) -
*non-enterprise class disks*
1 pool of 6 RAIDZ2 vdevs with 3 disks each, connected to an LSI non-RAID
controller

storageB

OpenSolaris snv_134
2 Intel Xeon 2.0 GHz CPUs
8 GB RAM


Seagate Barracuda SATA drives, 1 TB, 7,200 rpm (ST31000640SS) - *enterprise
class disks*
1 pool of 4 RAIDZ2 vdevs with 5 disks each, connected to an Adaptec RAID
controller (52445, 512 MB cache) with read and write cache enabled. The
Adaptec HBA is configured with 20 volumes, where one volume = one drive,
similar to a JBOD.
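For storageB that layout corresponds to a pool created along these lines
(the device names are invented; the controller will present different ones):

   zpool create tank \
     raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
     raidz2 c1t5d0 c1t6d0 c1t7d0 c1t8d0 c1t9d0 \
     raidz2 c1t10d0 c1t11d0 c1t12d0 c1t13d0 c1t14d0 \
     raidz2 c1t15d0 c1t16d0 c1t17d0 c1t18d0 c1t19d0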

Both systems are connected to a gigabit switch without VLANs (the switch is
a 3Com), and jumbo frames are disabled.

And now the results:

Dataset: around 26.5 GB in files bigger than 256 KB and smaller than 1 MB

summary: 26.6 GByte in  6 min 20.6 sec - average of *71.7 MB/s*

Dataset: around 160 GB of data with both small files (less than 20 KB) and
large files (bigger than 10 MB)

summary:  164 GByte in 34 min 41.9 sec - average of *80.6 MB/s*


I don't know about you... but to me this looks like very, very, very
good performance :), especially considering that together these two
systems cost less than 12,000 EUR.

Does anyone else have numbers to share?

Bruno




