Re: [zfs-discuss] NFS performance near zero on a very full pool

2010-09-09 Thread Arne Jansen

Richard Elling wrote:

On Sep 9, 2010, at 10:09 AM, Arne Jansen wrote:


Hi Neil,

Neil Perrin wrote:

NFS often demands that its transactions are stable before returning.
This forces ZFS to do the system call synchronously. Usually the
ZIL (code) allocates and writes a new block in the intent log chain to
achieve this.
If it ever fails to allocate a block (of the size requested), it is forced
to close the txg containing the system call. Yes, this can be extremely
slow, but there is no other option for the ZIL. I'm surprised the wait is
30 seconds. I would expect much less, but finding room for the rest of the
txg data and metadata would also be a challenge.

I think this is not what we saw, for two reasons:
a) we have a mirrored slog device. According to zpool iostat -v only 16MB
   out of 4GB were in use.
b) it didn't seem like the txg had been closed early. Rather, it kept to
   approximately the 30-second intervals.
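
For reference, a minimal sketch of that check (the pool name is a placeholder;
the "logs" section of the output shows the slog's alloc/free columns):

    zpool iostat -v tank 5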

Internally we came up with a different explanation, without any evidence that
it is correct: when the pool reaches 96%, ZFS goes into a 'self defense'
mode. Instead of allocating blocks for the ZIL, every write turns synchronous and
has to wait for the txg to finish naturally. The reasoning behind this might
be that even if ZIL space is available, there might not be enough space left to
commit the ZIL to the pool. To prevent this, ZFS doesn't use the ZIL when the pool
is above 96%. While this might be proper for small pools, on large pools 4% is
still some TB of free space, so there should be an upper limit of maybe 10GB on
this hidden reserve.


I do not believe this is correct.  At 96% the first-fit algorithm changes to
best-fit and ganging can be expected. This has nothing to do with the ZIL.
There is already a reserve set aside for metadata and the ZIL so that you can
remove files when the file system is 100% full.  This reserve is 32 MB or 1/64
of the pool size.


Maybe it is some side effect of this change in allocation scheme. But I'm very
sure about what I saw. The change was drastic and abrupt. I had a dtrace script
running that measured the time for rfs3_write to complete. With the pool > 96%
I saw a burst of writes every 30 seconds, with completion times of up to 30s.
With the pool < 96%, I saw a continuous stream of writes with completion times
of mostly a few microseconds.
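
A minimal sketch of such a measurement (it assumes the fbt provider exposes the
rfs3_write kernel function; this is not necessarily the exact script used):

    #!/usr/sbin/dtrace -s
    /* time each rfs3_write() call and print a latency histogram every 10s */
    fbt::rfs3_write:entry
    {
            self->ts = timestamp;
    }

    fbt::rfs3_write:return
    /self->ts/
    {
            @lat["rfs3_write latency (ns)"] = quantize(timestamp - self->ts);
            self->ts = 0;
    }

    tick-10s
    {
            printa(@lat);
            trunc(@lat);
    }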




In this situation, not only did writes suffer; as a side effect, reads also
came to a nearly complete halt.


If you have atime=on, then reads create writes.


atime is off. The impact on reads/lookups/getattrs was, IMHO, because all server
threads had been occupied by blocked writes for a prolonged time.
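
(Checking and, if needed, turning off atime is a one-liner; the dataset name
here is a placeholder:)

    zfs get atime tank/export
    zfs set atime=off tank/export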

I'll try to reproduce this on a test machine.

--
Arne

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance near zero on a very full pool

2010-09-09 Thread Richard Elling
On Sep 9, 2010, at 10:09 AM, Arne Jansen wrote:

> Hi Neil,
> 
> Neil Perrin wrote:
>> NFS often demands that its transactions are stable before returning.
>> This forces ZFS to do the system call synchronously. Usually the
>> ZIL (code) allocates and writes a new block in the intent log chain to
>> achieve this.
>> If it ever fails to allocate a block (of the size requested), it is forced
>> to close the txg containing the system call. Yes, this can be extremely
>> slow, but there is no other option for the ZIL. I'm surprised the wait is
>> 30 seconds. I would expect much less, but finding room for the rest of the
>> txg data and metadata would also be a challenge.
> 
> I think this is not what we saw, for two reasons:
> a) we have a mirrored slog device. According to zpool iostat -v only 16MB
>    out of 4GB were in use.
> b) it didn't seem like the txg had been closed early. Rather, it kept to
>    approximately the 30-second intervals.
> 
> Internally we came up with a different explanation, without any evidence that
> it is correct: when the pool reaches 96%, ZFS goes into a 'self defense' mode.
> Instead of allocating blocks for the ZIL, every write turns synchronous and
> has to wait for the txg to finish naturally. The reasoning behind this might
> be that even if ZIL space is available, there might not be enough space left
> to commit the ZIL to the pool. To prevent this, ZFS doesn't use the ZIL when
> the pool is above 96%. While this might be proper for small pools, on large
> pools 4% is still some TB of free space, so there should be an upper limit of
> maybe 10GB on this hidden reserve.

I do not believe this is correct.  At 96% the first-fit algorithm changes to
best-fit and ganging can be expected. This has nothing to do with the ZIL.
There is already a reserve set aside for metadata and the ZIL so that you can
remove files when the file system is 100% full.  This reserve is 32 MB or 1/64
of the pool size.
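(For scale: 1/64 of a 10 TB pool is about 160 GB.)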

> Also this sudden switch of behavior is completely unexpected and at least
> under-documented.

Methinks you are just seeing the change in performance from the allocation
algorithm change.

> 
>> Most (maybe all?) file systems perform badly when out of space. I believe we
>> publish a recommended limit on pool usage, and I thought it was 90%.
> 
> In this situation, not only did writes suffer; as a side effect, reads also
> came to a nearly complete halt.

If you have atime=on, then reads create writes.
 -- richard

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
ZFS and performance consulting
http://www.RichardElling.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance near zero on a very full pool

2010-09-09 Thread Arne Jansen

Hi Neil,

Neil Perrin wrote:

NFS often demands that its transactions are stable before returning.
This forces ZFS to do the system call synchronously. Usually the
ZIL (code) allocates and writes a new block in the intent log chain to
achieve this.
If it ever fails to allocate a block (of the size requested), it is forced
to close the txg containing the system call. Yes, this can be extremely
slow, but there is no other option for the ZIL. I'm surprised the wait is
30 seconds. I would expect much less, but finding room for the rest of the
txg data and metadata would also be a challenge.


I think this is not what we saw, for two reasons:
 a) we have a mirrored slog device. According to zpool iostat -v only 16MB
    out of 4GB were in use.
 b) it didn't seem like the txg had been closed early. Rather, it kept to
    approximately the 30-second intervals.

Internally we came up with a different explanation, without any evidence that
it is correct: when the pool reaches 96%, ZFS goes into a 'self defense' mode.
Instead of allocating blocks for the ZIL, every write turns synchronous and
has to wait for the txg to finish naturally. The reasoning behind this might
be that even if ZIL space is available, there might not be enough space left
to commit the ZIL to the pool. To prevent this, ZFS doesn't use the ZIL when
the pool is above 96%. While this might be proper for small pools, on large
pools 4% is still some TB of free space, so there should be an upper limit of
maybe 10GB on this hidden reserve.
Also this sudden switch of behavior is completely unexpected and at least
under-documented.



Most (maybe all?) file systems perform badly when out of space. I believe we
publish a recommended limit on pool usage, and I thought it was 90%.


In this situation, not only did writes suffer; as a side effect, reads also
came to a nearly complete halt.

--
Arne




Neil.

On 09/09/10 09:00, Arne Jansen wrote:

Hi,

currently I'm trying to debug a very strange phenomenon on a nearly full
pool (96%). Here are the symptoms: over NFS, a find on the pool takes
a very long time, up to 30s (!) for each file. Locally, the performance
is quite normal.
What I found out so far: It seems that every nfs write (rfs3_write) blocks
until the txg is flushed. This means a write takes up to 30 seconds. During
this time, the nfs calls block, occupying all NFS server threads. With all
server threads blocked, all other OPs (LOOKUP, GETATTR, ...) have to wait
until the writes finish, bringing the performance of the server effectively
down to zero.
It may be that the trigger for this behavior is around 95%. I managed to bring
the pool down to 95%, now the writes get served continuously as it should be.


What is the explanation for this behaviour? Is it intentional and can the
threshold be tuned? I experienced this on Sol10 U8.

Thanks,
Arne
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance near zero on a very full pool

2010-09-09 Thread Neil Perrin

I should also have mentioned that if the pool has a separate log device
then this shouldn't happen. Assuming the slog is big enough, it should
have enough blocks to not be forced into using main pool device blocks.


Neil.

On 09/09/10 10:36, Neil Perrin wrote:

Arne,

NFS often demands that its transactions are stable before returning.
This forces ZFS to do the system call synchronously. Usually the
ZIL (code) allocates and writes a new block in the intent log chain to
achieve this.
If it ever fails to allocate a block (of the size requested), it is forced
to close the txg containing the system call. Yes, this can be extremely
slow, but there is no other option for the ZIL. I'm surprised the wait is
30 seconds. I would expect much less, but finding room for the rest of the
txg data and metadata would also be a challenge.

Most (maybe all?) file systems perform badly when out of space. I believe we
publish a recommended limit on pool usage, and I thought it was 90%.

Neil.

On 09/09/10 09:00, Arne Jansen wrote:

Hi,

currently I'm trying to debug a very strange phenomenon on a nearly full
pool (96%). Here are the symptoms: over NFS, a find on the pool takes
a very long time, up to 30s (!) for each file. Locally, the performance
is quite normal.
What I found out so far: It seems that every nfs write (rfs3_write) blocks
until the txg is flushed. This means a write takes up to 30 seconds. During
this time, the nfs calls block, occupying all NFS server threads. With all
server threads blocked, all other OPs (LOOKUP, GETATTR, ...) have to wait
until the writes finish, bringing the performance of the server effectively
down to zero.
It may be that the trigger for this behavior is around 95%. I managed to bring
the pool down to 95%, now the writes get served continuously as it should be.

What is the explanation for this behaviour? Is it intentional and can the
threshold be tuned? I experienced this on Sol10 U8.

Thanks,
Arne
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance near zero on a very full pool

2010-09-09 Thread Neil Perrin

Arne,

NFS often demands that its transactions are stable before returning.
This forces ZFS to do the system call synchronously. Usually the
ZIL (code) allocates and writes a new block in the intent log chain to
achieve this.
If it ever fails to allocate a block (of the size requested), it is forced
to close the txg containing the system call. Yes, this can be extremely
slow, but there is no other option for the ZIL. I'm surprised the wait is
30 seconds. I would expect much less, but finding room for the rest of the
txg data and metadata would also be a challenge.

Most (maybe all?) file systems perform badly when out of space. I believe we
publish a recommended limit on pool usage, and I thought it was 90%.

Neil.

On 09/09/10 09:00, Arne Jansen wrote:

Hi,

currently I'm trying to debug a very strange phenomenon on a nearly full
pool (96%). Here are the symptoms: over NFS, a find on the pool takes
a very long time, up to 30s (!) for each file. Locally, the performance
is quite normal.
What I found out so far: It seems that every nfs write (rfs3_write) blocks
until the txg is flushed. This means a write takes up to 30 seconds. During
this time, the nfs calls block, occupying all NFS server threads. With all
server threads blocked, all other OPs (LOOKUP, GETATTR, ...) have to wait
until the writes finish, bringing the performance of the server effectively
down to zero.
It may be that the trigger for this behavior is around 95%. I managed to bring
the pool down to 95%, now the writes get served continuously as it should be.

What is the explanation for this behaviour? Is it intentional and can the
threshold be tuned? I experienced this on Sol10 U8.

Thanks,
Arne
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] NFS performance near zero on a very full pool

2010-09-09 Thread Arne Jansen
Hi,

currently I'm trying to debug a very strange phenomenon on a nearly full
pool (96%). Here are the symptoms: over NFS, a find on the pool takes
a very long time, up to 30s (!) for each file. Locally, the performance
is quite normal.
What I found out so far: It seems that every nfs write (rfs3_write) blocks
until the txg is flushed. This means a write takes up to 30 seconds. During
this time, the nfs calls block, occupying all NFS server threads. With all
server threads blocked, all other OPs (LOOKUP, GETATTR, ...) have to wait
until the writes finish, bringing the performance of the server effectively
down to zero.
It may be that the trigger for this behavior is around 95%. I managed to bring
the pool down to 95%, now the writes get served continuously as it should be.
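
(Two quick things to watch while reproducing this; the pool name is a
placeholder:)

    zpool list tank    # the CAP column shows how full the pool is
    nfsstat -s         # server-side NFS operation counts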

What is the explanation for this behaviour? Is it intentional and can the
threshold be tuned? I experienced this on Sol10 U8.

Thanks,
Arne
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance issue

2010-09-08 Thread Ray Van Dolson
On Wed, Sep 08, 2010 at 01:20:58PM -0700, Dr. Martin Mundschenk wrote:
> Hi!
> 
> I searched the web for hours, trying to solve the NFS/ZFS low
> performance issue on my just setup OSOL box (snv134). The problem is
> discussed in many threads but I've found no solution. 
> 
> On an NFS-shared volume, I get write performance of 3.5 MB/sec (!!); read
> performance is about 50 MB/sec, which is OK, but on a GBit network more
> should be possible, since the server's disk performance reaches up to
> 120 MB/sec.
> 
> Does anyone have a solution how I can at least speed up the writes?

What's the write workload like?  You could try disabling the ZIL to see
if that makes a difference.  If it does, the addition of an SSD-based
ZIL / slog device would most certainly help.
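
A rough sketch of both steps; zil_disable is the historical tunable on builds
of this vintage and is for testing only, and the pool/device names are
placeholders:

    * in /etc/system (requires a reboot), for a test only:
    set zfs:zil_disable = 1

    # if the test shows a big difference, add a dedicated log device instead:
    zpool add tank log c2t5d0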

Maybe you could describe the makeup of your zpool as well?

Ray
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] NFS performance issue

2010-09-08 Thread Dr. Martin Mundschenk
Hi!

I searched the web for hours, trying to solve the NFS/ZFS low-performance issue
on my freshly set up OSOL box (snv134). The problem is discussed in many
threads, but I've found no solution.

On an NFS-shared volume, I get write performance of 3.5 MB/sec (!!); read
performance is about 50 MB/sec, which is OK, but on a GBit network more should
be possible, since the server's disk performance reaches up to 120 MB/sec.

Does anyone have a solution how I can at least speed up the writes?

Martin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-26 Thread Mike Gerdts
On Mon, Jul 26, 2010 at 2:56 PM, Miles Nordin  wrote:
>> "mg" == Mike Gerdts  writes:
>    mg> it is rather common to have multiple 1 Gb links to
>    mg> servers going to disparate switches so as to provide
>    mg> resilience in the face of switch failures.  This is not unlike
>    mg> (at a block diagram level) the architecture that you see in
>    mg> pretty much every SAN.  In such a configuration, it is
>    mg> reasonable for people to expect that load balancing will
>    mg> occur.
>
> nope.  spanning tree removes all loops, which means between any two
> points there will be only one enabled path.  An L2-switched network
> will look into L4 headers for splitting traffic across an aggregated
> link (as long as it's been deliberately configured to do that---by
> default probably only looks to L2), but it won't do any multipath
> within the mesh.

I was speaking more of IPMP, which is at layer 3.

> Even with an L3 routing protocol it usually won't do multipath unless
> the costs of the paths match exactly, so you'd want to build the
> topology to achieve this and then do all switching at layer 3 by
> making sure no VLAN is larger than a switch.

By default, IPMP does outbound load spreading.  Inbound load spreading
is not practical with a single (non-test) IP address.  If you have
multiple virtual IPs you can spread them across all of the NICs in
the IPMP group and get some degree of inbound spreading as well.  This
is the default behavior of the OpenSolaris IPMP implementation, last I
looked.  I've not seen any examples (although I can't say I've looked
real hard either) of the Solaris 10 IPMP configuration set up with
multiple IPs to encourage inbound load spreading as well.

>
> There's actually a cisco feature to make no VLAN larger than a *port*,
> which I use a little bit.  It's meant for CATV networks I think, or
> DSL networks aggregated by IP instead of ATM like maybe some European
> ones?  but the idea is not to put edge ports into vlans any more but
> instead say 'ip unnumbered loopbackN', and then some black magic they
> have built into their DHCP forwarder adds /32 routes by watching the
> DHCP replies.  If you don't use DHCP you can add static /32 routes
> yourself, and it will work.  It does not help with IPv6, and also you
> can only use it on vlan-tagged edge ports (what? arbitrary!) but
> neat that it's there at all.
>
>  http://www.cisco.com/en/US/docs/ios/12_3t/12_3t4/feature/guide/gtunvlan.html

Interesting... however this seems to limit you to < 4096 edge ports
per VTP domain, as the VID field in the 802.1q header is only 12 bits.
 It is also unclear how this works when you have one physical host
with many guests.  And then there is the whole thing that I don't
really see how this helps with resilience in the face of a switch
failure.  Cool technology, but I'm not certain that it addresses what
I was talking about.

>
> The best thing IMHO would be to use this feature on the edge ports,
> just as I said, but you will have to teach the servers to VLAN-tag
> their packets.  not such a bad idea, but weird.
>
> You could also use it one hop up from the edge switches, but I think
> it might have problems in general removing the routes when you unplug
> a server, and using it one hop up could make them worse.  I only use
> it with static routes so far, so no mobility for me: I have to keep
> each server plugged into its assigned port, and reconfigure switches
> if I move it.  Once you have ``no vlan larger than 1 switch,'' if you
> actually need a vlan-like thing that spans multiple switches, the new
> word for it is 'vrf'.

There was some other Cisco dark magic that our network guys were
touting a while ago that would make each edge switch look like a blade
in a 6500 series.  This would then allow them to do link aggregation
across edge switches.  At least two of "organizational changes",
"personnel changes", and "roadmap changes" happened so I've not seen
this in action.

>
> so, yeah, it means the server people will have to take over the job of
> the networking people.  The good news is that networking people don't
> like spanning tree very much because it's always going wrong, so
> AFAICT most of them who are paying attention are already moving in
> this direction.
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>



-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-26 Thread Miles Nordin
> "mg" == Mike Gerdts  writes:
> "sw" == Saxon, Will  writes:

sw> I think there may be very good reason to use iSCSI, if you're
sw> limited to gigabit but need to be able to handle higher
sw> throughput for a single client.

 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6817942

look at it now before it gets pulled back inside the wall. :(

I think this bug was posted on zfs-discuss earlier.  Please see the
comments because he is not using lagg's: even with a single 10Gbit/s
NIC, you cannot use the link well unless you take advantage of the
multiple MSI's and L4 preclass built into the NIC.  You need multiple
TCP circuits between client and server so that each will fire a
different MSI.  He got about 3x performance using 8 connections.

It sounds like NFS is already fixed for this, but requires manual
tuning of clnt_max_conns and the number of reader and writer threads.
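
For reference, a sketch of where those knobs live (the values shown are
illustrative, not recommendations):

    * /etc/system on the NFS client: TCP connections per server
    set rpcmod:clnt_max_conns = 8

    # /etc/default/nfs on the server: worker thread count
    NFSD_SERVERS=1024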

mg> it is rather common to have multiple 1 Gb links to
mg> servers going to disparate switches so as to provide
mg> resilience in the face of switch failures.  This is not unlike
mg> (at a block diagram level) the architecture that you see in
mg> pretty much every SAN.  In such a configuration, it is
mg> reasonable for people to expect that load balancing will
mg> occur.

nope.  spanning tree removes all loops, which means between any two
points there will be only one enabled path.  An L2-switched network
will look into L4 headers for splitting traffic across an aggregated
link (as long as it's been deliberately configured to do that---by
default probably only looks to L2), but it won't do any multipath
within the mesh.

Even with an L3 routing protocol it usually won't do multipath unless
the costs of the paths match exactly, so you'd want to build the
topology to achieve this and then do all switching at layer 3 by
making sure no VLAN is larger than a switch.

There's actually a cisco feature to make no VLAN larger than a *port*,
which I use a little bit.  It's meant for CATV networks I think, or
DSL networks aggregated by IP instead of ATM like maybe some European
ones?  but the idea is not to put edge ports into vlans any more but
instead say 'ip unnumbered loopbackN', and then some black magic they
have built into their DHCP forwarder adds /32 routes by watching the
DHCP replies.  If you don't use DHCP you can add static /32 routes
yourself, and it will work.  It does not help with IPv6, and also you
can only use it on vlan-tagged edge ports (what? arbitrary!) but
neat that it's there at all.

 http://www.cisco.com/en/US/docs/ios/12_3t/12_3t4/feature/guide/gtunvlan.html

The best thing IMHO would be to use this feature on the edge ports,
just as I said, but you will have to teach the servers to VLAN-tag
their packets.  not such a bad idea, but weird.

You could also use it one hop up from the edge switches, but I think
it might have problems in general removing the routes when you unplug
a server, and using it one hop up could make them worse.  I only use
it with static routes so far, so no mobility for me: I have to keep
each server plugged into its assigned port, and reconfigure switches
if I move it.  Once you have ``no vlan larger than 1 switch,'' if you
actually need a vlan-like thing that spans multiple switches, the new
word for it is 'vrf'.

so, yeah, it means the server people will have to take over the job of
the networking people.  The good news is that networking people don't
like spanning tree very much because it's always going wrong, so
AFAICT most of them who are paying attention are already moving in
this direction.


pgpEDdDjwl9Ck.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-26 Thread Mike Gerdts
On Mon, Jul 26, 2010 at 1:27 AM, Garrett D'Amore  wrote:
> On Sun, 2010-07-25 at 21:39 -0500, Mike Gerdts wrote:
>> On Sun, Jul 25, 2010 at 8:50 PM, Garrett D'Amore  wrote:
>> > On Sun, 2010-07-25 at 17:53 -0400, Saxon, Will wrote:
>> >>
>> >> I think there may be very good reason to use iSCSI, if you're limited
>> >> to gigabit but need to be able to handle higher throughput for a
>> >> single client. I may be wrong, but I believe iSCSI to/from a single
>> >> initiator can take advantage of multiple links in an active-active
>> >> multipath scenario whereas NFS is only going to be able to take
>> >> advantage of 1 link (at least until pNFS).
>> >
>> > There are other ways to get multiple paths.  First off, there is IP
>> > multipathing, which offers some of this at the IP layer.  There is also
>> > 802.3ad link aggregation (trunking).  So you can still get high
>> > performance beyond a single link with NFS.  (It works with iSCSI too,
>> > btw.)
>>
>> With both IPMP and link aggregation, each TCP session will go over the
>> same wire.  There is no guarantee that load will be evenly balanced
>> between links when there are multiple TCP sessions.  As such, any
>> scalability you get using these configurations will be dependent on
>> having a complex enough workload, wise configuration choices, and
>> a bit of luck.
>
> If you're really that concerned, you could use UDP instead of TCP.  But
> that may have other detrimental performance impacts, I'm not sure how
> bad they would be in a data center with generally lossless ethernet
> links.

Heh.  My horror story with reassembly was actually with connectionless
transports (LLT, then UDP).  Oracle RAC's cache fusion sends 8 KB
blocks via UDP by default, or LLT when used in the Veritas + Oracle
RAC certified configuration from 5+ years ago.  The use of Sun
trunking with round robin hashing and the lack of use of jumbo packets
made every cache fusion block turn into 6 LLT or UDP packets that had
to be reassembled on the other end.  This was on a 15K domain with the
NICs spread across IO boards.  I assume that interrupts for a NIC are
handled by a CPU on the closest system board (Solaris 8, FWIW).  If
that assumption is true then there would also be a flurry of
inter-system board chatter to put the block back together.  In any
case, performance was horrible until we got rid of round robin and
enabled jumbo frames.

> Btw, I am not certain that the multiple initiator support (mpxio) is
> necessarily any better as far as guaranteed performance/balancing.  (It
> may be; I've not looked closely enough at it.)

I haven't paid close attention to how mpxio works.  The Veritas
analog, vxdmp, does a very good job of balancing traffic down multiple
paths, even when only a single LUN is accessed.  The exact mode that
dmp will use is dependent on the capabilities of the array it is
talking to - many arrays work in an active/passive mode.  As such, I
would expect that with vxdmp or mpxio the balancing with iSCSI would
be at least partially dependent on what the array said to do.

> I should look more closely at NFS as well -- if multiple applications on
> the same client are accessing the same filesystem, do they use a single
> common TCP session, or can they each have separate instances open?
> Again, I'm not sure.

It's worse than that.  A quick experiment with two different
automounted home directories from the same NFS server suggests that
both home directories share one TCP session to the NFS server.
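
(A quick way to check is to count established connections to the NFS port,
e.g.:)

    netstat -an | grep 2049 | grep ESTABLISHED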

The latest version of Oracle's RDBMS supports a userland NFS client
option.  It would be very interesting to see if this does a separate
session per data file, possibly allowing for better load spreading.

>> Note that with Sun Trunking there was an option to load balance using
>> a round robin hashing algorithm.  When pushing high network loads this
>> may cause performance problems with reassembly.
>
> Yes.  Reassembly is Evil for TCP performance.
>
> Btw, the iSCSI balancing act that was described does seem a bit
> contrived -- a single initiator and a COMSTAR server, both client *and
> server* with multiple ethernet links instead of a single 10GbE link.
>
> I'm not saying it doesn't happen, but I think it happens infrequently
> enough that its reasonable that this scenario wasn't one that popped
> immediately into my head. :-)

It depends on whether the people that control the network gear are the
same ones that control servers.  My experience suggests that if there
is a disconnect, it seems rather likely that each group's
standardization efforts, procurement cycles, and capacity plans will
work against any attempt to have an optimal configuration.

Also, it is rather common to have multiple 1 Gb links to servers going
to disparate switches so as to provide resilience in the face of
switch failures.  This is not unlike (at a block diagram level) the
architecture that you see in pretty much every SAN.  In such a
configuration, it is reasonable for people to expect that load balancing will
occur.

Re: [zfs-discuss] NFS performance?

2010-07-25 Thread Garrett D'Amore
On Sun, 2010-07-25 at 21:39 -0500, Mike Gerdts wrote:
> On Sun, Jul 25, 2010 at 8:50 PM, Garrett D'Amore  wrote:
> > On Sun, 2010-07-25 at 17:53 -0400, Saxon, Will wrote:
> >>
> >> I think there may be very good reason to use iSCSI, if you're limited
> >> to gigabit but need to be able to handle higher throughput for a
> >> single client. I may be wrong, but I believe iSCSI to/from a single
> >> initiator can take advantage of multiple links in an active-active
> >> multipath scenario whereas NFS is only going to be able to take
> >> advantage of 1 link (at least until pNFS).
> >
> > There are other ways to get multiple paths.  First off, there is IP
> > multipathing, which offers some of this at the IP layer.  There is also
> > 802.3ad link aggregation (trunking).  So you can still get high
> > performance beyond a single link with NFS.  (It works with iSCSI too,
> > btw.)
> 
> With both IPMP and link aggregation, each TCP session will go over the
> same wire.  There is no guarantee that load will be evenly balanced
> between links when there are multiple TCP sessions.  As such, any
> scalability you get using these configurations will be dependent on
> having a complex enough workload, wise configuration choices, and
> a bit of luck.

If you're really that concerned, you could use UDP instead of TCP.  But
that may have other detrimental performance impacts, I'm not sure how
bad they would be in a data center with generally lossless ethernet
links.

Btw, I am not certain that the multiple initiator support (mpxio) is
necessarily any better as far as guaranteed performance/balancing.  (It
may be; I've not looked closely enough at it.)

I should look more closely at NFS as well -- if multiple applications on
the same client are accessing the same filesystem, do they use a single
common TCP session, or can they each have separate instances open?
Again, I'm not sure.

> 
> Note that with Sun Trunking there was an option to load balance using
> a round robin hashing algorithm.  When pushing high network loads this
> may cause performance problems with reassembly.

Yes.  Reassembly is Evil for TCP performance.

Btw, the iSCSI balancing act that was described does seem a bit
contrived -- a single initiator and a COMSTAR server, both client *and
server* with multiple ethernet links instead of a single 10GbE link.

I'm not saying it doesn't happen, but I think it happens infrequently
enough that its reasonable that this scenario wasn't one that popped
immediately into my head. :-)

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-25 Thread Mike Gerdts
On Sun, Jul 25, 2010 at 8:50 PM, Garrett D'Amore  wrote:
> On Sun, 2010-07-25 at 17:53 -0400, Saxon, Will wrote:
>>
>> I think there may be very good reason to use iSCSI, if you're limited
>> to gigabit but need to be able to handle higher throughput for a
>> single client. I may be wrong, but I believe iSCSI to/from a single
>> initiator can take advantage of multiple links in an active-active
>> multipath scenario whereas NFS is only going to be able to take
>> advantage of 1 link (at least until pNFS).
>
> There are other ways to get multiple paths.  First off, there is IP
> > multipathing, which offers some of this at the IP layer.  There is also
> > 802.3ad link aggregation (trunking).  So you can still get high
> > performance beyond a single link with NFS.  (It works with iSCSI too,
> btw.)

With both IPMP and link aggregation, each TCP session will go over the
same wire.  There is no guarantee that load will be evenly balanced
between links when there are multiple TCP sessions.  As such, any
scalability you get using these configurations will be dependent on
having a complex enough workload, wise configuration choices, and
a bit of luck.

Note that with Sun Trunking there was an option to load balance using
a round robin hashing algorithm.  When pushing high network loads this
may cause performance problems with reassembly.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-25 Thread Garrett D'Amore
On Sun, 2010-07-25 at 17:53 -0400, Saxon, Will wrote:
> 
> I think there may be very good reason to use iSCSI, if you're limited
> to gigabit but need to be able to handle higher throughput for a
> single client. I may be wrong, but I believe iSCSI to/from a single
> initiator can take advantage of multiple links in an active-active
> multipath scenario whereas NFS is only going to be able to take
> advantage of 1 link (at least until pNFS). 

There are other ways to get multiple paths.  First off, there is IP
multipathing, which offers some of this at the IP layer.  There is also
802.3ad link aggregation (trunking).  So you can still get high
performance beyond a single link with NFS.  (It works with iSCSI too,
btw.)
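
(For example, an aggregation with an L3/L4 hashing policy can be created
roughly like this with OpenSolaris-era dladm; interface and aggregation names
are placeholders:)

    dladm create-aggr -P L3,L4 -l e1000g0 -l e1000g1 aggr0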

-- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-25 Thread Saxon, Will
> On Fri, 2010-07-23 at 22:20 -0400, Edward Ned Harvey wrote:
> > > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> > > boun...@opensolaris.org] On Behalf Of Linder, Doug
> > > 
> > > On a related note - all other things being equal, is there any reason
> > > to choose NFS over iSCSI, or vice-versa?  I'm currently looking at this
> > 
> > iscsi and NFS are completely different technologies.  If you use iscsi, then
> > all the initiators (clients) are the things which format and control the
> > filesystem.  So the limitations of the filesystem are determined by
> > whichever clustering filesystem you've chosen to implement.  It probably
> > won't do snapshots and so forth.  Although the ZFS filesystem could make a
> > snapshot, it wouldn't be automatically mounted or made available without the
> > clients doing explicit mounts...
> > 
> > With NFS, the filesystem is formatted and controlled by the 
> server.  Both
> > WAFL and ZFS do some pretty good things with snapshotting, 
> and making
> > snapshots available to users without any effort.
> > 
> > ___
> > zfs-discuss mailing list
> > zfs-discuss@opensolaris.org
> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> > 
> 
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 

> -Original Message-
> From: zfs-discuss-boun...@opensolaris.org 
> [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of 
> Garrett D'Amore
> Sent: Friday, July 23, 2010 11:46 PM
> To: Edward Ned Harvey
> Cc: zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] NFS performance?
> 
> Fundamentally, my recommendation is to choose NFS if your clients can
> use it.  You'll get a lot of potential advantages in the NFS/zfs
> integration, so better performance.  Plus you can serve multiple
> clients, etc.
> 
> The only reason to use iSCSI is when you don't have a choice, 
> IMO.  You
> should only use iSCSI with a single initiator at any point in time
> unless you have some higher level contention management in place.
> 
>   - Garrett
> 
> 

I think there may be very good reason to use iSCSI, if you're limited to 
gigabit but need to be able to handle higher throughput for a single client. I 
may be wrong, but I believe iSCSI to/from a single initiator can take advantage 
of multiple links in an active-active multipath scenario whereas NFS is only 
going to be able to take advantage of 1 link (at least until pNFS). 

-Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-25 Thread Sigbjørn Lie



From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Sigbjorn Lie

What about mirroring? Do I need mirrored ZIL devices in case of a power
outage?



You don't need mirroring for the sake of *power outage* but you *do* need
mirroring for the sake of preventing data loss when one of the SSD devices
fails.  There is some gray area here:

If you have zpool < 19, then you do not have "log device removal" which
means you lose your whole zpool in the event of a failed unmirrored log
device.  (Techniques exist to recover, but it's not always easy.)

If you have zpool >= 19, then the danger is much smaller.  If you have a
failed unmirrored log device, and the failure is detected, then the log
device is simply marked "failed" and the system slows down, and everything
is fine.  But if you have an undetected failure, *and* an ungraceful reboot
(which is more likely than it seems) then you risk up to 30 sec of data that
was intended to be written, immediately before the crash.

None of that is a concern, if you have a mirrored log device.

  


Ah, I see! Thanks.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-24 Thread Garrett D'Amore
On Sat, 2010-07-24 at 19:54 -0400, Edward Ned Harvey wrote:
> > From: Garrett D'Amore [mailto:garr...@nexenta.com]
> > 
> > Fundamentally, my recommendation is to choose NFS if your clients can
> > use it.  You'll get a lot of potential advantages in the NFS/zfs
> > integration, so better performance.  Plus you can serve multiple
> > clients, etc.
> > 
> > The only reason to use iSCSI is when you don't have a choice, IMO.  You
> > should only use iSCSI with a single initiator at any point in time
> > unless you have some higher level contention management in place.
> 
> So ... You don't think filesystems like gfs etc, should ever be used?

"gfs" provides such higher level contention management.  I can't speak
for it myself, but my gut reaction is that unless you have a need for
the features of gfs, you are probably better served by NFS.

Running a more traditional filesystem (that does not allow concurrent
block device access) is almost certainly a bad idea unless you have
special needs.

- Garrett



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-24 Thread Edward Ned Harvey
> From: Garrett D'Amore [mailto:garr...@nexenta.com]
> 
> Fundamentally, my recommendation is to choose NFS if your clients can
> use it.  You'll get a lot of potential advantages in the NFS/zfs
> integration, so better performance.  Plus you can serve multiple
> clients, etc.
> 
> The only reason to use iSCSI is when you don't have a choice, IMO.  You
> should only use iSCSI with a single initiator at any point in time
> unless you have some higher level contention management in place.

So ... You don't think filesystems like gfs etc, should ever be used?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Garrett D'Amore
Fundamentally, my recommendation is to choose NFS if your clients can
use it.  You'll get a lot of potential advantages in the NFS/zfs
integration, so better performance.  Plus you can serve multiple
clients, etc.

The only reason to use iSCSI is when you don't have a choice, IMO.  You
should only use iSCSI with a single initiator at any point in time
unless you have some higher level contention management in place.

- Garrett


On Fri, 2010-07-23 at 22:20 -0400, Edward Ned Harvey wrote:
> > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> > boun...@opensolaris.org] On Behalf Of Linder, Doug
> > 
> > On a related note - all other things being equal, is there any reason
> > to choose NFS over iSCSI, or vice-versa?  I'm currently looking at this
> 
> iscsi and NFS are completely different technologies.  If you use iscsi, then
> all the initiators (clients) are the things which format and control the
> filesystem.  So the limitations of the filesystem are determined by
> whichever clustering filesystem you've chosen to implement.  It probably
> won't do snapshots and so forth.  Although the ZFS filesystem could make a
> snapshot, it wouldn't be automatically mounted or made available without the
> clients doing explicit mounts...
> 
> With NFS, the filesystem is formatted and controlled by the server.  Both
> WAFL and ZFS do some pretty good things with snapshotting, and making
> snapshots available to users without any effort.
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Linder, Doug
> 
> On a related note - all other things being equal, is there any reason
> to choose NFS over iSCSI, or vice-versa?  I'm currently looking at this

iscsi and NFS are completely different technologies.  If you use iscsi, then
all the initiators (clients) are the things which format and control the
filesystem.  So the limitations of the filesystem are determined by
whichever clustering filesystem you've chosen to implement.  It probably
won't do snapshots and so forth.  Although the ZFS filesystem could make a
snapshot, it wouldn't be automatically mounted or made available without the
clients doing explicit mounts...

With NFS, the filesystem is formatted and controlled by the server.  Both
WAFL and ZFS do some pretty good things with snapshotting, and making
snapshots available to users without any effort.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Linder, Doug
Phil Harmon wrote:

> > Not to thread hijack, but I assume an SSD ZIL will similarly improve
> > an iSCSI target... as I am getting 2-5MB/sec on that too.
> 
> Yes, it generally will. I've seen some huge improvements with iSCSI,
> but YMMV depending on your config, application and workload.


Sorry this isn't completely ZFS-related, but with all this expert storage 
knowledge here...

On a related note - all other things being equal, is there any reason to choose
NFS over iSCSI, or vice-versa?  I'm currently looking at this decision.  We have
a NetApp (I wish it were a ZFS-based appliance!) and need to remotely mount a
filesystem from it.  It will share the filesystem either as NFS or iSCSI.  Some
of my colleagues say it would be better to use NFS.  Their reasoning is
basically: "That's the way it's always been done".  I'm leaning towards iSCSI.
My reasoning is that it removes a whole extra layer of complexity - as I
understand it, the remote client just treats the remote mount like any other
physical device.  And I've had MAJOR headaches over the years fixing/tweaking
NFS.  Even though version 4 seems better, I'd still rather bypass it
completely.  I believe in the "Keep it simple, stupid" philosophy.

I do realize that NFS is probably better for remote filesystems that have 
multiple simultaneous users, but we won't be doing that in this case.

Any major arguments for/against one over the other?

Thanks for any suggestions.

Doug Linder

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Andrew Gabriel

Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Phil Harman

Milkowski and Neil Perrin's zil synchronicity [PSARC/2010/108] changes
with sync=disabled, when the changes work their way into an available

The fact that people run unsafe systems seemingly without complaint for
years assumes that they know silent data corruption when they
see^H^H^Hhear it ... which, of course, they didn't ... because it is
silent ... or having encountered corrupted data, that they have the
faintest idea where it came from. In my day to day work I still find
many people that have been (apparently) very lucky.



Running with sync disabled, or ZIL disabled, you could call "unsafe" if you
want to use a generalization and a stereotype.  


Just like people say "writeback" is unsafe.  If you apply a little more
intelligence, you'll know, it's safe in some conditions, and not in other
conditions.  Like ... If you have a BBU, you can use your writeback safely.
And if you're not sharing stuff across the network, you're guaranteed the
disabled ZIL is safe.  But even when you are sharing stuff across the
network, the disabled ZIL can still be safe under the following conditions:

If you are only doing file sharing (NFS, CIFS) and you are willing to
reboot/remount from all your clients after an ungraceful shutdown of your
server, then it's safe to run with ZIL disabled.
  


No, that's not safe. The client can still lose up to 30 seconds of data,
which could be, for example, an email message which is received and
foldered on the server, and is then lost. It's probably *safe enough*
for most home users, but you should be fully aware of the potential
implications before embarking on this route.


(As I said before, the zpool itself is not at any additional risk of 
corruption, it's just that you might find the zfs filesystems with 
sync=disabled appear to have been rewound by up to 30 seconds.)



If you're unsure, then adding SSD nonvolatile log device, as people have
said, is the way to go.
  


--
Andrew Gabriel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Sigbjorn Lie
> 
> What about mirroring? Do I need mirrored ZIL devices in case of a power
> outage?

You don't need mirroring for the sake of *power outage* but you *do* need
mirroring for the sake of preventing data loss when one of the SSD devices
fails.  There is some gray area here:

If you have zpool < 19, then you do not have "log device removal" which
means you lose your whole zpool in the event of a failed unmirrored log
device.  (Techniques exist to recover, but it's not always easy.)

If you have zpool >= 19, then the danger is much smaller.  If you have a
failed unmirrored log device, and the failure is detected, then the log
device is simply marked "failed" and the system slows down, and everything
is fine.  But if you have an undetected failure, *and* an ungraceful reboot
(which is more likely than it seems) then you risk up to 30 sec of data that
was intended to be written, immediately before the crash.

None of that is a concern, if you have a mirrored log device.
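
For reference, the commands involved look roughly like this (pool and device
names are placeholders):

    # add a mirrored log device:
    zpool add tank log mirror c4t0d0 c4t1d0

    # with zpool version >= 19, an unmirrored log device can also be removed:
    zpool remove tank c4t0d0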

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Sigbjorn Lie
> 
> What size of ZIL device would be recommended for my pool consisting of

Get the smallest one.  Even an unrealistic high performance scenario cannot
come close to using 32G.  I am sure you'll never reach even 4G usage.
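
(As a rough sanity check: the slog only ever holds a few seconds' to a txg
interval's worth of synchronous writes, so even a saturated gigabit link at
~120 MB/sec over a 30-second txg interval is under 4 GB.)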

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Phil Harman
>
> Milkowski and Neil Perrin's zil synchronicity [PSARC/2010/108] changes
> with sync=disabled, when the changes work their way into an available
> 
> The fact that people run unsafe systems seemingly without complaint for
> years assumes that they know silent data corruption when they
> see^H^H^Hhear it ... which, of course, they didn't ... because it is
> silent ... or having encountered corrupted data, that they have the
> faintest idea where it came from. In my day to day work I still find
> many people that have been (apparently) very lucky.

Running with sync disabled, or ZIL disabled, you could call "unsafe" if you
want to use a generalization and a stereotype.  

Just like people say "writeback" is unsafe.  If you apply a little more
intelligence, you'll know, it's safe in some conditions, and not in other
conditions.  Like ... If you have a BBU, you can use your writeback safely.
And if you're not sharing stuff across the network, you're guaranteed the
disabled ZIL is safe.  But even when you are sharing stuff across the
network, the disabled ZIL can still be safe under the following conditions:

If you are only doing file sharing (NFS, CIFS) and you are willing to
reboot/remount from all your clients after an ungraceful shutdown of your
server, then it's safe to run with ZIL disabled.
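
(On builds that already have the zil synchronicity [PSARC/2010/108] bits
mentioned elsewhere in this thread, the trade-off is a per-dataset property; a
sketch with a placeholder dataset name:)

    zfs set sync=disabled tank/export   # trade sync-write safety for speed
    zfs set sync=standard tank/export   # back to the default behaviour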

If you're unsure, then adding SSD nonvolatile log device, as people have
said, is the way to go.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Darren J Moffat

On 23/07/2010 10:53, Sigbjorn Lie wrote:

The X25-V is rated at up to 25k random read IOPS and up to 2.5k random write
IOPS, so that would seem okay for approx $80. :)

What about mirroring? Do I need mirrored ZIL devices in case of a power outage?


Note there is not a ZIL device, there is a slog device.  Every pool has
one or more ZILs; it may or may not have a slog device used to hold ZIL
contents. Whether a ZIL is on the slog or not depends on a lot of factors,
including the logbias property.


You don't need to mirror the slog device to protect against a power
outage. You need to mirror the slog if you want to protect against
losing synchronous writes (but not pool consistency on disk) on power
outage *and* failure of your slog device at the same time (i.e. a double
fault).
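
(For example, logbias is a per-dataset property; a sketch with placeholder
names:)

    zfs get logbias tank/vol1
    zfs set logbias=throughput tank/vol1   # bias large streams away from the slog
    zfs set logbias=latency tank/vol1      # the default: use the slog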


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Sigbjorn Lie

On Fri, July 23, 2010 11:21, Thomas Burgess wrote:
> On Fri, Jul 23, 2010 at 5:00 AM, Sigbjorn Lie  wrote:
>
>
>> I see I have already received several replies, thanks to all!
>>
>>
>> I would not like to risk losing any data, so I believe a ZIL device would
>> be the way for me. I see these exist at different prices. Any reason why I
>> would not buy a cheap one? Like the Intel X25-V SSD 40GB 2,5"?
>>
>>
>> What size of ZIL device would be recommended for my pool consisting of 4 x
>> 1,5TB drives? Any brands I should stay away from?
>>
>>
>>
>> Regards,
>> Sigbjorn
>>
>>
> Like I said, I bought a 50 GB OCZ Vertex Limited Edition... it's like 200
> dollars, up to 15,000 random IOPS (IOPS is what you want for a fast ZIL).
>
>
> I've gotten excellent performance out of it.
>
>

The X25-V is rated at up to 25k random read IOPS and up to 2.5k random write
IOPS, so that would seem okay for approx $80. :)

What about mirroring? Do I need mirrored ZIL devices in case of a power outage?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Thomas Burgess
On Fri, Jul 23, 2010 at 5:00 AM, Sigbjorn Lie  wrote:

> I see I have already received several replies, thanks to all!
>
> I would not like to risk losing any data, so I believe a ZIL device would
> be the way for me. I see
> these exists in different prices. Any reason why I would not buy a cheap
> one? Like the Intel X25-V
> SSD 40GB 2,5"?
>
> What size of ZIL device would be recommened for my pool consisting for 4 x
> 1,5TB drives? Any
> brands I should stay away from?
>
>
>
> Regards,
> Sigbjorn
>
Like I said, I bought a 50 GB OCZ Vertex Limited Edition... it's like 200
dollars, up to 15,000 random IOPS (IOPS is what you want for a fast ZIL).


I've gotten excellent performance out of it.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Phil Harman

On 23/07/2010 10:02, Sigbjorn Lie wrote:

On Fri, July 23, 2010 10:42, tomwaters wrote:
   

I agree, I get appalling NFS speeds compared to CIFS/Samba, i.e. CIFS/Samba of
95-105MB/sec and NFS of 5-20MB/sec.


Not to thread hijack, but I assume an SSD ZIL will similarly improve an iSCSI
target... as I am getting 2-5MB/sec on that too.


These are exactly the numbers I'm getting as well.

What's the reason for such a low rate when using iSCSI?


The filesystem or application using the iSCSI target may be requesting 
regular cache flushes. These will require synchronous writes to disk. An 
SSD doesn't remove the sync writes, it just makes them a lot faster. 
Other sensible storage servers typically use NVRAM caches to solve this 
problem. Others just play fast and loose with your data.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Phil Harman


Sent from my iPhone

On 23 Jul 2010, at 09:42, tomwaters  wrote:

> I agree, I get appalling NFS speeds compared to CIFS/Samba, i.e. CIFS/Samba of
> 95-105MB/sec and NFS of 5-20MB/sec.
> 
> Not to thread hijack, but I assume an SSD ZIL will similarly improve an iSCSI
> target... as I am getting 2-5MB/sec on that too.

Yes, it generally will. I've seen some huge improvements with iSCSI, but YMMV 
depending on your config, application and workload.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Sigbjorn Lie
On Fri, July 23, 2010 10:42, tomwaters wrote:
> I agree, I get appalling NFS speeds compared to CIFS/Samba, i.e. CIFS/Samba of
> 95-105MB/sec and NFS of 5-20MB/sec.
>
>
> Not to thread hijack, but I assume an SSD ZIL will similarly improve an iSCSI
> target... as I am getting 2-5MB/sec on that too.


These are exactly the numbers I'm getting as well.

What's the reason for such a low rate when using iSCSI?




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Sigbjorn Lie
I see I have already received several replies, thanks to all!

I would not like to risk losing any data, so I believe a ZIL device would be 
the way for me. I see
these exist at different prices. Any reason why I would not buy a cheap one? 
Like the Intel X25-V
SSD 40GB 2,5"?

What size of ZIL device would be recommended for my pool consisting of 4 x 
1,5TB drives? Any
brands I should stay away from?



Regards,
Sigbjorn





On Fri, July 23, 2010 09:48, Phil Harman wrote:
> That's because NFS adds synchronous writes to the mix (e.g. the client needs 
> to know certain
> transactions made it to nonvolatile storage in case the server restarts etc). 
> The simplest safe
> solution, although not cheap, is to add an SSD log device to the pool.
>
> On 23 Jul 2010, at 08:11, "Sigbjorn Lie"  wrote:
>
>
>> Hi,
>>
>>
>> I've been searching around on the Internet to find some help with this, but 
>> have been
>> unsuccessful so far.
>>
>> I have some performance issues with my file server. I have an OpenSolaris 
>> server with a Pentium
>> D
>> 3GHz CPU, 4GB of memory, and a RAIDZ1 over 4 x Seagate (ST31500341AS) 1,5TB 
>> SATA drives.
>>
>>
>> If I compile or even just unpack a tar.gz archive with source code (or any 
>> archive with lots of
>> small files) on my Linux client onto an NFS-mounted disk on the OpenSolaris 
>> server, it's
>> extremely slow compared to unpacking this archive locally on the 
>> server. A 22MB .tar.gz
>> file containing 7360 files takes 9 minutes and 12 seconds to unpack over NFS.
>>
>> Unpacking the same file locally on the server is just under 2 seconds. 
>> Between the server and
>> client I have a gigabit network, which at the time of testing had no other 
>> significant load. My
>> NFS mount options are: "rw,hard,intr,nfsvers=3,tcp,sec=sys".
>>
>>
>> Any suggestions to why this is?
>>



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Phil Harman
On 23 Jul 2010, at 09:18, Andrew Gabriel  wrote:

> Thomas Burgess wrote:
>> 
>> On Fri, Jul 23, 2010 at 3:11 AM, Sigbjorn Lie wrote:
>> 
>>Hi,
>> 
>>    I've been searching around on the Internet to find some help with
>>    this, but have been
>>    unsuccessful so far.
>> 
>>I have some performance issues with my file server. I have an
>>OpenSolaris server with a Pentium D
>>3GHz CPU, 4GB of memory, and a RAIDZ1 over 4 x Seagate
>>(ST31500341AS) 1,5TB SATA drives.
>> 
>>If I compile or even just unpack a tar.gz archive with source code
>>(or any archive with lots of
>>small files), on my Linux client onto a NFS mounted disk to the
>>OpenSolaris server, it's extremely
>>    extremely slow compared to unpacking this archive locally on the
>>    server. A 22MB .tar.gz file
>>    containing 7360 files takes 9 minutes and 12 seconds to unpack over NFS.
>> 
>>Unpacking the same file locally on the server is just under 2
>>seconds. Between the server and
>>client I have a gigabit network, which at the time of testing had
>>no other significant load. My
>>NFS mount options are: "rw,hard,intr,nfsvers=3,tcp,sec=sys".
>> 
>>Any suggestions to why this is?
>> 
>> 
>>Regards,
>>Sigbjorn
>> 
>> 
>> as someone else said, adding an ssd log device can help hugely.  I saw about 
>> a 500% nfs write increase by doing this.
>> I've heard of people getting even more.
> 
> Another option if you don't care quite so much about data security in the 
> event of an unexpected system outage would be to use Robert Milkowski and 
> Neil Perrin's zil synchronicity [PSARC/2010/108] changes with sync=disabled, 
> when the changes work their way into an available build. The risk is that if 
> the file server goes down unexpectedly, it might come back up having lost 
> some seconds worth of changes which it told the client (lied) that it had 
> committed to disk, when it hadn't, and this violates the NFS protocol. That 
> might be OK if you are using it to hold source that's being built, where you 
> can kick off a build again if the server did go down in the middle of it. 
> Wouldn't be a good idea for some other applications though (although Linux 
> ran this way for many years, seemingly without many complaints). Note that 
> there's no increased risk of the zpool going bad - it's just that after the 
> reboot, filesystems with sync=disabled will look like they were rewound by 
> some seconds (possibly up to 30 seconds).

That's assuming you know it happened and that you need  to restart the build 
(ideally with a make clean). All the NFS client knows is that the NFS server 
went away for some time. It still assumes nothing was lost. I can imagine cases 
where the build might continue to completion but with partially corrupted 
files. It's unlikely, but conceivable. Of course, databases like dbm, MySQL or 
Oracle would go blithely on up the swanee with silent data corruption.

The fact that people run unsafe systems seemingly without complaint for years 
assumes that they know silent data corruption when they see^H^H^Hhear it ... 
which, of course, they didn't ... because it is silent ... or having 
encountered corrupted data, that they have the faintest idea where it came 
from. In my day to day work I still find many people that have been 
(apparently) very lucky.

Feel free to play fast and loose with your own data, but I won't with mine, 
thanks! ;)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread tomwaters
I agree, I get appalling NFS speeds compared to CIFS/Samba, i.e. CIFS/Samba of 
95-105MB/s and NFS of 5-20MB/s.

Not to hijack the thread, but I assume an SSD ZIL will similarly improve an iSCSI 
target... as I am getting 2-5MB/s on that too.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Andrew Gabriel

Thomas Burgess wrote:



On Fri, Jul 23, 2010 at 3:11 AM, Sigbjorn Lie wrote:


Hi,

I've been searching around on the Internet to find some help with
this, but have been
unsuccessful so far.

I have some performance issues with my file server. I have an
OpenSolaris server with a Pentium D
3GHz CPU, 4GB of memory, and a RAIDZ1 over 4 x Seagate
(ST31500341AS) 1,5TB SATA drives.

If I compile or even just unpack a tar.gz archive with source code
(or any archive with lots of
small files), on my Linux client onto a NFS mounted disk to the
OpenSolaris server, it's extremely
slow compared to unpacking this archive locally on the
server. A 22MB .tar.gz file
containing 7360 files takes 9 minutes and 12 seconds to unpack over NFS.

Unpacking the same file locally on the server is just under 2
seconds. Between the server and
client I have a gigabit network, which at the time of testing had
no other significant load. My
NFS mount options are: "rw,hard,intr,nfsvers=3,tcp,sec=sys".

Any suggestions to why this is?


Regards,
Sigbjorn


as someone else said, adding an ssd log device can help hugely.  I saw 
about a 500% nfs write increase by doing this.

I've heard of people getting even more.


Another option if you don't care quite so much about data security in 
the event of an unexpected system outage would be to use Robert 
Milkowski and Neil Perrin's zil synchronicity [PSARC/2010/108] changes 
with sync=disabled, when the changes work their way into an available 
build. The risk is that if the file server goes down unexpectedly, it 
might come back up having lost some seconds worth of changes which it 
told the client (lied) that it had committed to disk, when it hadn't, 
and this violates the NFS protocol. That might be OK if you are using it 
to hold source that's being built, where you can kick off a build again 
if the server did go down in the middle of it. Wouldn't be a good idea 
for some other applications though (although Linux ran this way for many 
years, seemingly without many complaints). Note that there's no 
increased risk of the zpool going bad - it's just that after the reboot, 
filesystems with sync=disabled will look like they were rewound by some 
seconds (possibly up to 30 seconds).
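
For reference, once a build carrying those PSARC/2010/108 bits is available, the 
per-dataset switch would presumably look something like this (the dataset name 
is made up, and the property does not exist on earlier builds):

  zfs set sync=disabled tank/build    # fast, but drops the synchronous guarantees
  zfs set sync=standard tank/build    # back to normal POSIX/NFS semantics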


--
Andrew Gabriel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Thomas Burgess
On Fri, Jul 23, 2010 at 3:11 AM, Sigbjorn Lie  wrote:

> Hi,
>
> I've been searching around on the Internet to find some help with this, but
> have been
> unsuccessful so far.
>
> I have some performance issues with my file server. I have an OpenSolaris
> server with a Pentium D
> 3GHz CPU, 4GB of memory, and a RAIDZ1 over 4 x Seagate (ST31500341AS) 1,5TB
> SATA drives.
>
> If I compile or even just unpack a tar.gz archive with source code (or any
> archive with lots of
> small files) on my Linux client onto an NFS-mounted disk on the OpenSolaris
> server, it's extremely
> slow compared to unpacking this archive locally on the server. A
> 22MB .tar.gz file
> containing 7360 files takes 9 minutes and 12 seconds to unpack over NFS.
>
> Unpacking the same file locally on the server is just under 2 seconds.
> Between the server and
> client I have a gigabit network, which at the time of testing had no other
> significant load. My
> NFS mount options are: "rw,hard,intr,nfsvers=3,tcp,sec=sys".
>
> Any suggestions to why this is?
>
>
> Regards,
> Sigbjorn
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>



as someone else said, adding an ssd log device can help hugely.  I saw about
a 500% nfs write increase by doing this.
I've heard of people getting even more.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Phil Harman
That's because NFS adds synchronous writes to the mix (e.g. the client needs to 
know certain transactions made it to nonvolatile storage in case the server 
restarts etc). The simplest safe solution, although not cheap, is to add an SSD 
log device to the pool.
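
To make that concrete, the log is just another vdev added to the pool. The device 
names below are placeholders; mirroring the log is optional, but protects you if a 
single SSD dies:

  zpool add tank log c4t0d0                 # single SSD slog
  zpool add tank log mirror c4t0d0 c4t1d0   # or a mirrored slog instead
  zpool status tank                         # the log vdev shows up under "logs"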

On 23 Jul 2010, at 08:11, "Sigbjorn Lie"  wrote:

> Hi,
> 
> I've been searching around on the Internet to find some help with this, but 
> have been
> unsuccessful so far.
> 
> I have some performance issues with my file server. I have an OpenSolaris 
> server with a Pentium D
> 3GHz CPU, 4GB of memory, and a RAIDZ1 over 4 x Seagate (ST31500341AS) 1,5TB 
> SATA drives.
> 
> If I compile or even just unpack a tar.gz archive with source code (or any 
> archive with lots of
> small files) on my Linux client onto an NFS-mounted disk on the OpenSolaris 
> server, it's extremely
> slow compared to unpacking this archive locally on the server. A 22MB 
> .tar.gz file
> containing 7360 files takes 9 minutes and 12 seconds to unpack over NFS.
> 
> Unpacking the same file locally on the server is just under 2 seconds. 
> Between the server and
> client I have a gigabit network, which at the time of testing had no other 
> significant load. My
> NFS mount options are: "rw,hard,intr,nfsvers=3,tcp,sec=sys".
> 
> Any suggestions to why this is?
> 
> 
> Regards,
> Sigbjorn
> 
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] NFS performance?

2010-07-23 Thread Sigbjorn Lie
Hi,

I've been searching around on the Internet to find some help with this, but 
have been
unsuccessful so far.

I have some performance issues with my file server. I have an OpenSolaris 
server with a Pentium D
3GHz CPU, 4GB of memory, and a RAIDZ1 over 4 x Seagate (ST31500341AS) 1,5TB 
SATA drives.

If I compile or even just unpack a tar.gz archive with source code (or any 
archive with lots of
small files) on my Linux client onto an NFS-mounted disk on the OpenSolaris 
server, it's extremely
slow compared to unpacking this archive locally on the server. A 22MB 
.tar.gz file
containing 7360 files takes 9 minutes and 12 seconds to unpack over NFS.

Unpacking the same file locally on the server is just under 2 seconds. Between 
the server and
client I have a gigabit network, which at the time of testing had no other 
significant load. My
NFS mount options are: "rw,hard,intr,nfsvers=3,tcp,sec=sys".

Any suggestions to why this is?


Regards,
Sigbjorn


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance on ZFS vs UFS

2008-01-31 Thread Jesus Cea

Tomas Ögren wrote:
| To get similar (lower) consistency guarantees, try disabling ZIL..
| google://zil_disable .. This should up the speed, but might cause disk
| corruption if the server crashes while a client is writing data.. (just
| like with UFS)

No disk corruption. Only data loss (the last writes can be lost), if I recall
correctly. ZFS will be consistent even with the ZIL disabled.

If I'm wrong, please educate :)


- --
Jesus Cea Avion _/_/  _/_/_/_/_/_/
[EMAIL PROTECTED] http://www.argo.es/~jcea/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:[EMAIL PROTECTED] _/_/_/_/  _/_/_/_/_/
~   _/_/  _/_/_/_/  _/_/  _/_/
"Things are not so easy"  _/_/  _/_/_/_/  _/_/_/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/_/_/_/  _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance on ZFS vs UFS

2008-01-27 Thread Guanghui Wang
I also tested NFS performance ('zfs set sharenfs=on') with a Linux client.
After running echo zil_disable/W0t1 | mdb -kw, small-file writes over NFS sped up about 10x.

About zil_disable, see Eric Kustarz's blog:
http://blogs.sun.com/erickustarz/entry/zil_disable
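
For the record, a sketch of both the live toggle and the persistent setting 
(the tunable only exists on builds of that era, and it generally only takes 
effect for filesystems mounted after the change):

  # live toggle via mdb
  echo zil_disable/W0t1 | mdb -kw   # disable the ZIL
  echo zil_disable/W0t0 | mdb -kw   # re-enable it
  echo zil_disable/D | mdb -k       # show the current value

  # persistent across reboots, in /etc/system
  set zfs:zil_disable = 1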
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance on ZFS vs UFS

2008-01-25 Thread Joerg Schilling
Torrey McMahon <[EMAIL PROTECTED]> wrote:


> http://www.philohome.com/hammerhead/broken-disk.jpg :-)

Be careful, things like this can result in "device corruption"!

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance on ZFS vs UFS

2008-01-25 Thread Torrey McMahon
Robert Milkowski wrote:
> Hello Darren,
>
>
>
> DJM> BTW there isn't really any such thing as "disk corruption"; there is 
> DJM> "data corruption" :-)
>
> Well, if you scratch it hard enough :)
>   

http://www.philohome.com/hammerhead/broken-disk.jpg :-)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance on ZFS vs UFS

2008-01-25 Thread Robert Milkowski
Hello Darren,



DJM> BTW there isn't really any such thing as "disk corruption"; there is 
DJM> "data corruption" :-)

Well, if you scratch it hard enough :)




-- 
Best regards,
 Robert Milkowski   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance on ZFS vs UFS

2008-01-25 Thread Darren J Moffat
Tomas Ögren wrote:
> On 24 January, 2008 - Steve Hillman sent me these 1,9K bytes:
> 
>> I realize that this topic has been fairly well beaten to death on this 
>> forum, but I've also read numerous comments from ZFS developers that they'd 
>> like to hear about significantly different performance numbers of ZFS vs UFS 
>> for NFS-exported filesystems, so here's one more.
>>
>> The server is an x4500 with 44 drives configured in a RAID10 zpool, and two 
>> drives mirrored and formatted with UFS for the boot device. It's running 
>> Solaris 10u4, patched with the Recommended Patch Set from late Dec/07. The 
>> client (if it matters) is an older V20z w/ Solaris 10 3/05. No tuning has 
>> been done on either box
>>
>> The test involved copying lots of small files (2-10k) from an NFS client to 
>> a mounted NFS volume. A simple 'cp' was done, both with 1 thread and 4 
>> parallel threads (to different directories) and then I monitored to see how 
>> fast the files were accumulating on the server.
>>
>> ZFS:
>> 1 thread - 25 files/second; 4 threads - 25 files/second (~6 per thread)
>>
>> UFS: (same server, just exported /var from the boot volume)
>> 1 thread - 200 files/second; 4 threads - 520 files/second (~130/thread)
> 
> To get similar (lower) consistency guarantees, try disabling ZIL..
> google://zil_disable .. This should up the speed, but might cause disk
> corruption if the server crashes while a client is writing data.. (just
> like with UFS)

Disabling the ZIL does NOT cause disk corruption. It doesn't even cause 
ZFS to be inconsistent on disk. What it does mean is that you 
no longer have guaranteed synchronous write semantics - i.e. after a crash an 
application might have done a sync write that never made it to stable 
storage.

BTW there isn't really any such thing as "disk corruption"; there is 
"data corruption" :-)

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance on ZFS vs UFS

2008-01-24 Thread Tomas Ögren
On 24 January, 2008 - Steve Hillman sent me these 1,9K bytes:

> I realize that this topic has been fairly well beaten to death on this forum, 
> but I've also read numerous comments from ZFS developers that they'd like to 
> hear about significantly different performance numbers of ZFS vs UFS for 
> NFS-exported filesystems, so here's one more.
> 
> The server is an x4500 with 44 drives configured in a RAID10 zpool, and two 
> drives mirrored and formatted with UFS for the boot device. It's running 
> Solaris 10u4, patched with the Recommended Patch Set from late Dec/07. The 
> client (if it matters) is an older V20z w/ Solaris 10 3/05. No tuning has 
> been done on either box
> 
> The test involved copying lots of small files (2-10k) from an NFS client to a 
> mounted NFS volume. A simple 'cp' was done, both with 1 thread and 4 parallel 
> threads (to different directories) and then I monitored to see how fast the 
> files were accumulating on the server.
> 
> ZFS:
> 1 thread - 25 files/second; 4 threads - 25 files/second (~6 per thread)
> 
> UFS: (same server, just exported /var from the boot volume)
> 1 thread - 200 files/second; 4 threads - 520 files/second (~130/thread)

To get similar (lower) consistency guarantees, try disabling ZIL..
google://zil_disable .. This should up the speed, but might cause disk
corruption if the server crashes while a client is writing data.. (just
like with UFS)

/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance on ZFS vs UFS

2008-01-24 Thread Neil Perrin


Steve Hillman wrote:
> I realize that this topic has been fairly well beaten to death on this forum, 
> but I've also read numerous comments from ZFS developers that they'd like to 
> hear about significantly different performance numbers of ZFS vs UFS for 
> NFS-exported filesystems, so here's one more.
> 
> The server is an x4500 with 44 drives configured in a RAID10 zpool, and two 
> drives mirrored and formatted with UFS for the boot device. It's running 
> Solaris 10u4, patched with the Recommended Patch Set from late Dec/07. The 
> client (if it matters) is an older V20z w/ Solaris 10 3/05. No tuning has 
> been done on either box
> 
> The test involved copying lots of small files (2-10k) from an NFS client to a 
> mounted NFS volume. A simple 'cp' was done, both with 1 thread and 4 parallel 
> threads (to different directories) and then I monitored to see how fast the 
> files were accumulating on the server.
> 
> ZFS:
> 1 thread - 25 files/second; 4 threads - 25 files/second (~6 per thread)
> 
> UFS: (same server, just exported /var from the boot volume)
> 1 thread - 200 files/second; 4 threads - 520 files/second (~130/thread)

With this big a difference, I suspect the write cache is enabled on 
the disks. UFS requires this cache to be disabled or battery-backed, 
otherwise corruption can occur.
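
If you want to check, format's expert mode usually exposes the write cache state. 
The exact menu names vary with the driver and disk type, so treat this only as a 
rough sketch of the interactive session:

  # format -e, then select the disk and walk the menus:
  #   format> cache
  #   cache> write_cache
  #   write_cache> display    # show whether the write cache is enabled
  #   write_cache> disable    # needed for UFS unless the cache is battery backed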

> 
> For comparison, the same test was done to a NetApp FAS270 that the x4500 was 
> bought to replace:
> 1 thread - 70 files/second; 4 threads - ~250 files/second

I don't know enough about that system but perhaps it has NVRAM or an SSD
to service the synchronous demands of NFS. An equivalent setup could be
configured with a separate intent log on a similar fast device.

> 
> I have been able to work around this performance hole by exporting multiple 
> ZFS filesystems, because the workload is spread across a hashed directory 
> structure. I then get 25 files per FS per second. Still, I thought I'd raise 
> it here anyway. If there's something I'm doing wrong, I'd love to hear about 
> it. 
> 
> I'm also assuming that this ties into BugID 6535160  "Lock contention on 
> zl_lock from zil_commit", so if that's the case, please add another vote for 
> making this fix available as a patch for S10u4 users

I believe this is a different problem than 6535160.

> 
> Thanks,
> Steve Hillman
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] NFS performance on ZFS vs UFS

2008-01-24 Thread Steve Hillman
I realize that this topic has been fairly well beaten to death on this forum, 
but I've also read numerous comments from ZFS developers that they'd like to 
hear about significantly different performance numbers of ZFS vs UFS for 
NFS-exported filesystems, so here's one more.

The server is an x4500 with 44 drives configured in a RAID10 zpool, and two 
drives mirrored and formatted with UFS for the boot device. It's running 
Solaris 10u4, patched with the Recommended Patch Set from late Dec/07. The 
client (if it matters) is an older V20z w/ Solaris 10 3/05. No tuning has been 
done on either box

The test involved copying lots of small files (2-10k) from an NFS client to a 
mounted NFS volume. A simple 'cp' was done, both with 1 thread and 4 parallel 
threads (to different directories) and then I monitored to see how fast the 
files were accumulating on the server.
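
(For anyone who wants to reproduce this, the test amounts to something like the 
following from the client; the source and mount paths are made up:)

  # single stream
  ptime cp -r /var/tmp/smallfiles /net/server/export/test1

  # four parallel streams, one directory per thread
  for i in 1 2 3 4; do
      cp -r /var/tmp/smallfiles /net/server/export/test$i &
  done
  wait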

ZFS:
1 thread - 25 files/second; 4 threads - 25 files/second (~6 per thread)

UFS: (same server, just exported /var from the boot volume)
1 thread - 200 files/second; 4 threads - 520 files/second (~130/thread)

For comparison, the same test was done to a NetApp FAS270 that the x4500 was 
bought to replace:
1 thread - 70 files/second; 4 threads - ~250 files/second

I have been able to work around this performance hole by exporting multiple ZFS 
filesystems, because the workload is spread across a hashed directory 
structure. I then get 25 files per FS per second. Still, I thought I'd raise it 
here anyway. If there's something I'm doing wrong, I'd love to hear about it. 
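
(The workaround itself is nothing special -- roughly one dataset per top-level 
hash bucket, each shared on its own; the pool and dataset names below are made up:)

  zfs create tank/data
  for b in 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f; do
      zfs create -o sharenfs=on tank/data/$b
  done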

I'm also assuming that this ties into BugID 6535160  "Lock contention on 
zl_lock from zil_commit", so if that's the case, please add another vote for 
making this fix available as a patch for S10u4 users

Thanks,
Steve Hillman
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS performance considerations (Linux vs Solaris)

2007-12-10 Thread msl
OK, I made the proposal, so now I'm trying to implement it. :)
 I hope you can (at least) criticize it. :))
 The document is here: http://www.posix.brte.com.br/blog/?p=89
 It is not complete; I'm still running some tests and analyzing the results. But 
I think you can already take a look and contribute some thoughts.
  It was interesting to see the write performance for the iSCSI protocol versus 
NFSv3. Why was iSCSI "much" better for writes? Why was the read performance the "same"? 
Do I get all the guarantees with iSCSI that I have with NFS?
 Please comment on it!

 Thanks a lot for your time!
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] NFS performance considerations (Linux vs Solaris)

2007-11-20 Thread msl
Hello all...
 I think we can all agree that "performance" is a big topic for NFS. 
 When we talk about NFS and ZFS we imagine a great combination/solution, 
but one does not depend on the other; they are really two distinct 
technologies. ZFS has a lot of features that we all know about and that "maybe" 
all of us want in an NFS share (maybe not). The point is: two technologies with 
different priorities.
 So, what I think is important is a "document" (here on the NFS/ZFS lists) that 
lists and explains the ZFS features that have a "real" performance impact. I 
know there is the solarisinternals wiki about ZFS/NFS integration, but 
what I think is really important is a comparison between Linux and Solaris/ZFS 
on the server side.
 It would be very useful to see, for example, what "consistency" I get with 
Linux (XFS, ext3, etc.) at "that" performance, and "how" I can configure 
a similar NFS service on Solaris/ZFS. 
 There is some information about this here: 
http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine
 but it does not relate to Linux, which I think is important.
 What I mean is that the people who know a lot about the NFS protocol 
and about the filesystem features should make such a comparison (to make 
adoption and users' comparisons easier). I think many users are comparing 
apples with oranges.
 Another example (correct me if I am wrong): until kernel 2.4.20 (at 
least), the default Linux export option for sync/async was "async" (on Solaris I 
think it has always been "sync"). Another point is the "commit" operation in 
NFSv2, which was not implemented: the server just replied with an "OK", but the 
data was not in stable storage yet (here the ZIL, and the roch blog entry, is 
excellent).
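
 As an illustration only, on the Linux side that per-export choice lives in 
/etc/exports (the path and network below are made up):

  # 'sync' makes the server commit data before replying; 'async' is the old,
  # faster but less safe behaviour
  /export/data  192.168.1.0/24(rw,sync)
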
 That's it. I'm proposing the creation of a "matrix/table" of features and their 
performance impact, as well as a comparison with other 
implementations and their implications.
 Thanks very much for your time, and sorry for the long post.

 Leal.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss