Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-25 Thread Roch Bourbonnais

On 23 Oct 2008, at 05:40, Constantin Gonzalez wrote:

> Hi,
>
> Bob Friesenhahn wrote:
>> On Wed, 22 Oct 2008, Neil Perrin wrote:
>>> On 10/22/08 10:26, Constantin Gonzalez wrote:
>>>> 3. Disable ZIL[1]. This is of course evil, but one customer pointed out to me
>>>> that if a tar xvf were writing locally to a ZFS file system, the writes
>>>> wouldn't be synchronous either, so there's no point in forcing NFS users
>>>> to have a better availability experience at the expense of performance.
>>
>> The conclusion reached here is quite seriously wrong and no Sun
>> employee should suggest it to a customer.  If the system writing to a
>
> I'm not suggesting it to any customer. Actually, I argued quite a long time
> with the customer, trying to convince him that "slow but correct" is better.
>
> The conclusion above is a conscious decision by the customer. He says that he
> does not want NFS to turn any write into a synchronous write, he's happy if
> all writes are asynchronous, because in this case the NFS server is a backup to
> disk device and if power fails he simply restarts the backup 'cause he has the
> data in multiple copies anyway.
>

The case of a full (not incremental) backup, where an operator monitors that
the server stays up for the full duration (or manually restarts the operation),
seems like a singular case where this might make some sense.

But as was stated, if performance is the goal here, it is better to use a bulk
data transfer through some dedicated protocol (as opposed to NFS small-file
manipulations). With such a protocol, a failure of the server has an immediate,
obvious repercussion on the client, and things can be restarted without further
coordination.

I also understand that with NFS directory delegation or exclusive mount points
one could solve this NFS peculiarity (which is totally unrelated to ZFS, and not
to be confused with the ZFS / SAN storage cache flush condition).


If CIFS is not subject to the same penalty, I can only assume that the
integrity of the client's view cannot be guaranteed after a server crash.
Does anyone know this for sure?
-r


>> local filesystem reboots then the applications which were running are
>> also lost and will see the new filesystem state when they are
>> restarted.  If an NFS server spontaneously reboots, the applications
>> on the many clients are still running and the client systems are using
>> cached data.  This means that clients could do very bad things if the
>> filesystem state (as seen by NFS) is suddenly not consistent.  One of
>> the joys of NFS is that the client continues unhindered once the
>> server returns.
>
> Yes, we're both aware of this. In this particular situation, the customer
> would restart his backup job (and thus the client application) in case the
> server dies.
>
> Thanks for pointing out the difference, this is indeed an important distinction.
>
> Cheers,
>   Constantin
>
> -- 
> Constantin Gonzalez  Sun Microsystems GmbH, Germany
> Principal Field Technologist  http://blogs.sun.com/constantin
> Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez
>
> Registered office: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
> Commercial register: Amtsgericht Muenchen, HRB 161028
> Managing directors: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
> Chairman of the supervisory board: Martin Haering
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-23 Thread Ross Smith
No problem.  I didn't use mirrored slogs myself, but that's certainly
a step up for reliability.

It's pretty easy to create a boot script to re-create the ramdisk and
re-attach it to the pool too.  So long as you use the same device name
for the ramdisk, you can add it each time with a simple "zpool replace
pool ramdisk".
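Ross's boot-script idea might look roughly like the Python sketch below. It assumes a Solaris/OpenSolaris host where "ramdiskadm -a NAME SIZE" creates /dev/ramdisk/NAME; the pool name "tank", the ramdisk name "slog0" and the 512m size are hypothetical placeholders. By default it only prints the commands; nothing runs unless dry_run=False is passed.

```python
import subprocess


def build_commands(pool: str, ramdisk: str, size: str) -> list[list[str]]:
    """Return the commands that re-create a ramdisk and re-attach it as a slog.

    pool, ramdisk and size are hypothetical placeholders. The same ramdisk
    name must be used on every boot so that "zpool replace" with a single
    device argument puts the (now empty) ramdisk back in its old place.
    """
    device = f"/dev/ramdisk/{ramdisk}"
    return [
        ["ramdiskadm", "-a", ramdisk, size],  # re-create the volatile device
        ["zpool", "replace", pool, device],   # swap it back into the pool
    ]


def run(pool: str = "tank", ramdisk: str = "slog0", size: str = "512m",
        dry_run: bool = True) -> None:
    for cmd in build_commands(pool, ramdisk, size):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    run()
```

Because the ramdisk comes back empty, this only restores the log device; whatever was in the old slog at crash time is gone, which is the risk Ross describes accepting.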


On Thu, Oct 23, 2008 at 1:56 PM, Constantin Gonzalez
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> yes, using slogs is the best solution.
>
> Meanwhile, using mirrored slogs from other servers' RAM-Disks running on UPSs
> seems like an interesting idea, if UPS-backed RAM is deemed reliable enough
> for the purposes of the NFS server.
>
> Thanks for suggesting this!
>
> Cheers,
>   Constantin
>
> Ross wrote:
>>
>> Well, it might be even more of a bodge than disabling the ZIL, but how about:
>>
>> - Create a 512MB ramdisk, use that for the ZIL
>> - Buy a Micro Memory nvram PCI card for £100 or so.
>> - Wait 3-6 months, hopefully buy a fully supported PCI-e SSD to replace
>>   the Micro Memory card.
>>
>> The ramdisk isn't an ideal solution, but provided you don't export the
>> pool with it offline, it does work.  We used it as a stop-gap solution for a
>> couple of weeks while waiting for a Micro Memory nvram card.
>>
>> Our reasoning was that our server's on a UPS and we figured if something
>> crashed badly enough to take out something like the UPS, the motherboard,
>> etc, we'd be losing data anyway.  We just made sure we had good backups in
>> case the pool got corrupted and crossed our fingers.
>>
>> The reason I say wait 3-6 months is that there's a huge amount of activity
>> with SSDs at the moment.  Sun said that they were planning to have flash
>> storage launched by Christmas, so I figure there's a fair chance that we'll
>> see some supported PCIe cards by next Spring.
>> --
>> This message posted from opensolaris.org


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-23 Thread Bob Friesenhahn
On Thu, 23 Oct 2008, Constantin Gonzalez wrote:
>
> This is what the customer told me. He uses rsync and he is ok with restarting
> the rsync whenever the NFS server restarts.

Then remind your customer to tell rsync to inspect the data (its
--checksum option) rather than trusting timestamps.  Rsync will then run
quite a bit slower, but at least it will catch a corrupted file.  There is
still the problem that the client OS may have cached data which it thinks
is correct but no longer matches what is on the server.  This may result
in rsync making wrong decisions.
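Bob's point about timestamps can be illustrated with a small Python sketch: a copy with the same size and modification time but damaged contents passes a size+mtime quick check yet fails a content comparison. The helper names are hypothetical, SHA-256 is an arbitrary choice, and this only mimics the idea behind rsync's default quick check and its --checksum option, not rsync's actual implementation.

```python
import hashlib
import os


def quick_check_same(a: str, b: str) -> bool:
    """Like rsync's default quick check: compare only size and mtime."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def _digest(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_same(a: str, b: str) -> bool:
    """Inspect the data itself, in the spirit of rsync --checksum."""
    return os.path.getsize(a) == os.path.getsize(b) and _digest(a) == _digest(b)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "src"), os.path.join(d, "dst")
        with open(src, "wb") as f:
            f.write(b"A" * 4096)
        # A backup copy whose middle never reached disk: same size, zeroed data.
        with open(dst, "wb") as f:
            f.write(b"A" * 1024 + b"\0" * 2048 + b"A" * 1024)
        st = os.stat(src)
        os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))  # copy the mtime
        print(quick_check_same(src, dst))  # True: the damage goes unnoticed
        print(checksum_same(src, dst))     # False: the damage is caught
```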

A better approach is to run rsync on the server so that there is rsync
to rsync communication rather than rsync to NFS.  This can result in
far better performance and avoids the NFS synchronous write problem.

For my own backups, I initiate rsync on the server side and have a 
special secure rsync service set up on the clients so that the server 
sucks files from the clients.  This works very well and helps with 
administration because any error conditions will be noted in just one 
place.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/



Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-23 Thread Constantin Gonzalez
Hi,

Bob Friesenhahn wrote:
> On Thu, 23 Oct 2008, Constantin Gonzalez wrote:
>>
>> Yes, we're both aware of this. In this particular situation, the customer
>> would restart his backup job (and thus the client application) in case the
>> server dies.
> 
> So it is ok for this customer if their backup becomes silently corrupted 
> and the backup software continues running?  Consider that some of the 
> backup files may have missing or corrupted data in the middle.  Your 
> customer is quite dedicated in that he will monitor the situation very 
> well and remember to reboot the backup system, correct any corrupted 
> files, and restart the backup software whenever the server panics and 
> reboots.

This is what the customer told me. He uses rsync and he is ok with restarting
the rsync whenever the NFS server restarts.

> A properly built server should be able to handle NFS writes at gigabit 
> wire-speed.

I'm advocating for a properly built system, believe me :).

Cheers,
Constantin



Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-23 Thread Bob Friesenhahn
On Thu, 23 Oct 2008, Constantin Gonzalez wrote:
>
> Yes, we're both aware of this. In this particular situation, the customer
> would restart his backup job (and thus the client application) in case the
> server dies.

So it is ok for this customer if their backup becomes silently 
corrupted and the backup software continues running?  Consider that 
some of the backup files may have missing or corrupted data in the 
middle.  Your customer is quite dedicated in that he will monitor the 
situation very well and remember to reboot the backup system, correct 
any corrupted files, and restart the backup software whenever the 
server panics and reboots.

A properly built server should be able to handle NFS writes at 
gigabit wire-speed.

Bob


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-23 Thread Constantin Gonzalez
Hi,

yes, using slogs is the best solution.

Meanwhile, using mirrored slogs from other servers' RAM-Disks running on UPSs
seems like an interesting idea, if UPS-backed RAM is deemed reliable enough
for the purposes of the NFS server.

Thanks for suggesting this!

Cheers,
Constantin

Ross wrote:
> Well, it might be even more of a bodge than disabling the ZIL, but how about:
> 
> - Create a 512MB ramdisk, use that for the ZIL
> - Buy a Micro Memory nvram PCI card for £100 or so.
> - Wait 3-6 months, hopefully buy a fully supported PCI-e SSD to replace the
>   Micro Memory card.
> 
> The ramdisk isn't an ideal solution, but provided you don't export the pool
> with it offline, it does work.  We used it as a stop-gap solution for a
> couple of weeks while waiting for a Micro Memory nvram card.
> 
> Our reasoning was that our server's on a UPS and we figured if something
> crashed badly enough to take out something like the UPS, the motherboard,
> etc, we'd be losing data anyway.  We just made sure we had good backups in
> case the pool got corrupted and crossed our fingers.
> 
> The reason I say wait 3-6 months is that there's a huge amount of activity
> with SSDs at the moment.  Sun said that they were planning to have flash
> storage launched by Christmas, so I figure there's a fair chance that we'll
> see some supported PCIe cards by next Spring.


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-23 Thread Constantin Gonzalez
Hi,

Bob Friesenhahn wrote:
> On Wed, 22 Oct 2008, Neil Perrin wrote:
>> On 10/22/08 10:26, Constantin Gonzalez wrote:
>>> 3. Disable ZIL[1]. This is of course evil, but one customer pointed out to me
>>> that if a tar xvf were writing locally to a ZFS file system, the writes
>>> wouldn't be synchronous either, so there's no point in forcing NFS users
>>> to have a better availability experience at the expense of performance.
> 
> The conclusion reached here is quite seriously wrong and no Sun 
> employee should suggest it to a customer.  If the system writing to a 

I'm not suggesting it to any customer. Actually, I argued quite a long time
with the customer, trying to convince him that "slow but correct" is better.

The conclusion above is a conscious decision by the customer. He says that he
does not want NFS to turn any write into a synchronous write, he's happy if
all writes are asynchronous, because in this case the NFS server is a backup to
disk device and if power fails he simply restarts the backup 'cause he has the
data in multiple copies anyway.

> local filesystem reboots then the applications which were running are 
> also lost and will see the new filesystem state when they are 
> restarted.  If an NFS server spontaneously reboots, the applications 
> on the many clients are still running and the client systems are using 
> cached data.  This means that clients could do very bad things if the 
> filesystem state (as seen by NFS) is suddenly not consistent.  One of 
> the joys of NFS is that the client continues unhindered once the 
> server returns.

Yes, we're both aware of this. In this particular situation, the customer
would restart his backup job (and thus the client application) in case the
server dies.

Thanks for pointing out the difference, this is indeed an important distinction.

Cheers,
   Constantin



Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-23 Thread Constantin Gonzalez
Hi,

>> - The ZIL exists on a per filesystem basis in ZFS. Is there an RFE already
>>   that asks for the ability to disable the ZIL on a per filesystem basis?
> 
> Yes: 6280630 zil synchronicity

good, thanks for the pointer!

> Though personally I've been unhappy with the exposure that zil_disable has got.
> It was originally meant for debug purposes only. So providing an official
> way to make synchronous behaviour asynchronous is to me dangerous.

IMHO, the need here is to give admins control over the way they want their
file servers to behave. In this particular case, the admin argues that he knows
what he's doing, that he doesn't want his NFS server to behave more strongly
than a local filesystem and that he deserves control of that behaviour.

Ideally, there would be an NFS option that lets customers choose whether they
want to honor COMMIT requests or not.

Disabling ZIL on a per filesystem basis is only the second best solution, but
since that CR already exists, it seems to be the more realistic route.

Thanks,
Constantin




Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Richard Elling
Ricardo M. Correia wrote:
> Hi Richard,
>
> On Wed, 2008-10-22 at 14:04 -0700, Richard Elling wrote:
>   
>> It is more important to use a separate disk, than to use a separate and fast
>> disk.  Anecdotal evidence suggests that using a USB hard disk works
>> well.
>> 
>
> While I don't necessarily disagree with your statement, please note that
> (as far as I'm aware) USB disks don't respect the "flush write cache"
> command, so in fact the disk may appear to be faster than it actually is
> because it's not maintaining proper transactional consistency.
>   

YMMV.  Some USB-to-SATA converters seem to have switches
to enable or disable the "optimized for quick removal" mode.
 -- richard



Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Ricardo M. Correia
Hi Richard,

On Wed, 2008-10-22 at 14:04 -0700, Richard Elling wrote:
> It is more important to use a separate disk, than to use a separate and fast
> disk.  Anecdotal evidence suggests that using a USB hard disk works
> well.

While I don't necessarily disagree with your statement, please note that
(as far as I'm aware) USB disks don't respect the "flush write cache"
command, so in fact the disk may appear to be faster than it actually is
because it's not maintaining proper transactional consistency.
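One rough way to probe this concern is to time write()+fsync() pairs on a file system backed by the device in question, as in the Python sketch below. It is a hypothetical helper, not a standard tool; it measures the whole file system path, and whether fsync() actually causes a cache flush on the disk depends on the OS, the file system and the device. As a rough rule of thumb, sub-millisecond results from a single rotating disk suggest that flushes are being ignored rather than that the disk is fast.

```python
import os
import tempfile
import time


def sync_write_latency(directory: str, count: int = 100, size: int = 4096) -> float:
    """Return the mean seconds per write()+fsync() pair on a file in directory.

    A device that ignores cache-flush requests will report suspiciously low
    numbers here; that is a symptom, not evidence of a fast device.
    """
    payload = b"\0" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS (and, ideally, the disk) to make it stable
        return (time.perf_counter() - start) / count
    finally:
        os.close(fd)
        os.unlink(path)  # leave no scratch file behind
```

For example, sync_write_latency("/mnt/usb") would test a file system mounted at a hypothetical /mnt/usb.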

Cheers,
Ricardo




Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Nicolas Williams
On Wed, Oct 22, 2008 at 04:46:00PM -0400, Miles Nordin wrote:
> I thought NFSv2 -> NFSv3 was supposed to make this prestoserv, SSD,
> battery-backed DRAM stuff not needed for good performance any more.  I
> guess not though.

There are still a number of operations in NFSv3 and NFSv4 which the
client must wait for synchronously.  Things like file creation, fsync()
(duh) and therefore close().  This mostly negatively affects untarring
and restores.

Ideally applications like tar should be able to do asynchronous open()s
and close()s, but the OS doesn't provide those, so such apps would have
to use threads.  But in reality those apps are single-threaded and not
remotely asynchronous.
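The threaded workaround mentioned above can be sketched in Python as follows. It is a hypothetical illustration, not how any real tar works: extract_concurrently and the 16-worker default are assumptions, and it handles only flat files. Each open()/write()/close() still blocks its own thread, but on an NFS mount up to "workers" of those synchronous waits can be outstanding at once.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def create_file(path: str, data: bytes) -> str:
    # open(), write() and close() still block, but only this worker thread.
    with open(path, "wb") as f:
        f.write(data)
    return path


def extract_concurrently(files: dict[str, bytes], dest: str,
                         workers: int = 16) -> list[str]:
    """Create many small files in parallel, as a threaded 'tar x' might."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(create_file, os.path.join(dest, name), data)
                   for name, data in files.items()]
        return [f.result() for f in futures]  # re-raises any worker's error
```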

Nico


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Richard Elling
Constantin Gonzalez wrote:
> Hi,
>
> On a busy NFS server, performance tends to be very modest for large amounts
> of small files due to the well known effects of ZFS and ZIL honoring the
> NFS COMMIT operation[1].
>
> For the mature sysadmin who knows what (s)he does, there are three
> possibilities:
>
> 1. Live with it. Hard, if you see 10x less performance than could be and your
> users complain a lot.
>
> 2. Use a flash disk for a ZIL, a slog. Can add considerable extra cost,
> especially if you're using an X4500/X4540 and can't swap out fast SAS
> drives for cheap SATA drives to free the budget for flash ZIL drives.[2]
>   

It is more important to use a separate disk, than to use a separate and fast
disk.  Anecdotal evidence suggests that using a USB hard disk works
well.  Remember, slogs are a write-only workload and tend to use very
modest amounts of data -- you should see very few seeks on a dedicated
slog device.

Personally, I'd use a slice from the boot disk, because people tend
to leave tons of available space there.
 -- richard

> 3. Disable ZIL[1]. This is of course evil, but one customer pointed out to me
> that if a tar xvf were writing locally to a ZFS file system, the writes
> wouldn't be synchronous either, so there's no point in forcing NFS users
> to have a better availability experience at the expense of performance.
>
>
> So, if the sysadmin draws the informed and conscious conclusion that (s)he
> doesn't want to honor NFS COMMIT operations, what are options less disruptive
> than disabling ZIL completely?
>
> - I checked the NFS tunables from:
>   http://dlc.sun.com/osol/docs/content/SOLTUNEPARAMREF/chapter3-1.html
>   but could not find a tunable that would disable COMMIT honoring.
>   Is there already an RFE asking for a share option that disables the
>   translation of COMMIT to synchronous writes?
>
> - The ZIL exists on a per filesystem basis in ZFS. Is there an RFE already
>   that asks for the ability to disable the ZIL on a per filesystem basis?
>
>   Once Admins start to disable the ZIL for whole pools because the extra
>   performance is too tempting, wouldn't it be the lesser evil to let them
>   disable it on a per filesystem basis?
>
> Comments?
>
>
> Cheers,
> Constantin
>
> [1]: http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine
> [2]: http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on
>
>   



Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Bob Friesenhahn
On Wed, 22 Oct 2008, Miles Nordin wrote:
>
> I thought NFSv2 -> NFSv3 was supposed to make this prestoserv, SSD,
> battery-backed DRAM stuff not needed for good performance any more.  I
> guess not though.

The intent was to allow the server to be able to buffer up more 
uncommitted data before the client system requested that it be 
committed to store.  In this case, if the server spontaneously 
rebooted, the client is responsible for remembering the uncommitted 
data that it already sent so that it can send it again.  This means 
that client behavior has quite a lot to do with perceived NFSv3 
performance.
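The NFSv3 contract described above can be modelled with a toy Python simulation: the server returns a write verifier with each UNSTABLE WRITE and each COMMIT, the verifier changes when the server restarts, and the client keeps its copy of the data until a COMMIT returns a matching verifier. The classes are hypothetical and greatly simplified. The honor_commit=False case is an assumption meant to mimic a server with the ZIL disabled; it ignores ZFS's periodic transaction group writes, which would eventually put the data on disk anyway. What it shows is the window in which an acknowledged COMMIT can be silently lost.

```python
import itertools
from dataclasses import dataclass, field

_boot_ids = itertools.count(1)  # each server boot gets a fresh verifier


@dataclass
class Server:
    """Toy NFSv3 server: UNSTABLE writes sit in memory until COMMIT."""
    honor_commit: bool = True  # False mimics a server with the ZIL disabled
    stable: dict = field(default_factory=dict)    # survives a reboot
    unstable: dict = field(default_factory=dict)  # lost on a reboot
    verifier: int = field(default_factory=lambda: next(_boot_ids))

    def write_unstable(self, offset: int, data: bytes) -> int:
        self.unstable[offset] = data
        return self.verifier

    def commit(self) -> int:
        if self.honor_commit:
            self.stable.update(self.unstable)  # flush to stable storage
            self.unstable.clear()
        return self.verifier  # replies "committed" either way

    def reboot(self) -> None:
        self.unstable.clear()
        self.verifier = next(_boot_ids)


@dataclass
class Client:
    server: Server
    pending: dict = field(default_factory=dict)  # offset -> (data, verifier)

    def write(self, offset: int, data: bytes) -> None:
        self.pending[offset] = (data, self.server.write_unstable(offset, data))

    def commit(self) -> None:
        verf = self.server.commit()
        stale = {o: d for o, (d, v) in self.pending.items() if v != verf}
        if stale:  # the server restarted since these writes: send them again
            for offset, data in stale.items():
                self.write(offset, data)
            self.commit()
        else:
            self.pending.clear()  # only now is it safe to drop our copies


if __name__ == "__main__":
    srv = Server()
    cli = Client(srv)
    cli.write(0, b"abc")
    srv.reboot()       # crash before COMMIT: the client notices and resends
    cli.commit()
    print(srv.stable)  # {0: b'abc'}

    lying = Server(honor_commit=False)
    cli2 = Client(lying)
    cli2.write(0, b"abc")
    cli2.commit()      # acknowledged, so the client forgets its copy
    lying.reboot()
    print(lying.stable, cli2.pending)  # {} {}
```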

Bob


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Miles Nordin
> "cg" == Constantin Gonzalez <[EMAIL PROTECTED]> writes:

cg> if a tar xvf were writing locally to a ZFS file system, the
cg> writes wouldn't be synchronous either, so there's no point in
cg> forcing NFS users to have a better

It's worse for NFS because breaking the commit/lease/batch state
machine destroys the illusion of statelessness.  When you reboot the
server, you'll have to reboot all the clients to get them to behave
consistently again.

Actually, that is already my experience with NFSv4 diskless machines,
but the very old versions of Nevada I'm using are probably the culprit.

I thought NFSv2 -> NFSv3 was supposed to make this prestoserv, SSD,
battery-backed DRAM stuff not needed for good performance any more.  I
guess not though.




Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Neil Perrin
On 10/22/08 13:56, Marcelo Leal wrote:
>>> But the slog is the ZIL, formally a *separate* intent log.
>>
>> No the slog is not the ZIL!
> OK, when you wrote this:
> "I've been slogging for a while on support for separate intent logs (slogs)
> for ZFS. Without slogs, the ZIL is allocated dynamically from the main pool."
>
> were you talking about "the body of code" in the statement "the ZIL is
> allocated"? So I have misunderstood you...
>
> Leal.

I guess I need to fix that!
Anyway the slog is not the ZIL it's one of the two
currently possible Intent Log types.

Sorry for the confusion: Neil.


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Marcelo Leal
> > But the slog is the ZIL, formally a *separate* intent log.
> 
> No the slog is not the ZIL!
OK, when you wrote this:
"I've been slogging for a while on support for separate intent logs (slogs)
for ZFS. Without slogs, the ZIL is allocated dynamically from the main pool."

were you talking about "the body of code" in the statement "the ZIL is
allocated"? So I have misunderstood you...

Leal.

> 
> Here's the definition of the terms as we've been trying to use them:
> 
> ZIL:
> The body of code that supports synchronous requests, which writes
> out to the Intent Logs
> Intent Log:
> A stable storage log. There is one per file system & zvol.
> slog:
> An Intent Log on a separate stable device - preferably high speed.
> 
> We don't really have a name for an Intent Log when it's embedded in the main
> pool. I have in the past used the term clog for chained log. Originally before
> slogs existed, it was just the Intent Log.
> 
> Neil.


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Neil Perrin
> But the slog is the ZIL, formally a *separate* intent log.

No the slog is not the ZIL!

Here's the definition of the terms as we've been trying to use them:

ZIL:
The body of code that supports synchronous requests, which writes
out to the Intent Logs
Intent Log:
A stable storage log. There is one per file system & zvol.
slog:
An Intent Log on a separate stable device - preferably high speed.

We don't really have a name for an Intent Log when it's embedded in the main
pool. I have in the past used the term clog for chained log. Originally before
slogs existed, it was just the Intent Log.

Neil.
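These definitions can be summarised as a small Python data model. It is a hypothetical illustration of the vocabulary only (Pool, IntentLog and the device name in the example are invented), and it ignores any finer-grained placement rules the real code may have.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    name: str
    log_devices: list[str] = field(default_factory=list)  # separate log vdevs


@dataclass
class IntentLog:
    """One stable-storage log per file system or zvol."""
    dataset: str
    pool: Pool

    @property
    def kind(self) -> str:
        # With a separate log device the Intent Log lives there (a "slog");
        # otherwise it is chained through blocks of the main pool (a "clog").
        return "slog" if self.pool.log_devices else "clog"


# The ZIL itself is the code that appends records to these logs when a
# synchronous request (fsync, an O_DSYNC write, an NFS COMMIT) arrives.
```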


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Bill Sommerfeld
On Wed, 2008-10-22 at 10:45 -0600, Neil Perrin wrote:
> Yes: 6280630 zil synchronicity
> 
> Though personally I've been unhappy with the exposure that zil_disable has got.
> It was originally meant for debug purposes only. So providing an official
> way to make synchronous behaviour asynchronous is to me dangerous.

It seems far more dangerous to only provide a global knob instead of a
local knob.

I want it in conjunction with bulk operations (like an ON "nightly"
build, database reloads, etc.) where the response to a partial failure
will be to rm -rf and start over.  Any time spent waiting for
intermediate states of the filesystem to be committed to stable store is
wasted time.
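Bill's use case, a bulk job whose response to partial failure is to delete everything and start over, can be sketched in Python as below. run_restartable and its retry count are hypothetical. Because partial output is always discarded, no intermediate state of the output directory ever needs to be committed to stable storage.

```python
import shutil
from pathlib import Path
from typing import Callable


def run_restartable(job: Callable[[Path], None], out_dir: Path,
                    attempts: int = 3) -> None:
    """Run a bulk job whose only recovery strategy is 'rm -rf and start over'."""
    for attempt in range(1, attempts + 1):
        shutil.rmtree(out_dir, ignore_errors=True)  # discard partial state
        out_dir.mkdir(parents=True)
        try:
            job(out_dir)
            return
        except Exception:
            if attempt == attempts:
                raise  # give up and surface the last error
```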

> >Once Admins start to disable the ZIL for whole pools because the extra
> >performance is too tempting, wouldn't it be the lesser evil to let them
> >disable it on a per filesystem basis?

Agreed.



Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Marcelo Leal
> Bah, I've done it again.  I meant use it as a slog
> device, not as the ZIL...
But the slog is the ZIL, formally a *separate* intent log. What's the matter? I
think everyone understood. I think you confused the ZIL with the L2ARC some
threads ago. That is a different thing. ;-)

 Leal.
> 
> I'll get this terminology in my head eventually.


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Ross
Bah, I've done it again.  I meant use it as a slog device, not as the ZIL...

I'll get this terminology in my head eventually.


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Ross
Well, it might be even more of a bodge than disabling the ZIL, but how about:

- Create a 512MB ramdisk, use that for the ZIL
- Buy a Micro Memory nvram PCI card for £100 or so.
- Wait 3-6 months, hopefully buy a fully supported PCI-e SSD to replace the 
Micro Memory card.

The ramdisk isn't an ideal solution, but provided you don't export the pool
with it offline, it does work.  We used it as a stop-gap solution for a couple
of weeks while waiting for a Micro Memory nvram card.

Our reasoning was that our server's on a UPS and we figured if something
crashed badly enough to take out something like the UPS, the motherboard, etc,
we'd be losing data anyway.  We just made sure we had good backups in case the
pool got corrupted and crossed our fingers.

The reason I say wait 3-6 months is that there's a huge amount of activity with
SSDs at the moment.  Sun said that they were planning to have flash storage
launched by Christmas, so I figure there's a fair chance that we'll see some
supported PCIe cards by next Spring.


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Marcelo Leal
I agree with you, Constantin, that sync is a performance problem; in the same
way, I think that in an NFS environment it is just *required*. If sync can be
relaxed in a "specific NFS environment", my first opinion is that NFS is not
necessary in that environment in the first place.
 IMHO a protocol like iSCSI would perform much better in such a situation; at
least there would be no need to handle consistency between other clients. That
said, options are always good, and the possibility to disable the ZIL per
filesystem is one more *gun* in the world. And as always, it can reach both the
cops and the bad guys.
 Keep in mind that JB is trying to send to jail whoever wins performance
benchmarks without syncing to disk. ;-)
 Keep up the good work on your blog!

 Leal
--
This message posted from opensolaris.org


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Bob Friesenhahn
On Wed, 22 Oct 2008, Neil Perrin wrote:
>
> On 10/22/08 10:26, Constantin Gonzalez wrote:
>> 3. Disable ZIL[1]. This is of course evil, but one customer pointed out to me
>> that if a tar xvf were writing locally to a ZFS file system, the writes
>> wouldn't be synchronous either, so there's no point in forcing NFS users
>> to have a better availability experience at the expense of performance.

The conclusion reached here is quite seriously wrong and no Sun 
employee should suggest it to a customer.  If the system writing to a 
local filesystem reboots then the applications which were running are 
also lost and will see the new filesystem state when they are 
restarted.  If an NFS server spontaneously reboots, the applications 
on the many clients are still running and the client systems are using 
cached data.  This means that clients could do very bad things if the 
filesystem state (as seen by NFS) is suddenly not consistent.  One of 
the joys of NFS is that the client continues unhindered once the 
server returns.
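Bob's argument can be made concrete with a toy model of NFSv3 UNSTABLE WRITE and COMMIT as described in RFC 1813. Each reply carries a write verifier that changes when the server reboots, and the client must keep its own copy of the data until a COMMIT reply carries the same verifier as the WRITE replies. The class names, the boot counter used as the verifier, and the `honor_commit` flag standing in for a disabled ZIL are illustrative assumptions, not Solaris code; the background flush that would eventually write cached data to disk is deliberately left out.

```python
from __future__ import annotations


class ToyNfsServer:
    """Toy NFSv3-style server: UNSTABLE writes land in RAM, COMMIT should make them stable."""

    def __init__(self, honor_commit: bool = True):
        self.honor_commit = honor_commit   # False stands in for a disabled ZIL: COMMIT is acked, nothing flushed
        self.disk: dict[int, bytes] = {}   # stable storage, survives a crash
        self.cache: dict[int, bytes] = {}  # volatile memory, lost on a crash
        self.boot = 0                      # used as the write verifier; changes on every reboot

    def write_unstable(self, offset: int, data: bytes) -> int:
        self.cache[offset] = data
        return self.boot

    def commit(self) -> int:
        if self.honor_commit:
            self.disk.update(self.cache)
            self.cache.clear()
        return self.boot

    def crash_and_reboot(self) -> None:
        self.cache.clear()  # anything not yet on disk is gone
        self.boot += 1      # new server instance, new verifier


class ToyNfsClient:
    """Keeps a copy of every UNSTABLE write until a COMMIT with a matching verifier succeeds."""

    def __init__(self, server: ToyNfsServer):
        self.server = server
        self.pending: dict[int, tuple[bytes, int]] = {}  # offset -> (data, verifier from WRITE reply)

    def write(self, offset: int, data: bytes) -> None:
        self.pending[offset] = (data, self.server.write_unstable(offset, data))

    def commit(self) -> None:
        verf = self.server.commit()
        stale = {off: data for off, (data, v) in self.pending.items() if v != verf}
        if stale:
            # The server rebooted after these WRITEs: resend them and commit again.
            for off, data in stale.items():
                self.server.write_unstable(off, data)
            self.server.commit()
        self.pending.clear()  # the client now believes its data is safe and drops its copy


def run_scenario(honor_commit: bool, crash: str | None) -> bytes | None:
    """crash is None, 'before_commit' or 'after_commit'; returns what is on the server's disk."""
    server = ToyNfsServer(honor_commit)
    client = ToyNfsClient(server)
    client.write(0, b"hello")
    if crash == "before_commit":
        server.crash_and_reboot()
    client.commit()
    if crash == "after_commit":
        server.crash_and_reboot()
    return server.disk.get(0)


if __name__ == "__main__":
    for honor, crash in ((True, "before_commit"), (True, "after_commit"), (False, "after_commit")):
        print(f"honor_commit={honor}, crash {crash}: disk holds {run_scenario(honor, crash)!r}")
```

In both `honor_commit=True` cases the data survives (a crash before COMMIT is caught by the verifier mismatch and the client resends). With `honor_commit=False` and a crash after COMMIT the disk holds `None`: the client got a normal reply, dropped its copy, and nothing on either side records that the write was lost.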

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/



Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Neil Perrin


On 10/22/08 10:26, Constantin Gonzalez wrote:
> Hi,
> 
> On a busy NFS server, performance tends to be very modest for large amounts
> of small files due to the well known effects of ZFS and ZIL honoring the
> NFS COMMIT operation[1].
> 
> For the mature sysadmin who knows what (s)he does, there are three
> possibilities:
> 
> 1. Live with it. Hard, if you see 10x less performance than you could get and your
> users complain a lot.
> 
> 2. Use a flash disk for a ZIL, a slog. Can add considerable extra cost,
> especially if you're using an X4500/X4540 and can't swap out fast SAS
> drives for cheap SATA drives to free the budget for flash ZIL drives.[2]
> 
> 3. Disable ZIL[1]. This is of course evil, but one customer pointed out to me
> that if a tar xvf were writing locally to a ZFS file system, the writes
> wouldn't be synchronous either, so there's no point in forcing NFS users
> to have a better availability experience at the expense of performance.
> 
> 
> So, if the sysadmin draws the informed and conscious conclusion that (s)he
> doesn't want to honor NFS COMMIT operations, what options are less disruptive
> than disabling the ZIL completely?
> 
> - I checked the NFS tunables from:
>http://dlc.sun.com/osol/docs/content/SOLTUNEPARAMREF/chapter3-1.html
>But could not find a tunable that would disable COMMIT honoring.
>Is there already an RFE asking for a share option that disables the
>translation of COMMIT to synchronous writes?

- None that I know of...
> 
> - The ZIL exists on a per filesystem basis in ZFS. Is there an RFE already
>that asks for the ability to disable the ZIL on a per filesystem basis?

Yes: 6280630 zil synchronicity

Though personally I've been unhappy with the exposure that zil_disable has got.
It was originally meant for debug purposes only. So providing an official
way to make synchronous behaviour asynchronous is, to me, dangerous.

> 
>Once admins start to disable the ZIL for whole pools because the extra
>performance is too tempting, wouldn't it be the lesser evil to let them
>disable it on a per filesystem basis?
> 
> Comments?
> 
> 
> Cheers,
> Constantin
> 
> [1]: http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine
> [2]: http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on
> 


[zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Constantin Gonzalez
Hi,

On a busy NFS server, performance tends to be very modest for large amounts
of small files due to the well known effects of ZFS and ZIL honoring the
NFS COMMIT operation[1].

For the mature sysadmin who knows what (s)he does, there are three
possibilities:

1. Live with it. Hard, if you see 10x less performance than you could get and your
users complain a lot.

2. Use a flash disk for a ZIL, a slog. Can add considerable extra cost,
especially if you're using an X4500/X4540 and can't swap out fast SAS
drives for cheap SATA drives to free the budget for flash ZIL drives.[2]

3. Disable ZIL[1]. This is of course evil, but one customer pointed out to me
that if a tar xvf were writing locally to a ZFS file system, the writes
wouldn't be synchronous either, so there's no point in forcing NFS users
to have a better availability experience at the expense of performance.
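A rough way to see the effect behind these options on a given machine is to write many small files twice: once with ordinary buffered writes, which is roughly what a local `tar xvf` does, and once with an `fsync()` per file, which approximates the per-file cost a server pays when it honors NFS COMMIT. This is a sketch under assumptions: the file count and size are arbitrary, NFS itself is not involved, and the numbers depend heavily on the pool layout, whether a slog is present, and caching.

```python
import os
import tempfile
import time


def write_small_files(directory: str, count: int, size: int, sync: bool) -> float:
    """Create `count` files of `size` bytes in `directory`; return the elapsed seconds.

    With sync=True every file is fsync()ed before it is closed, which forces the
    filesystem to make it stable (on ZFS this goes through the ZIL).
    """
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"f{i:05d}")
        with open(path, "wb") as f:
            f.write(payload)
            if sync:
                f.flush()             # move Python's buffer into the kernel first
                os.fsync(f.fileno())  # then wait until the data is on stable storage
    return time.perf_counter() - start


if __name__ == "__main__":
    # Run from a directory on the dataset you want to measure; 1000 x 4 KiB is arbitrary.
    for sync in (False, True):
        with tempfile.TemporaryDirectory(dir=".") as d:
            elapsed = write_small_files(d, count=1000, size=4096, sync=sync)
            print(f"sync={sync}: {elapsed:.2f} s for 1000 files")
```

The ratio between the two timings, rather than the absolute numbers, is what matters; repeating the run after adding a slog would be expected to narrow it.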


So, if the sysadmin draws the informed and conscious conclusion that (s)he
doesn't want to honor NFS COMMIT operations, what options are less disruptive
than disabling the ZIL completely?

- I checked the NFS tunables from:
   http://dlc.sun.com/osol/docs/content/SOLTUNEPARAMREF/chapter3-1.html
   But could not find a tunable that would disable COMMIT honoring.
   Is there already an RFE asking for a share option that disables the
   translation of COMMIT to synchronous writes?

- The ZIL exists on a per filesystem basis in ZFS. Is there an RFE already
   that asks for the ability to disable the ZIL on a per filesystem basis?

   Once admins start to disable the ZIL for whole pools because the extra
   performance is too tempting, wouldn't it be the lesser evil to let them
   disable it on a per filesystem basis?

Comments?


Cheers,
Constantin

[1]: http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine
[2]: http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

-- 
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist http://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering