Re: [zfs-discuss] Re: Re: Heavy writes freezing system

2007-01-18 Thread Richard Elling

Rainer Heilke wrote:

>> If you plan on RAC, then ASM makes good sense.  It is unclear (to me anyway)
>> if ASM over a zvol is better than ASM over a raw LUN.
>
> Hmm. I thought ASM was really the _only_ effective way to do RAC,
> but then, I'm not a DBA (and don't want to be ;-)  We'll be just
> using raw LUNs. While the zvol idea is interesting, the DBAs
> are very particular about making sure the environment is set up
> in a way Oracle will support (and not hang up when we have a problem).


ASM is relatively new technology. Traditionally, OPS and RAC were
built over raw devices, directly or as represented by cluster-aware
logical volume managers.  DBAs tend to not like raw, so Sun Cluster
(Solaris Cluster) supports RAC over QFS which is a very good solution.
Some Sun Cluster customers run RAC over NFS, which also works
surprisingly well.

Meanwhile, Oracle continues to develop ASM to appease the DBAs who
want filesystem-like solutions.  IMHO, in the long run, Oracle will
transition many customers to ASM and this means that it probably
isn't worth the effort to make a file system be the best for Oracle,
at the expense of other features and workloads.
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Heavy writes freezing system

2007-01-18 Thread Rainer Heilke
> If you plan on RAC, then ASM makes good sense.  It is unclear (to me anyway)
> if ASM over a zvol is better than ASM over a raw LUN.

Hmm. I thought ASM was really the _only_ effective way to do RAC, but then, I'm
not a DBA (and don't want to be ;-)  We'll be just using raw LUNs. While the
zvol idea is interesting, the DBAs are very particular about making sure the
environment is set up in a way Oracle will support (and not hang up when we
have a problem).

Rainer
 
 
This message posted from opensolaris.org


Re: [zfs-discuss] Re: Re: Heavy writes freezing system

2007-01-18 Thread Roch - PAE

If some aspect of the load is writing a large amount of data
into the pool (through the memory cache, as opposed to the
ZIL) and that leads to a frozen system, I think that a
possible contributor should be:

Bug 6429205: each zpool needs to monitor its throughput and throttle heavy writers
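To illustrate what "monitor its throughput and throttle heavy writers" could mean, here is a minimal sketch in Python. It is purely hypothetical and is not the ZFS implementation or the eventual fix for this bug: it assumes a pool caps its dirty (accepted but not yet synced) data at what it was last measured to sync within a target interval, and delays writers by roughly the time needed to drain anything beyond that cap. The class name, the 5-second target and the delay formula are all illustrative assumptions.

```python
from typing import Optional


class WriteThrottle:
    """Hypothetical per-pool write throttle (illustration only, not ZFS code).

    Dirty data is capped at what the pool was last measured to sync
    within `target_seconds`; writers past that cap are told to wait.
    """

    def __init__(self, target_seconds: float = 5.0) -> None:
        self.target_seconds = target_seconds
        self.throughput: Optional[float] = None  # bytes/second from the last sync
        self.dirty = 0                          # bytes accepted but not yet synced

    def record_sync(self, nbytes: int, seconds: float) -> None:
        """After a sync completes, update the throughput estimate and reset dirty data."""
        self.throughput = nbytes / seconds
        self.dirty = 0

    def admit(self, nbytes: int) -> float:
        """Accept a write and return how many seconds the writer should be delayed."""
        self.dirty += nbytes
        if self.throughput is None:
            return 0.0  # no measurement yet, so nothing to throttle against
        limit = self.throughput * self.target_seconds
        excess = self.dirty - limit
        # Delay by roughly the time the pool needs to drain the excess.
        return max(0.0, excess / self.throughput)


MB = 1024 * 1024
throttle = WriteThrottle(target_seconds=5.0)
throttle.record_sync(100 * MB, 1.0)    # pool synced 100 MB in 1 s
print(throttle.admit(400 * MB))        # 0.0: still under the 500 MB cap
print(throttle.admit(200 * MB))        # 1.0: 100 MB over the cap at 100 MB/s
```

The point of the design is that a single heavy writer gets slowed down in proportion to how far it outruns the pool, instead of piling up dirty data that later has to be pushed out all at once.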

-r

Anantha N. Srirama writes:
 > Bug 6413510 is the root cause. ZFS maestros please correct me if I'm quoting 
 > an incorrect bug.



Re: [zfs-discuss] Re: Re: Heavy writes freezing system

2007-01-17 Thread Neil Perrin



Rainer Heilke wrote On 01/17/07 15:44,:

> It turns out we're probably going to go the UFS/ZFS route, with 4 filesystems
> (the DB files on UFS with Directio).
>
> It seems that the pain of moving from a single-node ASM to a RAC'd ASM is
> great, and not worth it.
> The DBA group decided doing the migration to UFS for the DB files now, and
> then to a RAC'd ASM later, will end up being the easiest, safest route.
>
> Rainer
> Still curious as to if and when this bug will get fixed...


If you're referring to bug 6413510 that Anantha mentioned then my
earlier post today answered that:

> This problem was fixed in snv_48 last September and will be
> in S10_U4.

Neil


[zfs-discuss] Re: Re: Heavy writes freezing system

2007-01-17 Thread Anantha N. Srirama
I did some straight-up Oracle/ZFS testing, but not on zvols. I'll give it a shot
and report back; next week is the earliest.
 
 


[zfs-discuss] Re: Re: Heavy writes freezing system

2007-01-17 Thread Rainer Heilke
It turns out we're probably going to go the UFS/ZFS route, with 4 filesystems 
(the DB files on UFS with Directio).

It seems that the pain of moving from a single-node ASM to a RAC'd ASM is 
great, and not worth it. The DBA group decided doing the migration to UFS for 
the DB files now, and then to a RAC'd ASM later, will end up being the easiest, 
safest route.

Rainer
Still curious as to if and when this bug will get fixed...
 
 


Re: [zfs-discuss] Re: Re: Heavy writes freezing system

2007-01-17 Thread Casper . Dik

>We had a 2TB filesystem. No matter what options I set explicitly, the
>UFS filesystem kept getting written with a 1 million file limit.
>Believe me, I tried a lot of options, and they kept getting set back
>on me.

The limit is documented as "1 million inodes per TB".  So something
must not have gone right.  But many people have complained and
you could take the newfs source and fix the limitation.

The discontinuity when going from <1TB to over 1TB is appalling.
(<1TB allows for 137 million inodes; >= 1TB allows for 1 million per TB.)

The rationale is fsck time (but logging is forced anyway).

The 1 million limit is arbitrary and too low...
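As a rough worked example of that discontinuity, the sketch below encodes the two limits exactly as quoted in this post. Assumptions: TB means 2^40 bytes, the per-TB limit scales linearly with size, and the figures are the round numbers above rather than the exact power-of-2 values newfs would use. The function name is hypothetical.

```python
TB = 1024 ** 4  # assumption: binary terabytes


def ufs_inode_cap(size_bytes: int) -> int:
    """Approximate upper limit on inodes for a new UFS file system.

    Hypothetical model of the rule quoted above: under 1 TB the cap is
    about 137 million inodes; from 1 TB up it is about 1 million per TB.
    """
    if size_bytes < TB:
        return 137_000_000
    # Scale linearly with size (assumption), rounding down.
    return size_bytes * 1_000_000 // TB


print(ufs_inode_cap(TB - 1))   # 137000000
print(ufs_inode_cap(TB))       # 1000000
print(ufs_inode_cap(2 * TB))   # 2000000
```

One byte of extra size costs over 99% of the available inodes. Under this model a 2TB file system would still be capped at about 2 million inodes, so the 1 million limit reported for the 2TB file system quoted above is lower than even this rule predicts.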

Casper


[zfs-discuss] Re: Re: Heavy writes freezing system

2007-01-17 Thread Rainer Heilke
We had a 2TB filesystem. No matter what options I set explicitly, the UFS 
filesystem kept getting written with a 1 million file limit. Believe me, I 
tried a lot of options, and they kept getting set back on me.

After a fair bit of poking around (Google, Sun's site, etc.) I found several 
other notes indicating that this was the limit for UFS file systems. (For the 
pedants, keep in mind we are talking computers, so the actual number will be
some power of 2. "1 million" is an approximation.)

If someone has gotten around this under UFS, I'd be very interested--as an
intellectual curiosity--in knowing what switches you passed to the mkfs/newfs
command(s).

Rainer
 
 


Re: [zfs-discuss] Re: Re: Heavy writes freezing system

2007-01-17 Thread Neil Perrin



Anantha N. Srirama wrote On 01/17/07 08:32,:

> Bug 6413510 is the root cause. ZFS maestros please correct me if I'm quoting an
> incorrect bug.


Yes, Anantha is correct: that is the bug ID, and it could be responsible
for more disk writes than expected.

Let me try to explain that bug.
The ZIL as described in http://blogs.sun.com/perrin
collects in memory the transactions of all system calls until they are
committed in a transaction group (txg) at the pool level. If a request
arrives to force a particular file to stable storage (fsync or O_DSYNC),
then the ZIL used to write out all in-memory transactions for the
file system. This meant that transactions unrelated to that file
were also written, including directory creations, renames, etc., which might
be important in being able to re-create the file. However,
it also pushed out user data for other files, which can be voluminous.
The problem was originally seen when a ksh history file was fsync-ed
during a large data write. It would take many seconds to flush
the large write through the log, just to ensure that a typed "pwd" command
was safely on disk! This inefficiency occurs only when a "mismatch" of
applications uses the same file system.

The fix was essentially to push out all metadata for the file system but
only the file data related to the file being fsync-ed or O_DSYNC-ed.
This problem was fixed in snv_48 last September and will be
in S10_U4.
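To make the before/after difference concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions and not the actual ZIL code: the `LogRecord` type, the `commit_old`/`commit_new` functions and the file names are hypothetical, each record is tagged as either metadata or file data, and record ordering, txg boundaries and on-disk log blocks are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LogRecord:
    """One in-memory intent-log record (hypothetical, simplified model)."""
    kind: str    # "meta" (create, rename, mkdir, ...) or "data" (a write)
    path: str    # file the record belongs to
    nbytes: int  # payload size that must reach stable storage


def commit_old(log: List[LogRecord], fsync_path: str) -> Tuple[List[LogRecord], List[LogRecord]]:
    """Pre-fix behaviour: flush every pending record; fsync_path is ignored."""
    return list(log), []


def commit_new(log: List[LogRecord], fsync_path: str) -> Tuple[List[LogRecord], List[LogRecord]]:
    """Post-fix behaviour: flush all metadata, but only the fsync-ed file's data."""
    flushed, kept = [], []
    for rec in log:
        if rec.kind == "meta" or rec.path == fsync_path:
            flushed.append(rec)
        else:
            kept.append(rec)  # left in memory for the normal txg commit
    return flushed, kept


def bytes_flushed(records: List[LogRecord]) -> int:
    return sum(r.nbytes for r in records)


pending = [
    LogRecord("data", "/db/datafile", 2 * 1024**3),  # large unrelated write
    LogRecord("meta", "/home/u", 512),               # e.g. a directory creation
    LogRecord("data", "/home/u/.sh_history", 64),    # the ksh history line
]

old_flushed, _ = commit_old(pending, "/home/u/.sh_history")
new_flushed, new_kept = commit_new(pending, "/home/u/.sh_history")
print(bytes_flushed(old_flushed))  # 2147484224
print(bytes_flushed(new_flushed))  # 576
```

In this model the fsync of the history file goes from forcing over 2GB through the log to forcing 576 bytes; the large unrelated write stays in `new_kept` until the transaction group commits it.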

Neil.


[zfs-discuss] Re: Re: Heavy writes freezing system

2007-01-17 Thread Rainer Heilke
> Also, as a workaround, you could disable the ZIL if it's
> acceptable to you (in case of a system panic or hard reset
> you can end up with an unrecoverable database).

Again, not an option, but thanks for the pointer. I read a bit about this last
week, and it sounds way too scary.

Rainer
 
 


[zfs-discuss] Re: Re: Heavy writes freezing system

2007-01-17 Thread Anantha N. Srirama
Bug 6413510 is the root cause. ZFS maestros please correct me if I'm quoting an 
incorrect bug.
 
 


[zfs-discuss] Re: Re: Heavy writes freezing system

2007-01-16 Thread Rainer Heilke
The DBA team doesn't want to do another test. They have "made up their minds".
We have a meeting with them tomorrow, though, and will try to convince them to do
one more test so that we can try the mdb and fsstat tools. (The admin doing the
tests was using iostat, not fsstat.) I, at least, am interested in finding 
exactly where the failure is, rather than just saying "ZFS doesn't work". :-(

Rainer
 
 


[zfs-discuss] Re: Re: Heavy writes freezing system

2007-01-16 Thread Rainer Heilke
> Rainer Heilke,
> 
> You have 1/4 of the amount of memory that the 2900
> 0 system is capable of (192GBs : I think).

Yep. The server does not hold the application (three-tier architecture) so this 
is the standard build we bought. The memory has not indicated any problems. All 
errors point to write issues.

>   Secondly, output from fsstat(1M) could be helpful.
> 
>   Run this command periodically and check whether the
>   values change over time.

Thanks. I'll pass this along to the person doing the testing. He's been taking
some measurements, but I'm not sure whether fsstat was one of the tools he used.

Rainer
 
 