Leon Koll writes:

 > On 3/5/07, Roch - PAE <[EMAIL PROTECTED]> wrote:
 > >
 > > Leon Koll writes:
 > >  > On 2/28/07, Roch - PAE <[EMAIL PROTECTED]> wrote:
 > >  > >
 > >  > >
 > >  > > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6467988
 > >  > >
 > >  > > NFSD threads are created on a demand spike (all of them
 > >  > > waiting on I/O) but then tend to stick around servicing
 > >  > > moderate loads.
 > >  > >
 > >  > > -r
 > >  >
 > >  > Hello Roch,
 > >  > That's not my case. NFS stops servicing requests after some point, and
 > >  > the reason is in ZFS. It never happens with NFS/UFS.
 > >  > In short, my scenario:
 > >  > 1st SFS run, 2000 requested IOPS. NFS is fine, low number of threads.
 > >  > 2nd SFS run, 4000 requested IOPS. NFS cannot serve all requests, number
 > >  > of threads jumps to max.
 > >  > 3rd SFS run, 2000 requested IOPS. NFS cannot serve all requests, number
 > >  > of threads jumps to max.
 > >  > The system cannot get back to the same results under equal load (1st
 > >  > and 3rd). A reboot between the 2nd and 3rd runs doesn't help. The only
 > >  > persistent thing is the directory structure that was created during the
 > >  > 2nd run (in SFS, higher requested load -> more directories/files
 > >  > created).
 > >  > I am sure it's a bug. I need help. I don't care that ZFS works N times
 > >  > worse than UFS. I really care that after heavy load everything is
 > >  > totally screwed.
 > >  >
 > >  > Thanks,
 > >  > -- Leon
 > >
 > > Hi Leon,
 > >
 > > How much is the slowdown between the 1st and 3rd runs? How filled is
 > 
 > Typical case is:
 > 1st: 1996 IOPS, latency  2.7 ms
 > 3rd: 1375 IOPS, latency 37.9 ms
 > 

The large latency increase is a side effect of requesting
more than can be delivered: the queue builds up and latency
follows. So it should not be the primary focus IMO. The
decrease in IOPS is the primary problem.
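
For intuition: once offered load approaches what the server
can deliver, response time in even the simplest queueing
model blows up, and past capacity it is unbounded. A
back-of-the-envelope M/M/1 sketch (the capacity figure is a
made-up placeholder, not a measurement of this rig):

    # Toy M/M/1 model: response time vs. offered load.
    # capacity_iops below is a hypothetical placeholder.

    def mm1_response_ms(offered_iops, capacity_iops):
        """Mean response time (ms); infinite once saturated."""
        service_ms = 1000.0 / capacity_iops   # mean service time per op
        rho = offered_iops / capacity_iops    # utilization
        if rho >= 1.0:
            return float("inf")               # queue grows without bound
        return service_ms / (1.0 - rho)       # W = S / (1 - rho)

    for iops in (1000, 2000, 3000, 4000):
        print("%4d IOPS -> %s ms" % (iops, mm1_response_ms(iops, 3200)))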

One hypothesis is that over the life of the FS we're moving
toward spreading access across the full disk platter. We can
imagine some fragmentation hitting as well. I'm not sure
how I'd test both hypotheses.
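
The crudest probe I can think of for the "full platter" part
is to time random reads over increasingly wide spans of the
raw device and see whether wider spans (longer seeks) track
the latency we observe. A rough sketch only: the device
path, read size, and spans are hypothetical placeholders, it
needs root, and it only means anything against a raw
(uncached) device:

    # Time random reads within a growing span of a raw disk.
    # DEV, BLK, SAMPLES and the spans below are hypothetical.
    import os, random, time

    DEV = "/dev/rdsk/c0t0d0s0"   # hypothetical raw device path
    BLK = 8192                   # bytes per read
    SAMPLES = 200

    def mean_read_ms(span_bytes):
        """Average latency of SAMPLES random reads in [0, span_bytes)."""
        fd = os.open(DEV, os.O_RDONLY)
        try:
            total = 0.0
            for _ in range(SAMPLES):
                os.lseek(fd, random.randrange(0, span_bytes - BLK, BLK),
                         os.SEEK_SET)
                t0 = time.perf_counter()
                os.read(fd, BLK)
                total += time.perf_counter() - t0
            return total / SAMPLES * 1000.0
        finally:
            os.close(fd)

    # Narrow span ~ short seeks; full-disk span ~ long seeks.
    for span_gib in (1, 16, 64):
        print("%2d GiB span: %.2f ms" % (span_gib,
                                         mean_read_ms(span_gib << 30)))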

 > > the pool at each stage? What does 'NFS stops servicing
 > > requests' mean?
 > 
 > There are a lot of error messages on the NFS (SFS) client:
 > sfs352: too many failed RPC calls - 416 good 27 bad
 > sfs3132: too many failed RPC calls - 302 good 27 bad
 > sfs3109: too many failed RPC calls - 533 good 31 bad
 > sfs353: too many failed RPC calls - 301 good 28 bad
 > sfs3144: too many failed RPC calls - 305 good 25 bad
 > sfs3121: too many failed RPC calls - 311 good 30 bad
 > sfs370: too many failed RPC calls - 315 good 27 bad
 >

Could these be timeouts or queue-full drops? They might be a
side effect of SFS requesting more than can be delivered.
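
As a sanity check on that guess: a "good/bad" split like the
ones above falls out naturally once server latency grows a
tail past the client's RPC timeout. A toy model, with the
timeout and latency numbers invented purely for
illustration:

    # Toy model: good vs. bad RPC calls when server latency is
    # exponential and the client times out. All numbers hypothetical.
    import random

    TIMEOUT_MS = 1100.0   # hypothetical client RPC timeout
    CALLS = 330           # roughly the per-client counts above

    def simulate(mean_latency_ms):
        good = bad = 0
        for _ in range(CALLS):
            latency = random.expovariate(1.0 / mean_latency_ms)
            if latency <= TIMEOUT_MS:
                good += 1
            else:
                bad += 1
        return good, bad

    for mean in (100.0, 200.0, 400.0):
        print("mean %3.0f ms -> good %d, bad %d" %
              ((mean,) + simulate(mean)))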

 > Thanks,
 > -- Leon
