On 3/5/07, Spencer Shepler <[EMAIL PROTECTED]> wrote:
On Mar 5, 2007, at 11:17 AM, Leon Koll wrote:

> On 3/5/07, Roch - PAE <[EMAIL PROTECTED]> wrote:
>>
>> Leon Koll writes:
>>
>>> On 3/5/07, Roch - PAE <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Leon Koll writes:
>>>>> On 2/28/07, Roch - PAE <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6467988
>>>>>>
>>>>>> NFSD threads are created on a demand spike (all of them
>>>>>> waiting on I/O) but then tend to stick around servicing
>>>>>> moderate loads.
>>>>>>
>>>>>> -r
>>>>>
>>>>> Hello Roch,
>>>>> It's not my case. NFS stops to service after some point. And the
>>>>> reason is in ZFS. It never happens with NFS/UFS.
>>>>> Shortly, my scenario:
>>>>> 1st SFS run, 2000 requested IOPS. NFS is fine, low number of threads.
>>>>> 2nd SFS run, 4000 requested IOPS. NFS cannot serve all requests,
>>>>> number of threads jumps to max.
>>>>> 3rd SFS run, 2000 requested IOPS. NFS cannot serve all requests,
>>>>> number of threads jumps to max.
>>>>> The system cannot get back to the same results under equal load
>>>>> (1st and 3rd). A reboot between the 2nd and 3rd runs doesn't help.
>>>>> The only persistent thing is the directory structure that was created
>>>>> during the 2nd run (in SFS, higher requested load -> more
>>>>> directories/files created).
>>>>> I am sure it's a bug. I need help. I don't care that ZFS works N times
>>>>> worse than UFS. I really care that after heavy load everything is
>>>>> totally screwed.
>>>>>
>>>>> Thanks,
>>>>> -- Leon
>>>>
>>>> Hi Leon,
>>>>
>>>> How much is the slowdown between 1st and 3rd? How filled is
>>>
>>> Typical case is:
>>> 1st: 1996 IOPS, latency 2.7
>>> 3rd: 1375 IOPS, latency 37.9
>>>
>>
>> The large latency increase is the side effect of requesting
>> more than what can be delivered. The queue builds up and latency
>> follows. So it should not be the primary focus IMO. The
>> decrease in IOPS is the primary problem.
>>
>> One hypothesis is that over the life of the FS we're moving
>> toward spreading access across the full disk platter. We can
>> imagine some fragmentation hitting as well. I'm not sure
>> how I'd test both hypotheses.
>>
>>>> the pool at each stage? What does 'NFS stops to service'
>>>> mean?
>>>
>>> There are a lot of error messages on the NFS (SFS) client:
>>> sfs352: too many failed RPC calls - 416 good 27 bad
>>> sfs3132: too many failed RPC calls - 302 good 27 bad
>>> sfs3109: too many failed RPC calls - 533 good 31 bad
>>> sfs353: too many failed RPC calls - 301 good 28 bad
>>> sfs3144: too many failed RPC calls - 305 good 25 bad
>>> sfs3121: too many failed RPC calls - 311 good 30 bad
>>> sfs370: too many failed RPC calls - 315 good 27 bad
>>>
>>
>> Can this be timeouts or queue-full drops? It might be a side
>> effect of SFS requesting more than what can be delivered.
>
> I don't know whether it was timeouts or full drops. SFS marked such
> runs as INVALID.
> I can run whatever is needed to help investigate the problem. If you
> have a D script that will tell us more, please send it to me.
> I appreciate your help.

The failed RPCs are indeed a result of the SFS client timing out the requests it has made to the server. The server is being overloaded beyond its capabilities, and the benchmark results show that.
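(Not necessarily the D script Leon asked Roch for, but as a minimal first check, a sketch using the stable DTrace io provider can show whether device-level latency on the server grows between runs, which would support the full-platter/fragmentation hypothesis. The per-device aggregation key and the 10-second print interval are just illustrative choices.)

#!/usr/sbin/dtrace -s
/*
 * Sketch: distribution of physical disk I/O latency on the NFS/ZFS server.
 * Run it during the 1st and 3rd SFS runs and compare the histograms to see
 * whether device-level latency is growing after the heavy run.
 */
io:::start
{
        start_ts[args[0]->b_edev, args[0]->b_blkno] = timestamp;
}

io:::done
/start_ts[args[0]->b_edev, args[0]->b_blkno]/
{
        /* aggregate latency (ns) per device */
        @lat[args[1]->dev_statname] =
            quantize(timestamp - start_ts[args[0]->b_edev, args[0]->b_blkno]);
        start_ts[args[0]->b_edev, args[0]->b_blkno] = 0;
}

tick-10sec
{
        printa(@lat);
        trunc(@lat);
}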
I agree with Roch that as the SFS benchmark adds more data to the filesystems, additional latency is added, and for this particular configuration the server is being over-driven. The helpful thing would be to run smaller increments in the benchmark to determine where the response time increases beyond what the SFS workload can handle.

There have been a number of changes in ZFS recently that should help with SFS performance measurement, but fundamentally it all depends on the configuration of the server (number of spindles and CPUs available). So there may be a limit that is being reached based on the hardware configuration.

What is your real goal here, Leon? Are you trying to gather SFS data to fit into the sizing of a particular solution, or just trying to gather performance results for other general comparisons?
Spencer, I am using the SFS benchmark to emulate the real-world environment for the NAS solution that I've built. SFS is able to create as many processes (each one emulating a separate client) as one needs, plus it's rich in metadata operations. My real goal is quiet, peaceful nights after my solution goes into production :) What I see now is that after some point the directory structure / on-disk layout of ZFS(?) becomes an obstacle that cannot be passed. I don't want this to happen in the field, that's it. And I am sure we have a bug here.
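(To pin down where that obstacle shows up, in the spirit of Spencer's suggestion to find where response time degrades, a hedged sketch: time each NFSv3 operation on the server with DTrace fbt probes and compare per-operation histograms from a 2000-IOPS run before and after the heavy run. The rfs3_* function names are an assumption taken from the OpenSolaris NFS server source; fbt is an unstable interface, so probe names may differ between builds.)

#!/usr/sbin/dtrace -s
/*
 * Sketch: server-side service time per NFSv3 operation, using fbt probes
 * on the rfs3_* routines (unstable interface; names assumed from the
 * OpenSolaris NFS server source).
 */
fbt::rfs3_*:entry
{
        self->ts = timestamp;
}

fbt::rfs3_*:return
/self->ts/
{
        /* aggregate per-operation service time (ns) */
        @svc[probefunc] = quantize(timestamp - self->ts);
        self->ts = 0;
}

END
{
        printa(@svc);
}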
There are certainly better benchmarks than SFS for either sizing or comparison purposes.
I haven't found anything better: client-independent, multi-process/multi-threaded, metadata-rich, and comparable. An obvious drawback: the $$. I think a free replacement for it is the almost unknown and underestimated fstress (http://www.cs.duke.edu/ari/fstress/). Which ones are your candidates?

Thanks,
-- Leon