Tim, I don't think we would really disagree if we were in the same room. I think in the process of the threaded communication that a few things got overlooked, or the wrong thing attributed.
You are right that there are many differences. Some of them are:

- His tests were done a year ago; I expect the kernel has had many changes since.
- He was moving data via ssh from zfs send into zfs receive, as opposed to my file operations over NFS.
- My problem seems to occur on incompressible data; his was all very compressible.
- He had 5x the CPU (x2) and 5x the memory.

Yes, I jumped on what I saw as common symptoms, in hakimian's words: "becoming increasing unresponsive until it was indistinguishable from a complete lockup". This is similar to my description of "After about 12 hours, the throughput has slowed to a crawl. The Solaris machine takes a minute or more to respond to every character typed..." and "disk throughput is in the range of 100K bytes/second". I was the one who judged these symptoms to be essentially identical; I did not say that Hakimian made that statement. I also pointed out that he was seeing these "identical" symptoms in a very different environment, which would be your point.

Regarding my 768 vs. 1024: there were no changes other than the memory. So whatever else is true, the system had at minimum 33% more memory to work with. Given that a few hundred MB is probably needed for a just-booted, idle system, the effective percentage increase in memory for ZFS to work with is in reality higher. I may not have given it 4GB, but I gave it substantially more than it had. It should behave substantially differently if memory is the limiting factor. Just because memory is thin does not make it the limiting factor. I believe the indications from top and vmstat that there is free memory (available to be reallocated) that nothing is gobbling up also suggest that memory is not the limiting factor.

Regarding my design decisions: I did not make bad design decisions. I have what I have, and I know it is substandard. Also, you seem to be reacting as though I was complaining about the 3MB/Sec throughput.
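To make the "effective increase is higher than 33%" point concrete, here is a rough arithmetic sketch. The ~300 MB idle-system footprint is an assumption for illustration (the post only says "a few hundred Meg"); the 768 MB and 1024 MB figures are from the post.

```python
# Sketch of the effective-memory arithmetic; BASELINE_MB is an
# assumed idle-OS footprint, not a measured value.
BASELINE_MB = 300   # hypothetical just-booted, idle footprint
before_mb = 768     # RAM before the upgrade
after_mb = 1024     # RAM after the upgrade

# Raw increase in total RAM.
raw_increase = (after_mb - before_mb) / before_mb

# Increase in memory actually left over for ZFS after the baseline.
effective_increase = (after_mb - before_mb) / (before_mb - BASELINE_MB)

print(f"raw increase: {raw_increase:.0%}")            # about 33%
print(f"effective increase: {effective_increase:.0%}")  # about 55%
```

Under that assumed baseline, the memory available beyond the OS footprint grows by roughly half again, which is why a memory-bound workload would be expected to behave noticeably differently.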
I believe I stated that I understand there are many sub-optimal aspects of this system. However, I don't believe any of them explain it running fine for a few hours, then slowing down by a factor of 30 for a few hours, then going back up. I am trying to understand and resolve the dysfunctional behavior, not the poor-but-plausible throughput. In any system there are many possible bottlenecks, most of which are probably suboptimal, but it is not productive to focus on the 15MB/Sec links in the chain when you have a 100KB/Sec problem. Increasing the 15MB/Sec to 66 or 132MB/Sec is just not going to have a large effect!

I think/hope I have reconciled our apparent differences. If not, so be it. I do appreciate your suggestions and insights, and they are not lost on me.

--Ray

-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss