On Sat, Nov 29, 2008 at 6:31 PM, Ray Clark <[EMAIL PROTECTED]> wrote:

> Tim,
>
> I don't think we would really disagree if we were in the same room.  I
> think in the process of the threaded communication that a few things got
> overlooked, or the wrong thing attributed.
>
> You are right that there are many differences.  Some of them are:
>
> - His tests were done a year ago; I expect the kernel has changed a lot
> since then.
> - He was moving data via ssh from zfs send into zfs receive, as opposed to my
> file operations over NFS.
> - My problem seems to occur on incompressible data.  His was all very
> compressible.
> - He had 5x the CPU x2 and 5x the memory.
>
> Yes, I jumped on what I saw as common symptoms, in Hakimian's words:
> "becoming increasing unresponsive until it was indistinguishable from a
> complete lockup".  This is similar to my description of "After about 12
> hours, the throughput has slowed to a crawl.  The Solaris machine takes a
> minute or more to respond to every character typed..." and "disk throughput
> is in the range of 100K bytes/second".
>
> I was the one who judged these symptoms to be essentially identical, I did
> not say that Hakimian made that statement.  I also pointed out that he was
> seeing these "identical" symptoms in a very different environment, which
> would be your point.
>
> Regarding my 768 MB vs. 1024 MB tests, nothing changed other than the
> memory.  So whatever else is true, the system had at least 33% more memory
> to work with.  Given that a just-booted, idle system probably needs a few
> hundred MB, the effective increase in memory available to ZFS is actually
> higher than that.  I may not have given it 4GB, but I gave it substantially
> more than it had.  It should behave substantially differently if memory is
> the limiting factor.  Just because memory is thin does not make it the
> limiting factor.  I also believe that top and vmstat showing free memory
> (available to be reallocated) that nothing is gobbling up suggests that
> memory is not the limiting factor.
>
> Regarding my design decisions, I did not make bad design decisions.  I have
> what I have.  I know it is substandard.
>
> Also you seem to be reacting as though I was complaining about the 3MB/Sec
> throughput.  I believe I stated that I understand that there are many
> sub-optimal aspects of this system.  However I don't believe any of them
> explain it running fine for a few hours, then slowing down by a factor of
> 30, for a few hours, then going back up.  I am trying to understand and
> resolve the dysfunctional behavior, not the poor but plausible throughput.
>  In any system there are many possible bottlenecks, most of which are
> probably suboptimal, but it is not productive to focus on the 15MB/Sec links
> in the chain when you have a 100KB/Sec problem.  Increasing the 15MB/Sec to
> 66 or 132MB/Sec is just not going to have a large effect!
>
> I think/hope I have reconciled our apparent differences.  If not, so be it.
>  I do appreciate your suggestions and insights, and they are not lost on me.
>
> --Ray
> --
>

My point is you're not looking at the bigger picture.  "Well this small
portion is working some of the time so it's ok, and this small portion is
working some of the time so it's ok, but when I throw it all together
something isn't quite right so it must be the software."

Case in point on the memory front:
http://www.opensolaris.org/jive/thread.jspa?messageID=309878

--Tim
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
