BTW: diverting this to squid-dev. I am convinced this is not a big DoS problem any more than some of the other performance issues are.

On 19/04/2013 5:19 p.m., Alex Rousskov wrote:
On 04/18/2013 01:06 AM, Amos Jeffries wrote:
On 18/04/2013 6:16 p.m., Amos Jeffries wrote:
On 13/04/2013 1:34 p.m., Alex Rousskov wrote:
Linear scanning of the index containing all memory-cached object chunks
is at the core of the problem. Not only are such scans slow, but they
probably rearrange Squid splay trees into the worst possible shape for
future searches, compounding the problem.

If I remove one Squid function doing such scans (*), I get a 10x (80 Mbps
vs. 8 Mbps) performance boost for a 100 MB response. I would expect an even
bigger difference for larger transfers because the linear scan overheads
are growing with the object size.

I would not be surprised if there is at least one more such function but
they are not easy to find and properly removing them will be difficult.

(*) MemObject::isContiguous().

Going a bit deeper, the issue with MemObject::isContiguous() seems to
be that mem_hdr::getBlockContainingLocation() does a nodes.find() from
the start of the nodes list on every call, and it is called repeatedly,
once for each node in the object being scanned. The other places
getBlockContainingLocation() is called from will have the same
performance drag.
It looks worthwhile to make the mem_hdr functions which use
getBlockContainingLocation() store a last-accessed node pointer and
pass it in as a seed value to skip the walk.
On deeper thought it looks easier to simply create a custom SPLAYWALKEE
function to walk the tree and perform the isContiguous() test without
calling find() at all.

Hi Amos,

     I am not an expert on splay trees, and I have not tested whether
Squid code matches the theory, but I believe splay trees are designed to
push the recently accessed node close to the top of the tree. This means
that find() itself is not the biggest problem as the last-accessed node
is essentially stored nearby for us by the tree.

However, splay trees do have a serious weakness -- a sequential scan of
a splay tree flattens the tree, making subsequent searches linear. This
makes splay trees a rather strange choice for storing memory chunks of
objects that are usually accessed ... sequentially.

Yes, I hit that one this morning while looking into this a bit more.

Any idea WTF we are using a splay tree here?

I do agree that a custom walk function would be faster than searching
from scratch on every iteration, but walking a linear list of chunks
belonging to a 1GB object every time we store another byte is still
going to be rather slow, so I do not think walking the tree faster is
the best solution.

If I were to work on this problem, I would start with these two things:

a) Maintain isContiguous boolean member as chunks are being added or
removed. Maintaining this information is cheap and checking a boolean
member adds a negligible overhead, even if it is done often. This alone
will give us a serious performance boost for large objects, according to my
primitive tests.

b) Use a different structure to store and index chunks. While the
underlying structure is changed, make sure that each and every blocking
linear scan of the chunks index is absolutely necessary. I cannot say
for sure, but I suspect that _all_ of them can be eliminated.

Agreed. Although I don't have any more time to work on this anytime soon, either.

Amos
