On January 7, 2010, James Westby wrote:
> This of course is designing for failure, which is vital. However, it's
> the scale of the issue that has taken some getting used to. It may be a
> matter of magnitude, with a lot of API calls being made, so even a small
> failure rate translates into a lot of issues, but it still seems like a
> lot. I have of course been filing bugs on LP about issues that I can
> identify, and spent some time today provoking bad responses and digging
> in to the reasons to file some more bugs. It seems that a lot of the
> problem now is the appservers refusing to communicate though, and I'm
> not sure there's a lot I can do on my end to debug that.

We are currently experiencing an issue on the servers that causes timeouts for
no obvious reason. (Under some load conditions, getting the lag in the cluster
takes far too long. Normally, this operation completes in a blink. We are
working on a fix.)

> 
> If you look at the list of the last 100 failures you will probably see
> some of this with clusters of the same signature, usually pointing to
> network communication in some manner.
> 

Looking at the signatures today, I only see the first one as network related:
2 packages failed to many times to retry with key 
launchpadlib.errors.HTTPError:<module>:main:get_versions:iterate_collection:get_collection_slice:get:_request
 

Some of the root causes behind the other signatures may be network related,
but that isn't obvious from the signatures themselves.
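
For illustration only, here is a minimal sketch of how a client could retry a
flaky call and tally failures under a key shaped like the one above (exception
type followed by the chain of function names from the traceback). This is not
the importer's actual code: failure_signature and call_with_retries are
hypothetical names, and the retry count and delay are assumed values.

```python
import time
import traceback
from collections import Counter


def failure_signature(exc):
    """Return 'module.ExceptionName:func1:func2:...' for a caught exception.

    Hypothetical helper: the key format only imitates the one quoted above.
    """
    exc_type = type(exc)
    name = f"{exc_type.__module__}.{exc_type.__qualname__}"
    # Function names from the frame that caught the exception down to the raise.
    frames = traceback.extract_tb(exc.__traceback__)
    return ":".join([name] + [frame.name for frame in frames])


def call_with_retries(func, max_attempts=3, base_delay=1.0, failures=None):
    """Call func(), retrying on any Exception with exponential backoff.

    max_attempts and base_delay are assumed defaults, not values from the
    thread. If a Counter is passed as failures, every failed attempt is
    counted under its signature.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            if failures is not None:
                failures[failure_signature(exc)] += 1
            if attempt == max_attempts:
                raise  # retried too many times, let the caller see the error
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Passing one shared Counter as failures across many calls makes clusters of
the same signature easy to spot, which is the kind of grouping the failure
list relies on.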

-- 
Francis J. Lacoste
francis.laco...@canonical.com

-- 
ubuntu-distributed-devel mailing list
ubuntu-distributed-devel@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-distributed-devel
