On January 7, 2010, James Westby wrote:
> This of course is designing for failure, which is vital. However, it's
> the scale of the issue that has taken some getting used to. It may be a
> matter of magnitude, with a lot of API calls being made, so even a small
> failure rate translates in to a lot of issues, but it still seems like a
> lot. I have of course been filing bugs on LP about issues that I can
> identify, and spent some time today provoking bad responses and digging
> in to the reasons to file some more bugs. It seems that a lot of the
> problem now is the appservers refusing to communicate though, and I'm
> not sure there's a lot I can do on my end to debug that.

We are currently experiencing an issue on the servers that causes timeouts for no obvious reason. (Under some load conditions, getting the lag in the cluster takes far too long. Normally, this operation is done in a blink. We are working on a fix.)

>
> If you look at the list of the last 100 failures you will probably see
> some of this with clusters of the same signature, usually pointing to
> network communication in some manner.
>

Looking at the signatures today, I only see the first one as network related:

2 packages failed to many times to retry with key
launchpadlib.errors.HTTPError:<module>:main:get_versions:iterate_collection:get_collection_slice:get:_request

Unless the root cause behind some of the other signatures is network related, but from the signatures themselves, it's not obvious.

-- 
Francis J. Lacoste
francis.laco...@canonical.com
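For readers unfamiliar with the failure keys quoted above: the key looks like an exception type followed by the function names on the call stack. How the importer actually builds its keys is not shown in this thread, so the following is only a minimal sketch under that assumption. The helper names `failure_signature` and `cluster_failures` are hypothetical and are not part of launchpadlib or the importer.

```python
import traceback
from collections import Counter


def failure_signature(exc):
    """Return 'module.ExceptionType:func1:func2:...' for a caught exception.

    Assumption: the signature is the exception's qualified type name
    followed by the function names in its traceback, outermost first.
    """
    exc_type = type(exc)
    type_name = f"{exc_type.__module__}.{exc_type.__qualname__}"
    frames = traceback.extract_tb(exc.__traceback__)
    return ":".join([type_name] + [frame.name for frame in frames])


def cluster_failures(exceptions):
    """Count how many caught exceptions share each signature."""
    return Counter(failure_signature(exc) for exc in exceptions)
```

Grouping on function names rather than line numbers or messages keeps failures from the same code path in one cluster, but it also means the key alone cannot say whether the root cause is network related, which matches the uncertainty described above.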
-- 
ubuntu-distributed-devel mailing list
ubuntu-distributed-devel@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-distributed-devel