Re: EXTERNAL: [OpenAFS] Preliminary findings on today's brokenness

2021-01-14 Thread Chaskiel Grundman
I guess I should elaborate a little. The "RX Epoch" is a value chosen by each copy of the RX network stack and is used, in part, to disambiguate different instances of RX running on the same port. In OpenAFS, the RX stack exists inside the RX-using process, not the networking bits in the kernel, so

Re: [OpenAFS] Preliminary findings on today's brokenness

2021-01-14 Thread Benjamin Kaduk
Jeffrey has done some analysis that is consistent with your results, and posted patches at https://gerrit.openafs.org/#/c/14491 and https://gerrit.openafs.org/#/c/14492 We'll be reviewing those shortly. -Ben On Thu, Jan 14, 2021 at 10:21:22AM -0500, Chaskiel Grundman wrote: > None of these things is

Re: EXTERNAL: [OpenAFS] Preliminary findings on today's brokenness

2021-01-14 Thread Ben Carter
So we are running 1.6 code and we definitely have a problem. For us, however, a sync site is being elected, but running vos examine from a client seems to hang. Actual access to files in AFS seems to be working fine, but we've not restarted any file server processes. Ben On 1/14/21 10:21 A

[OpenAFS] Preliminary findings on today's brokenness

2021-01-14 Thread Chaskiel Grundman
None of these things is confirmed yet, but based on some analysis and testing Carnegie Mellon has done today:
- The problem is in RX (the transport layer), not any of the applications
- It likely affects 1.8.0 and newer, but not 1.6
- It seems to be triggered by the RX epoch being after the Unix t