[OpenAFS] OpenAFS 1.8.7 available

2021-01-14 Thread Benjamin Kaduk
The OpenAFS Guardians are happy to announce the availability of OpenAFS 1.8.7. Source files can be accessed via the web at: https://www.openafs.org/release/openafs-1.8.7.html or via AFS at: UNIX: /afs/grand.central.org/software/openafs/1.8.7/ UNC:

Re: [OpenAFS] 14 Jan 2021 08:25:36 GMT Breakage in RX Connection ID calculation

2021-01-14 Thread Andreas Hirczy
Benjamin Kaduk writes: > Just to confirm: what method did you use to "restart a client"? Reboot :) Just installing the client fixed my most pressing issues, but rebooting a client is not an option at the moment. >> | root@faepop78 ~ # ls /afs/itp.tugraz.at/ >> | /bin/ls: cannot open directory

Re: [OpenAFS] 14 Jan 2021 08:25:36 GMT Breakage in RX Connection ID calculation

2021-01-14 Thread Benjamin Kaduk
On Thu, Jan 14, 2021 at 08:22:49PM +0100, Andreas Hirczy wrote: > Jeffrey E Altman writes: > > >>> Patches to correct the flaw are available from OpenAFS Gerrit > >>> > >>> https://gerrit.openafs.org/14491 > >>> rx: rx_InitHost do not overwrite RAND_bytes rx_nextCid > >>> > >>>

Re: [OpenAFS] 14 Jan 2021 08:25:36 GMT Breakage in RX Connection ID calculation

2021-01-14 Thread Andreas Hirczy
Jeffrey E Altman writes: >>> Patches to correct the flaw are available from OpenAFS Gerrit >>> >>> https://gerrit.openafs.org/14491 >>> rx: rx_InitHost do not overwrite RAND_bytes rx_nextCid >>> >>> https://gerrit.openafs.org/14492 >>> rx: update_nextCid overflow handling is broken >> >>

Re: [OpenAFS] 14 Jan 2021 08:25:36 GMT Breakage in RX Connection ID calculation

2021-01-14 Thread Heinz-Ado Arnolds
The clients with the patched 1.8.6 have been rebooted. None of the database servers or file servers have been rebooted. Jonathan Billings wrote on 14.01.21 19:14: On Thu, Jan 14, 2021 at 12:46 PM Heinz-Ado Arnolds <arno...@mpa-garching.mpg.de> wrote: I'm still having problems when

Re: [OpenAFS] AFS database problem

2021-01-14 Thread Jonathan D. Proulx
On Thu, Jan 14, 2021 at 03:18:36PM +0100, Andreas Hirczy wrote: :FB writes: : :> is a major problem with AFS database (at least two that do not have any connection). :> UBIK doesn't seem to be able to elect a sync site anymore. OpenAFS is 1.8.2-1 :> and 1.8.5-1. : :Same here - all 3 DB servers

Re: [OpenAFS] 14 Jan 2021 08:25:36 GMT Breakage in RX Connection ID calculation

2021-01-14 Thread Jeffrey E Altman
On 1/14/2021 1:20 PM, Jeffrey E Altman (jalt...@auristor.com) wrote: > On 1/14/2021 10:55 AM, Jeffrey E Altman (jalt...@auristor.com) wrote: >> This morning at 14 Jan 2021 08:25:36 GMT all restarted or newly started >> OpenAFS 1.8 clients and servers began to experience RX communication >>

Re: [OpenAFS] 14 Jan 2021 08:25:36 GMT Breakage in RX Connection ID calculation

2021-01-14 Thread Jeffrey E Altman
On 1/14/2021 10:55 AM, Jeffrey E Altman (jalt...@auristor.com) wrote: > This morning at 14 Jan 2021 08:25:36 GMT all restarted or newly started > OpenAFS 1.8 clients and servers began to experience RX communication > failures. The RX Connection ID of all calls initiated by the peer are > the

Re: [OpenAFS] 14 Jan 2021 08:25:36 GMT Breakage in RX Connection ID calculation

2021-01-14 Thread Jonathan Billings
On Thu, Jan 14, 2021 at 12:46 PM Heinz-Ado Arnolds < arno...@mpa-garching.mpg.de> wrote: > I'm still having problems when doing an ssh from a patched 1.8.6 client to > a server running an unpatched 1.8.6 and vice versa. The login process hangs > during aklog. That means both machines have to run

Re: [OpenAFS] 14 Jan 2021 08:25:36 GMT Breakage in RX Connection ID calculation

2021-01-14 Thread Heinz-Ado Arnolds
P.S.: I get timeouts with ssh logins even between patched 1.8.6 clients during aklog (tested by commenting aklog from pam settings -> no timeout). Token was obtained after timeout. Cheers and thanks again, Ado Heinz-Ado Arnolds wrote on 14.01.21 18:45: Dear Jeffrey, many thanks for your

[OpenAFS] unsubscribe

2021-01-14 Thread Biswas, Brian P
Please unsubscribe me from these emails. Thanks, --Brian Biswas

Re: [OpenAFS] 14 Jan 2021 08:25:36 GMT Breakage in RX Connection ID calculation

2021-01-14 Thread Heinz-Ado Arnolds
Dear Jeffrey, many thanks for your fast response from Germany too! When issuing "vos listvol " on a patched 1.8.6 client to an 1.6.22.1 , I still get "Could not get the list of partitions from the server. Possible communication failure". The same command works from a client running 1.6.23.

Re: [OpenAFS] 14 Jan 2021 08:25:36 GMT Breakage in RX Connection ID calculation

2021-01-14 Thread Neil Brown
On Thu, 14 Jan 2021, Jeffrey E Altman wrote: Patches to correct the flaw are available from OpenAFS Gerrit https://gerrit.openafs.org/14491 rx: rx_InitHost do not overwrite RAND_bytes rx_nextCid https://gerrit.openafs.org/14492 rx: update_nextCid overflow handling is broken Jeffrey,

[OpenAFS] 14 Jan 2021 08:25:36 GMT Breakage in RX Connection ID calculation

2021-01-14 Thread Jeffrey E Altman
This morning at 14 Jan 2021 08:25:36 GMT all restarted or newly started OpenAFS 1.8 clients and servers began to experience RX communication failures. The RX Connection ID of all calls initiated by the peer are the same: 0x8002 Patches to correct the flaw are available from OpenAFS Gerrit

Re: EXTERNAL: [OpenAFS] Preliminary findings on today's brokenness

2021-01-14 Thread Chaskiel Grundman
I guess I should elaborate a little The "RX Epoch" is a value chosen by each copy of the RX network stack and is used, in part, to disambiguate different instances of RX running on the same port. In openafs, the RX stack exists inside the RX-using process, not the networking bits in the kernel, so

Re: [OpenAFS] Preliminary findings on today's brokenness

2021-01-14 Thread Benjamin Kaduk
Jeffrey has done some analysis that is consistent with your results, and posted patches at https://gerrit.openafs.org/#/c/14491 https://gerrit.openafs.org/#/c/14492 We'll be reviewing those shortly. -Ben On Thu, Jan 14, 2021 at 10:21:22AM -0500, Chaskiel Grundman wrote: > None of these things

Re: EXTERNAL: [OpenAFS] Preliminary findings on today's brokenness

2021-01-14 Thread Ben Carter
So we are running 1.6 code and we definitely have a problem. However for us, a sync site is being elected, but doing a vos examine from a client seems to hang. Actual access to files in AFS seems to be working fine but we've not restarted any file server processes. Ben On 1/14/21 10:21

[OpenAFS] Preliminary findings on today's brokenness

2021-01-14 Thread Chaskiel Grundman
None of these things is confirmed yet, but based on some analysis and testing Carnegie Mellon has done today: - The problem is in RX (the transport layer), not any of the applications - It likely affects 1.8.0 and newer, but not 1.6 - It seems to be triggered by the RX epoch being after the Unix

Re: [OpenAFS] AFS database problem

2021-01-14 Thread FB
Hi, - Original Message - > From: "Andreas Hirczy" > To: "openafs-info" > Sent: Thursday, 14 January 2021 15:18:36 > Subject: Re: [OpenAFS] AFS database problem > FB writes: > >> is a major problem with AFS database (at least two that do not have any >> connection). >> UBIK

Re: [OpenAFS] AFS database problem

2021-01-14 Thread Andreas Hirczy
FB writes: > is a major problem with AFS database (at least two that do not have any > connection). > UBIK doesn't seem to be able to elect a sync site anymore. OpenAFS is 1.8.2-1 > and 1.8.5-1. Same here - all 3 DB servers claim "I am not sync site" while reporting identical sync-site and

Re: [OpenAFS] OpenAFS stopped working - me too

2021-01-14 Thread Marcio Barbosa
> could this have to do with some kind of bug in timestamps > at GMT: Thursday, January 14, 2021 8:25:36 AM > the Unix hex timestamp was 0x60000000 Interesting. Setting the date to "14 JAN 2021 08:21:36" seems to solve the problem on my test cell. On Jan 14, 2021, at 10:32 AM, Andreas Weiss

Re: [OpenAFS] Help: OpenAFS suddenly completely stopped working

2021-01-14 Thread Valtteri Vuorikoski
Replying to myself: no solution, but for others facing problems here are the incantations to extract data from a defunct fileserver: # Check volume id from RWrite column and partition letter (eg a) # from server row for the volume you need to restore. # Example: volid=42 partition=a (from

[OpenAFS] AFS database problem

2021-01-14 Thread FB
Dear AFSians, is a major problem with AFS database (at least two that do not have any connection). UBIK doesn't seem to be able to elect a sync site anymore. OpenAFS is 1.8.2-1 and 1.8.5-1. I don't think it's a coincidence that bit #29 of the epoch went from 0 to 1 at almost the same time.
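The bit-29 observation above can be checked with a little arithmetic. The following is a minimal sketch in Python, assuming the "epoch" in question is the 32-bit Unix time and taking 0x60000000 (14 Jan 2021 08:25:36 GMT) as the boundary; the helper name `bit` is made up for illustration. Whether this bit is what actually triggers the fault is the poster's hypothesis and is not established by this snippet.

```python
# Hypothetical helper: return bit `index` (0 = least significant) of `value`.
def bit(value: int, index: int) -> int:
    return (value >> index) & 1

# Unix time at 14 Jan 2021 08:25:36 GMT, and the second just before it.
BOUNDARY = 0x60000000
before = BOUNDARY - 1  # 0x5FFFFFFF

# Bits of a 32-bit value that go from 0 to 1 when the clock crosses the boundary.
rising = [i for i in range(32) if bit(before, i) == 0 and bit(BOUNDARY, i) == 1]

print(f"before:   {before:032b}")
print(f"boundary: {BOUNDARY:032b}")
print("bits rising from 0 to 1:", rising)
```

Bit 29 is the only bit that rises at that second; bits 0 through 28 all fall from 1 to 0, and bit 30 is set both before and after.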

Re: [OpenAFS] Help: OpenAFS suddenly completely stopped working

2021-01-14 Thread Kendrick Hernandez
We're seeing a similar issue. We just recently migrated all of our dafileservers to 1.8.6 (the three dbs are still on 1.6.24). We're running CentOS 7.9 (kernel 3.10.0-1160.2.2) and these are all VMs on VMware. The db servers appear to be okay (vos listvldb works, udebug shows recovery state 1f),

Re: [OpenAFS] OpenAFS stopped working - me too

2021-01-14 Thread Andreas Weiss
Dear all, could this have to do with some kind of bug in timestamps? At GMT: Thursday, January 14, 2021 8:25:36 AM the Unix hex timestamp was 0x60000000. Maybe just a wild-goose chase. Clients with OpenAFS 1.8.6 are problematic; 1.6 seems to be fine. Best, andi On 1/14/21 2:27 PM, Florian
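The timestamp mentioned above is easy to verify independently. This is a small sketch using only the Python standard library; the function name `unix_hex` is an illustrative choice, not anything from OpenAFS.

```python
from datetime import datetime, timezone

def unix_hex(moment: datetime) -> str:
    """Format a timezone-aware datetime as a 32-bit hexadecimal Unix timestamp."""
    return f"0x{int(moment.timestamp()):08X}"

# The moment at which the failures were reported to begin.
boundary = datetime(2021, 1, 14, 8, 25, 36, tzinfo=timezone.utc)

print(int(boundary.timestamp()))  # seconds since 1970-01-01 00:00:00 UTC
print(unix_hex(boundary))         # the same value in hexadecimal
```

The decimal value is 1610612736, which is exactly 0x60000000. The round number is what made the timestamp stand out as a suspect.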

Re: [OpenAFS] OpenAFS stopped working - me too

2021-01-14 Thread Jan Iven
On 14/01/2021 14:12, Heinz-Ado Arnolds wrote: Dear colleagues, thanks for your notes! We have the same problem with 1.8.6 clients since this morning and are working hard to find the problem. Our complete cell is not able to work any more. Any help would be greatly appreciated. Another "me

Re: [OpenAFS] OpenAFS stopped working - me too

2021-01-14 Thread Florian Möller
Dear all, same problem here. Our complete cell does not work any more. Any ideas? Best, Florian -- Dr. Florian Möller Universität Würzburg Institut für Mathematik Emil-Fischer-Straße 30 97074 Würzburg, Germany Tel. +49 931 3185596 On 14.01.21 at 14:02, Neil Brown wrote: Sorry, struggling

Re: [OpenAFS] OpenAFS stopped working - me too

2021-01-14 Thread Heinz-Ado Arnolds
Dear colleagues, thanks for your notes! We have the same problem with 1.8.6 clients since this morning and are working hard to find the problem. Our complete cell is not able to work any more. Any help would be greatly appreciated. Cheers, Ado --

[OpenAFS] OpenAFS stopped working - me too

2021-01-14 Thread Neil Brown
Sorry, struggling with a connection today, but yes. Our site was working fine up until about 8:30am GMT this morning, and then we started having similar issues. We've been thinking it was a local configuration change, though we hadn't made any AFS changes, only an openssl update. But yes,

[OpenAFS] Help: OpenAFS suddenly completely stopped working

2021-01-14 Thread Valtteri Vuorikoski
I have a small OpenAFS 1.8.6 setup using the Debian and Ubuntu packages. Last night everything was working fine, this morning machines were timing out trying to talk to volume servers. Database replication was also stuck. While there is a single backup database and file server, databases and