Yes, it's such a pain, but here is something interesting: I have two servers. The first was set up by a previous employee and was renamed manually; I had to replace the cert on all clients and rename the server through the sp utility. The second server was installed from scratch and has never been renamed, and it works smoothly without any osa-dispatcher problems.

Anyway, thanks man

On Thu, Sep 1, 2016 at 12:57 PM, Matt Moldvan <m...@moldvan.com> wrote:

> I have the same issues with 2.5 and latest OSAD packages... the connection
> still looks like it's established at the client side, but for some reason
> it has stopped trying to send data. The master no longer sees the
> connection as open and therefore cannot send anything to it.
>
> The only resolution I've found is to restart the client(s), but for so
> many systems this caused the dispatchers to become unresponsive during our
> maintenance windows. Essentially, Puppet would run, restart OSAD, and it
> would consume all the HTTP connections and make the GUI unresponsive.
> Update and reboot actions were picked up outside of the scheduled
> maintenance, and it was all around chaos.
>
> So at this point I'm stuck babysitting OSAD status of systems because
> there is nothing easily found in /var/log/osad that indicates an issue,
> even though the client still has 5222 open to the dispatcher and the osad
> service is running. In the Spacewalk database, the system is marked
> down... I ran an strace on the OSAD process on the client for about 30
> minutes, and didn't see any attempts to do anything.
>
> [me@osad-client1 ~]$ sudo lsof -Pp 21996 | grep TCP
> osad 21996 root 3u IPv4 7392569 0t0 TCP osad-client1:56939->spacewalk-master:5222 (ESTABLISHED)
> [me@osad-client1 ~]$ service osad status
> osad (pid 21996) is running...
> [me@osad-client1 ~]$ sudo lsof -Pp 21996 | grep TCP
> osad 21996 root 3u IPv4 7392569 0t0 TCP osad-client1:56939->spacewalk-master:5222 (ESTABLISHED)
> [me@osad-client1 ~]$ sudo strace -fp 21996
> Process 21996 attached
> select(4, [3], [], [], NULL
>
> ---
> rhnschema=# select s.name,pc.state_id from rhnpushclient pc, rhnserver s
> where s.name='osad-client1' and pc.server_id=s.id;
>          name          | state_id
> -----------------------+----------
>  osad-client1          |        2
> (1 row)
>
> Even though osad-client1 thought it was still connected, the master didn't
> have a corresponding connection on 5222:
> [me@spacewalk-master ~]$ netstat -a | grep osad-client1
> [me@spacewalk-master ~]$
>
> For me, changing the values in /etc/jabberd/*.xml as recommended in
> https://fedorahosted.org/spacewalk/wiki/JabberAndOSAD wasn't going to
> work... I tried that and all systems would be disconnected, then would
> reconnect, causing some (perhaps insignificant) load on the database as
> well as unnecessary network traffic and client processing. I could see the
> number of systems marked as "online" in the database flapping wildly
> between 1,000 and 5,000 over time.
>
> One thing I did notice on the systems that were marked offline... a
> netstat showed two connections, one in CLOSE_WAIT status and another in
> ESTABLISHED. On restart of OSAD, only one was there, in ESTABLISHED state,
> and the system was marked online again.
>
> I'm thinking that the OSAD Python code isn't closing the sockets properly
> when an error is encountered, and leaves the client thinking it's still
> connected, while the master doesn't have a corresponding connection to send
> data to.
>
> Basically, as a workaround, I think I'm going to have systems restart OSAD
> if they see connections open on 5222 in CLOSE_WAIT status... until
> something better comes along and the client code is fixed up.
> Unfortunately the workaround isn't even a full one...
> not every system had multiple connections, but it's a step toward more
> systems staying usable than before.
>
> On Thu, Sep 1, 2016 at 1:26 PM Konstantin Raskoshnyi <konra...@gmail.com>
> wrote:
>
>> 2.4. I tried that; actually, after I did spacewalk-service restart it
>> helped for one day.
>>
>> Now it's the same, but there are no errors on either side.
>>
>> On Wed, Aug 31, 2016 at 9:06 AM, Matthew Madey <mattma...@gmail.com>
>> wrote:
>>
>>> What version of Spacewalk are you running? You likely need to reset the
>>> osad credentials on the clients. This typically only occurs when the
>>> jabber database has been corrupted.
>>>
>>> On the clients, run the below commands:
>>>
>>> rm -f /etc/sysconfig/rhn/osad-auth.conf ; service osad restart
>>>
>>> You may find the below links helpful
>>>
>>> https://fedorahosted.org/spacewalk/wiki/OsadHowTo
>>>
>>> https://fedorahosted.org/spacewalk/wiki/JabberAndOSAD
>>>
>>> On Aug 30, 2016 4:43 PM, "Konstantin Raskoshnyi" <konra...@gmail.com>
>>> wrote:
>>>
>>>> Something strange is happening with some of my osad clients (~1/3).
>>>>
>>>> They don't pick up any jobs from osa-dispatcher, and there are no
>>>> errors when starting the service.
>>>>
>>>> Also, if I restart osad on sp I see these logs:
>>>>
>>>> Aug 30 14:32:20 spacewalk15 jabberd/c2s[51907]: [142] [::ffff:172.90.7.220, port=43046] disconnect jid=osad-e43e3265db@spacewalk15.ooma.internal/osad, packets: 29, bytes: 3738
>>>> Aug 30 14:32:20 spacewalk15 jabberd/sm[51904]: session ended: jid=osad-e43e326...@spacewalk15.ooma.internal/osad
>>>> Aug 30 14:32:20 spacewalk15 jabberd/sm[51904]: user unloaded jid=osad-e43e326...@spacewalk15.ooma.internal
>>>> Aug 30 14:32:20 spacewalk15 jabberd/c2s[51907]: [142] traditional.digest authentication succeeded: osad-e43e3265db@/osad ::ffff:172.90.7.220:43454 TLS
>>>> Aug 30 14:32:20 spacewalk15 jabberd/c2s[51907]: [142] requesting session: jid=osad-e43e326...@spacewalk15.ooma.internal/osad
>>>> Aug 30 14:32:20 spacewalk15 jabberd/sm[51904]: session started: jid=osad-e43e326...@spacewalk15.ooma.internal/osad
>>>>
>>>> So it looks like everything should be fine.
>>>>
>>>> _______________________________________________
>>>> Spacewalk-list mailing list
>>>> Spacewalk-list@redhat.com
>>>> https://www.redhat.com/mailman/listinfo/spacewalk-list
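
For anyone who wants to try the CLOSE_WAIT workaround Matt describes above, here is a rough sketch of how the check might look. It is only an illustration, not something from the OSAD packages: the function names are made up, it assumes `netstat -tn` prints the usual six columns (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and, as Matt notes, it will miss stuck clients that never show a second connection.

```python
import subprocess

OSAD_PORT = 5222  # jabberd client port seen in the lsof output in this thread


def has_stale_osad_socket(netstat_output, port=OSAD_PORT):
    """Return True if any TCP connection to `port` is in CLOSE_WAIT.

    Hypothetical helper; assumes `netstat -tn` style lines:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP connection row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote, state = fields[4], fields[5]
        # The remote port is whatever follows the last colon (works for
        # both 10.0.0.1:5222 and ::ffff:10.0.0.1:5222).
        if remote.rsplit(":", 1)[-1] == str(port) and state == "CLOSE_WAIT":
            return True
    return False


def restart_osad_if_stale():
    """Hypothetical cron entry point: restart osad if a stale socket exists."""
    output = subprocess.check_output(["netstat", "-tn"]).decode()
    if has_stale_osad_socket(output):
        subprocess.check_call(["service", "osad", "restart"])
        return True
    return False
```

If this were run from cron on many clients, it would probably be wise to stagger the schedule so that restarts don't all hit the dispatcher at once, given the connection exhaustion described above.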