Just to clarify - is that 17.4.4 or 17.3.4?

Don Sullivan
Network Administrator
Technology Services

205-726-2111 | office
dsulli...@samford.edu
LinkedIn<http://linkedin.com/in/donaldasullivan>
www.samford.edu
800 Lakeshore Drive
Birmingham, AL 35229

From: The EDUCAUSE Wireless Issues Community Group Listserv 
<WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU> On Behalf Of Chad Sawyer
Sent: Tuesday, September 7, 2021 09:21
To: WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU
Subject: [EXTERNAL][WIRELESS-LAN] rough start of semester on 9800-80 WLCs

Just sending a heads up in case anyone else hits these.  This was our first 
semester with a full campus since moving everything over to our 9800-80 pairs.  
They've been in production for much of the past 12 months and the performance 
was fine when campus was empty.  Under load was another story.

First issue:
Code 17.3.3 has the following bugs that were causing frequent HA failovers that 
reference the wncd process.  This was resolved by upgrading to 17.4.4.
CSCvx37499- Controller reloads with the reason "Critical process wncd fault on 
rp_0_0 (rc=139)"
CSCvy20300- Primary controller in HA frequently ends abnormally

Second issue:
Unfortunately, these failovers also caused one of the units to lose the 
contents of its bootflash and get stuck in rommon mode, so we had to recover it 
by booting from USB.  This was also due to a 17.3.3 bug and, so far, appears to 
have been resolved by upgrading to 17.4.4.
CSCvy73836- C9800-80 controller goes to rommon after multiple failovers due to 
power cycling

Third issue:
The nastiest thing though was unrelated to bugs.  It was CAPWAP timeouts that 
only occurred in busy areas of campus.  AP uptime would show months, but CAPWAP 
uptimes were constantly resetting to zero.  The logs on the AP would show the 
following message: "Going to restart CAPWAP (reason : data keepalive not 
received)".  We wasted a lot of time troubleshooting this as a connectivity 
issue between our APs and controller, but that wasn't the cause.
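
For anyone who wants to check whether they are seeing the same symptom, here 
is a rough Python sketch that counts the restart message per AP.  The message 
text is the one quoted above; how the log lines are collected, the AP names, 
and the function name are all assumptions for illustration only.

```python
import re
from collections import Counter

# Message text as quoted in this email; the rest of the log-line layout
# is not known here, so only the message itself is matched.
RESTART_RE = re.compile(
    r"Going to restart CAPWAP \(reason\s*:\s*data keepalive not received\)"
)


def count_keepalive_restarts(lines_by_ap):
    """Return a Counter of AP name -> number of keepalive-related restarts.

    lines_by_ap is a hypothetical dict mapping an AP name to the list of
    log lines already collected from that AP.
    """
    counts = Counter()
    for ap_name, lines in lines_by_ap.items():
        hits = sum(1 for line in lines if RESTART_RE.search(line))
        if hits:
            counts[ap_name] = hits
    return counts


# Made-up sample data, only to show the call.
sample = {
    "ap-library-01": [
        "Going to restart CAPWAP (reason : data keepalive not received)",
        "some unrelated line",
        "Going to restart CAPWAP (reason : data keepalive not received)",
    ],
    "ap-dorm-07": ["some unrelated line"],
}
restarts = count_keepalive_restarts(sample)
```

If the APs with hits cluster in the busiest areas rather than behind one 
switch or uplink, that would point away from a plain connectivity problem.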

This problem was a result of our following Cisco's 9800 best practice 
guide<https://www.cisco.com/c/en/us/products/collateral/wireless/catalyst-9800-series-wireless-controllers/guide-c07-743627.html>,
 specifically on site tag sizing.  Although the guide says up to 500 APs can 
safely be assigned to a site tag, that was far from the truth in our 
experience.  Several TAC folks missed it and it took our rep escalating the 
issue to a senior wireless design person from Cisco to finally find it.  She 
advised breaking up our site tags so that they didn't exceed 250 APs, which 
instantly resolved the CAPWAP timeouts.
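
As an illustration of the sizing rule, here is a minimal Python sketch that 
splits one oversized site tag into several tags of at most 250 APs each.  The 
tag naming scheme and the function name are assumptions, and it simply chunks 
the list in order; in practice you would presumably group APs by building or 
area rather than by list position.

```python
def split_site_tag(base_name, ap_names, max_aps=250):
    """Split a list of AP names into site tags of at most max_aps each.

    Returns a dict mapping a (hypothetical) site tag name to its AP list.
    A list that already fits keeps the original tag name.
    """
    chunks = [
        ap_names[i:i + max_aps] for i in range(0, len(ap_names), max_aps)
    ]
    if len(chunks) <= 1:
        return {base_name: list(ap_names)}
    return {
        f"{base_name}-{n}": chunk for n, chunk in enumerate(chunks, start=1)
    }


# Made-up example: 600 APs that were all under one site tag.
aps = [f"ap-{n:04d}" for n in range(600)]
tags = split_site_tag("campus-main", aps)
sizes = {name: len(members) for name, members in tags.items()}
```

The 250 limit is the figure Cisco gave us for our environment; it is a 
parameter here so it can be lowered if needed.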


Fourth issue:
Apparently some of the 2702i APs don't handle code upgrades gracefully with the 
9800s.  Cisco made it sound like this was a common issue.  After upgrading from 
17.3.3 to 17.3.4, several 2700s on campus were showing "%CAPWAP-3-ERRORLOG: 
Certificate verification failed!" when attempting to establish CAPWAP with the 
controllers.  This was resolved by manually recovering the APs by pushing an 
image from the downloads page to them via TFTP.  Luckily we have a staff member 
who's pretty skilled at automating this type of stuff.  These were the commands:

SSH to the affected AP
enable
!
(enter password if there is one)
!
debug capwap console cli
!
archive download-sw /overwrite /force-reload tftp://(tftp server IP)/ap3g2-k9w8-tar.153-3.JPJ7.tar
!

The AP will automatically reload, establish CAPWAP with the controller, 
download the proper image, reload, and rejoin the controller successfully.
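
For anyone scripting this, a rough Python sketch of building the command 
sequence for a given TFTP server follows.  It only assembles the commands 
listed above as strings; the SSH session handling and password prompt are 
left out, and the function name and the 192.0.2.10 address are placeholders, 
not what our staff member actually used.

```python
# Image file name as given in the commands above.
IMAGE = "ap3g2-k9w8-tar.153-3.JPJ7.tar"


def recovery_commands(tftp_server, image=IMAGE):
    """Return the CLI commands to send to an affected AP, in order.

    tftp_server is the IP of a TFTP server that already hosts the image.
    Sending these over SSH (and answering the enable password prompt)
    is not handled in this sketch.
    """
    return [
        "enable",
        "debug capwap console cli",
        "archive download-sw /overwrite /force-reload "
        f"tftp://{tftp_server}/{image}",
    ]


# 192.0.2.10 is a documentation-only address used as a placeholder.
commands = recovery_commands("192.0.2.10")
```

Each string would then be sent to the AP in order by whatever SSH tooling 
you already use.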


Chad Sawyer
Network Engineer
USF Information Technology 
www.usf.edu/it
13220 USF Laurel Dr, MDF 2128, Tampa, FL 33620
O: 813-974-1342
E: chadsaw...@usf.edu


**********
Replies to EDUCAUSE Community Group emails are sent to the entire community 
list. If you want to reply only to the person who sent the message, copy and 
paste their email address and forward the email reply. Additional participation 
and subscription information can be found at 
https://www.educause.edu/community
