Just to clarify - is that 17.4.4 or 17.3.4?

Don Sullivan
Network Administrator
Technology Services
205-726-2111 | office
dsulli...@samford.edu
LinkedIn: http://linkedin.com/in/donaldasullivan
www.samford.edu
800 Lakeshore Drive, Birmingham, AL 35229

From: The EDUCAUSE Wireless Issues Community Group Listserv <WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU> On Behalf Of Chad Sawyer
Sent: Tuesday, September 7, 2021 09:21
To: WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU
Subject: [EXTERNAL][WIRELESS-LAN] rough start of semester on 9800-80 WLCs

Just sending a heads-up in case anyone else hits these. This was our first semester with a full campus since moving everything over to our 9800-80 pairs. They've been in production for much of the past 12 months, and performance was fine while campus was empty. Under load it was another story.

First issue: Code 17.3.3 has the following bugs, which were causing frequent HA failovers referencing the wncd process. This was resolved by upgrading to 17.4.4.

CSCvx37499 - Controller reloads with the reason "Critical process wncd fault on rp_0_0 (rc=139)"
CSCvy20300 - Primary controller in HA frequently ends abnormally

Second issue: Unfortunately, these failovers also caused one of the units to lose the contents of its bootflash and get stuck in ROMMON mode, so we had to recover it by booting from USB. This was also due to a 17.3.3 bug, and upgrading to 17.4.4 appears to have resolved it so far.

CSCvy73836 - C9800-80 controller goes to rommon after multiple failovers due to power cycling

Third issue: The nastiest problem, though, was unrelated to bugs: CAPWAP timeouts that only occurred in busy areas of campus. AP uptime would show months, but CAPWAP uptime kept resetting to zero.
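One way to spot this symptom across many APs is to compare the two uptimes and flag APs that have been powered on for a long time but whose CAPWAP session is very young. This is a minimal, hypothetical Python sketch. It assumes the uptimes have already been exported from the controller and converted to seconds. `ApUptime`, `flag_capwap_resets`, and the one-day / one-hour thresholds are illustrative choices, not part of any Cisco tool.

```python
from dataclasses import dataclass


@dataclass
class ApUptime:
    name: str
    ap_uptime_s: int      # how long the AP itself has been powered on
    capwap_uptime_s: int  # how long the current CAPWAP session has been up


def flag_capwap_resets(aps, min_ap_uptime_s=86_400, max_capwap_uptime_s=3_600):
    """Return names of APs up for at least min_ap_uptime_s whose CAPWAP
    session is younger than max_capwap_uptime_s (thresholds are assumptions)."""
    return [
        ap.name
        for ap in aps
        if ap.ap_uptime_s >= min_ap_uptime_s
        and ap.capwap_uptime_s < max_capwap_uptime_s
    ]


if __name__ == "__main__":
    sample = [
        # Up for 90 days, CAPWAP restarted 2 minutes ago: flagged.
        ApUptime("AP-LIB-101", ap_uptime_s=90 * 86_400, capwap_uptime_s=120),
        # CAPWAP has been up almost as long as the AP: healthy.
        ApUptime("AP-LIB-102", ap_uptime_s=90 * 86_400, capwap_uptime_s=89 * 86_400),
        # The AP itself just rebooted, so a young CAPWAP session is expected.
        ApUptime("AP-DORM-7", ap_uptime_s=600, capwap_uptime_s=300),
    ]
    print(flag_capwap_resets(sample))  # ['AP-LIB-101']
```

Skipping APs that recently rebooted keeps the report focused on CAPWAP-only resets, which are the ones that point at the controller side rather than at power or PoE problems.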
The logs on the AP would show the following message:

"Going to restart CAPWAP (reason : data keepalive not received)"

We wasted a lot of time troubleshooting this as a connectivity issue between our APs and the controller, but that wasn't the cause. The problem was a result of following Cisco's 9800 best practices guide (https://www.cisco.com/c/en/us/products/collateral/wireless/catalyst-9800-series-wireless-controllers/guide-c07-743627.html), specifically its guidance on site tag sizing. Although the guide says up to 500 APs can safely be assigned to a site tag, that was far from the truth in our experience. Several TAC folks missed it, and it took our rep escalating the issue to a senior wireless design person at Cisco to finally find it. She advised breaking up our site tags so that none exceeded 250 APs, which instantly resolved the CAPWAP timeouts.

Fourth issue: Apparently some of the 2702i APs don't handle code upgrades gracefully with the 9800s, and Cisco made it sound like this is a common issue. After upgrading from 17.3.3 to 17.3.4, several 2700s on campus were showing "%CAPWAP-3-ERRORLOG: Certificate verification failed!" when attempting to establish CAPWAP with the controllers. We resolved this by manually recovering the APs, pushing an image from the downloads page to them via TFTP. Luckily, we have a staff member who's pretty skilled at automating this type of stuff. These were the commands, run after SSHing to the affected AP:

    enable
    ! (enter password if there is one)
    debug capwap console cli
    archive download-sw /overwrite /force-reload tftp://(tftp server IP)/ap3g2-k9w8-tar.153-3.JPJ7.tar

The AP will automatically reload, establish CAPWAP with the controller, download the proper image, reload, and re-join the controller successfully.
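To plan a re-split along the lines of the 250-AP advice above, the grouping step can be scripted before any tags are created on the controller. This is a hypothetical Python sketch: `split_into_site_tags` and the `ST-LIB-01` naming scheme are illustrative. Creating the tags and assigning APs to them on the 9800 is not shown. It balances group sizes rather than filling each tag to the cap, which is a design choice of this sketch, not something Cisco prescribed.

```python
# Cap taken from the advice above; the best practices guide itself allows 500.
MAX_APS_PER_SITE_TAG = 250


def split_into_site_tags(ap_names, prefix, max_per_tag=MAX_APS_PER_SITE_TAG):
    """Split an ordered list of AP names into balanced site tag groups.

    Returns a dict mapping a generated tag name (e.g. "ST-LIB-01") to the
    APs in that tag. No group exceeds max_per_tag, and group sizes differ by
    at most one, so 600 APs become 3 x 200 rather than 250/250/100.
    Input order is preserved, so pre-sorting by building or floor keeps
    neighboring APs in the same tag.
    """
    if max_per_tag < 1:
        raise ValueError("max_per_tag must be at least 1")
    if not ap_names:
        return {}
    # Integer ceiling division: the fewest tags that respect the cap.
    n_tags = (len(ap_names) + max_per_tag - 1) // max_per_tag
    base, extra = divmod(len(ap_names), n_tags)
    tags = {}
    start = 0
    for i in range(n_tags):
        size = base + (1 if i < extra else 0)  # first `extra` tags get one more AP
        tags[f"{prefix}-{i + 1:02d}"] = ap_names[start:start + size]
        start += size
    return tags


if __name__ == "__main__":
    aps = [f"AP-{n:03d}" for n in range(600)]
    tags = split_into_site_tags(aps, "ST-LIB")
    print({name: len(members) for name, members in tags.items()})
    # {'ST-LIB-01': 200, 'ST-LIB-02': 200, 'ST-LIB-03': 200}
```

Leaving headroom under the cap means a building can gain a few APs later without immediately pushing a tag back over 250.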
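The staff member's actual automation isn't described, so the following is only a hypothetical Python sketch of the per-AP logic. It checks whether an AP's log contains the certificate error quoted above and builds the same command lines for a given TFTP server. `needs_recovery` and `recovery_commands` are illustrative names, and 192.0.2.10 is a documentation-only address. Opening the SSH session, answering the enable password prompt, and sending the lines are left to whatever SSH tooling you already use.

```python
import ipaddress

# Error string quoted above, seen on 2700s after the 17.3.3 -> 17.3.4 upgrade.
CERT_ERROR = "%CAPWAP-3-ERRORLOG: Certificate verification failed!"

# Image file named in the recovery commands above.
DEFAULT_IMAGE = "ap3g2-k9w8-tar.153-3.JPJ7.tar"


def needs_recovery(ap_log_text):
    """True if the AP's log text contains the certificate verification error."""
    return CERT_ERROR in ap_log_text


def recovery_commands(tftp_server, image=DEFAULT_IMAGE):
    """Return the CLI lines to send, in order, in an SSH session on one AP.

    Raises ValueError if tftp_server is not a valid IP address. Any enable
    password prompt must be handled by the tool that sends these lines.
    """
    ipaddress.ip_address(tftp_server)  # validates the address, raises ValueError if bad
    return [
        "enable",
        "debug capwap console cli",
        f"archive download-sw /overwrite /force-reload tftp://{tftp_server}/{image}",
    ]


if __name__ == "__main__":
    log = "... %CAPWAP-3-ERRORLOG: Certificate verification failed! ..."
    if needs_recovery(log):
        for line in recovery_commands("192.0.2.10"):
            print(line)
```

Validating the TFTP address up front avoids pushing a malformed `archive download-sw` line to dozens of APs in a loop.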
Chad Sawyer
Network Engineer
USF Information Technology
www.usf.edu/it
13220 USF Laurel Dr, MDF 2128, Tampa, FL 33620
O: 813-974-1342
E: chadsaw...@usf.edu

**********
Replies to EDUCAUSE Community Group emails are sent to the entire community list. If you want to reply only to the person who sent the message, copy and paste their email address and forward the email reply. Additional participation and subscription information can be found at https://www.educause.edu/community