Thanks for the info.  I'll look into the AP service pack.  We haven't done one 
of those yet, so I'm curious to see it in action.

Yeah, we had mixed results with 500 APs in a site tag.  Some areas on campus 
were fine.  I think client count had something to do with provoking it: our 
highest-population areas were the ones that saw the most CAPWAP timeouts.  
Just curious: how are you checking the number of APs and site tags assigned to 
a wncd process?

From: The EDUCAUSE Wireless Issues Community Group Listserv 
<WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU> On Behalf Of Rios, Hector J
Sent: Tuesday, September 7, 2021 12:57 PM
To: WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU
Subject: Re: [WIRELESS-LAN] rough start of semester on 9800-80 WLCs

Chad,

Sorry to hear about the issues you ran into. We also started the semester with 
9800-80s, but we chose to go with 17.3.4.

Things went well for most of the first day of classes, except for a single 
controller crash after business hours. Cisco has identified this as a bug in 
17.3.x:
CSCvx71141 - CPU HOG in RRM Process.

You should contact TAC to get more details. They might also be able to provide 
a workaround, depending on your configuration.

We also ran into the bug below, but this was fixed with an AP service pack. 
Cool feature BTW, it actually works.
CSCvz08781
Symptom: AP2800/3800/4800/1560/IW6300/ESW6300 Firmware Radio Crash on 17.3.4 
while passing client traffic.

There is also an issue on 17.3.4 that is impacting 9120s. Cisco is working on a 
service pack for this as well. I don't have more details on this.

Thank you for the information regarding the wncd processes. We also followed the 
best practices, but we do have controllers that have a few wncd processes with 
a little over 500 APs. No issues so far, other than that we have noticed in a 
few instances that, even though we only have 8 custom site tags, some WLCs will 
assign two site tags to a single wncd process.  We are working with TAC on this.

We also have a substantial number of 2700 series APs. We encountered no major 
issues during the upgrade process.

Finally, we have noticed that L3 roaming is not working on our 802.1X and PSK 
SSIDs. Has anyone else run into this issue?

Best,

Hector Rios
UT Austin




From: The EDUCAUSE Wireless Issues Community Group Listserv 
<WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU> On Behalf Of Chad Sawyer
Sent: Tuesday, September 7, 2021 9:21 AM
To: WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU
Subject: [WIRELESS-LAN] rough start of semester on 9800-80 WLCs

Just sending a heads up in case anyone else hits these.  This was our first 
semester with a full campus since moving everything over to our 9800-80 pairs.  
They've been in production for much of the past 12 months and the performance 
was fine when campus was empty.  Under load was another story.

First issue:
Code 17.3.3 has the following bugs, which were causing frequent HA failovers 
that reference the wncd process.  This was resolved by upgrading to 17.3.4.
CSCvx37499 - Controller reloads with the reason "Critical process wncd fault on 
rp_0_0 (rc=139)"
CSCvy20300 - Primary controller in HA frequently ends abnormally

Second issue:
Unfortunately these failovers also caused one of the units to lose the 
contents of its bootflash and get stuck in rommon mode, so we had to recover it 
by booting from USB.  This was also due to a 17.3.3 bug and appears to have 
been resolved so far by upgrading to 17.3.4.
CSCvy73836 - C9800-80 controller goes to rommon after multiple failovers due to 
power cycling

Third issue:
The nastiest thing though was unrelated to bugs.  It was CAPWAP timeouts that 
only occurred in busy areas of campus.  AP uptime would show months, but CAPWAP 
uptimes were constantly resetting to zero.  The logs on the AP would show the 
following message: "Going to restart CAPWAP (reason : data keepalive not 
received)".  We wasted a lot of time troubleshooting this as a connectivity 
issue between our APs and controller, but that wasn't the cause.
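
For anyone trying to confirm the same pattern, here is a minimal Python sketch that counts how often each AP logged the keepalive message quoted above. It assumes the log lines have already been collected somewhere as (AP name, line) pairs; the function name and the sample AP names are made up for illustration and are not from our environment.

```python
import re
from collections import Counter

# Message text as quoted from the AP logs in this thread.
RESTART_RE = re.compile(
    r"Going to restart CAPWAP \(reason\s*:\s*data keepalive not received\)"
)


def count_keepalive_restarts(lines):
    """Count keepalive-triggered CAPWAP restarts per AP.

    `lines` is an iterable of (ap_name, log_line) pairs. How the pairs are
    gathered (syslog, SSH scrape, ...) is deliberately left open.
    """
    counts = Counter()
    for ap_name, line in lines:
        if RESTART_RE.search(line):
            counts[ap_name] += 1
    return counts


# Hypothetical sample data, for illustration only.
sample = [
    ("ap-lib-101", "Going to restart CAPWAP (reason : data keepalive not received)"),
    ("ap-lib-101", "Going to restart CAPWAP (reason : data keepalive not received)"),
    ("ap-dorm-007", "some unrelated log line"),
]

for ap, restarts in count_keepalive_restarts(sample).most_common():
    print(ap, restarts)
```

Sorting by count makes it easier to see whether the restarts cluster in the busy areas, which is what pointed away from a plain connectivity problem in our case.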

This problem was a result of our following Cisco's 9800 best practice 
guide<https://www.cisco.com/c/en/us/products/collateral/wireless/catalyst-9800-series-wireless-controllers/guide-c07-743627.html>,
specifically on site tag sizing.  Although the guide says up to 500 APs can 
safely be assigned to a site tag, that was far from the truth in our 
experience.  Several TAC folks missed it and it took our rep escalating the 
issue to a senior wireless design person from Cisco to finally find it.  She 
advised breaking up our site tags so that they didn't exceed 250 APs, which 
instantly resolved the CAPWAP timeouts.
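
As a rough illustration of that advice, the sketch below splits one oversized site tag into evenly sized tags of no more than 250 APs each. The 250 limit is the figure from this thread; the function name, the tag naming scheme, and the split by sorted AP name are assumptions made for the example. In practice the grouping would probably follow buildings or roaming areas rather than alphabetical order.

```python
from math import ceil


def split_site_tag(tag_name, ap_names, max_aps=250):
    """Split one site tag into evenly sized tags of at most max_aps APs.

    Returns a dict mapping new (hypothetical) tag names to lists of AP names.
    """
    if max_aps < 1:
        raise ValueError("max_aps must be at least 1")
    aps = sorted(ap_names)
    if not aps:
        return {}
    n_tags = ceil(len(aps) / max_aps)   # fewest tags that respect the limit
    size = ceil(len(aps) / n_tags)      # spread the APs evenly across them
    return {
        f"{tag_name}-{i + 1}": aps[i * size:(i + 1) * size]
        for i in range(n_tags)
    }


# Hypothetical example: a 500-AP site tag becomes two tags of 250.
aps = [f"ap-{i:04d}" for i in range(500)]
for name, members in split_site_tag("campus-east", aps).items():
    print(name, len(members))
```

The output is only a plan; the new tags still have to be created and assigned to the APs on the controller.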


Fourth issue:
Apparently some of the 2702i APs don't handle code upgrades gracefully with the 
9800s.  Cisco made it sound like this was a common issue.  After upgrading from 
17.3.3 to 17.3.4, several 2700s on campus were showing "%CAPWAP-3-ERRORLOG: 
Certificate verification failed!" when attempting to establish CAPWAP with the 
controllers.  This was resolved by manually recovering the APs by pushing an 
image from the downloads page to them via TFTP.  Luckily we have a staff member 
who's pretty skilled at automating this type of stuff.  These were the commands:

SSH to the affected AP
enable
!
(enter password if there is one)
!
debug capwap console cli
!
archive download-sw /overwrite /force-reload tftp://(tftp server IP)/ap3g2-k9w8-tar.153-3.JPJ7.tar
!

The AP will automatically reload, establish CAPWAP with the controller, 
download the proper image, reload, and rejoin the controller successfully.
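
For a sense of how the procedure above could be scripted, here is a minimal Python sketch that only builds the per-AP command list. It is not the script our staff member wrote. The function names and the addresses (taken from the 192.0.2.0/24 documentation range) are assumptions, and sending the commands over SSH, including any password prompt after "enable", is left to whatever tooling is already in use.

```python
# Image file name as given in the procedure above.
DEFAULT_IMAGE = "ap3g2-k9w8-tar.153-3.JPJ7.tar"


def recovery_commands(tftp_server, image=DEFAULT_IMAGE):
    """Return the CLI lines from the procedure above, in order."""
    return [
        "enable",
        "debug capwap console cli",
        f"archive download-sw /overwrite /force-reload tftp://{tftp_server}/{image}",
    ]


def recovery_plan(ap_addresses, tftp_server):
    """Map each affected AP address to the commands to run on it."""
    return {ap: recovery_commands(tftp_server) for ap in ap_addresses}


# Hypothetical addresses, for illustration only.
plan = recovery_plan(["192.0.2.11", "192.0.2.12"], tftp_server="192.0.2.5")
for ap, commands in plan.items():
    print(f"--- {ap} ---")
    for command in commands:
        print(command)
```

Printing the plan first gives a chance to review the exact lines before anything touches an AP.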


Chad Sawyer
Network Engineer
USF Information Technology 
www.usf.edu/it
13220 USF Laurel Dr, MDF 2128, Tampa, FL 33620
O: 813-974-1342
E: chadsaw...@usf.edu


**********
Replies to EDUCAUSE Community Group emails are sent to the entire community 
list. If you want to reply only to the person who sent the message, copy and 
paste their email address and forward the email reply. Additional participation 
and subscription information can be found at 
https://www.educause.edu/community
