Hi,

On 13 Dec 2017, at 12:28 am, Garret Peirce <pei...@maine.edu> wrote:

I should've circled back/followed up as we worked through this.
We worked with Cisco earlier this year, and they have since developed 8.3.121, 
which, IIRC, included fixes for these relevant bugs, among others: 
CSCvb65706, CSCvc74528, CSCvd07423, CSCuz47559.

Since 8.3.121.1 (and above), our incident rate has fallen to nearly zero 
across ~9k APs.
We've also been working with them on CSCvf28459 (related to an NVRAM issue), 
for which I hear the fix is to be released soon.
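For readers unfamiliar with CSCvc74528: per the bug description quoted further down, the AP sees that the image in its flash has the same version as the WLC's and skips the download, even though that image is corrupt. The toy model below is an illustration only, not Cisco's code; it contrasts that version-only decision with one that also verifies a content digest. `FlashImage`, the helper names and the SHA-256 check are hypothetical, since the AP firmware's real logic isn't described in this thread.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FlashImage:
    """Hypothetical model of the image stored in an AP's flash."""
    version: str
    data: bytes


def digest(data: bytes) -> str:
    """SHA-256 of the image bytes (illustrative; not Cisco's actual check)."""
    return hashlib.sha256(data).hexdigest()


def needs_download_version_only(ap: FlashImage, wlc_version: str) -> bool:
    # The behaviour CSCvc74528 describes: same version string => skip download.
    return ap.version != wlc_version


def needs_download_verified(ap: FlashImage, wlc_version: str, wlc_digest: str) -> bool:
    # Also download when the bytes in flash no longer hash to what the WLC expects.
    return ap.version != wlc_version or digest(ap.data) != wlc_digest


if __name__ == "__main__":
    good = bytes(1024)                           # stand-in for a valid image
    corrupt = good[:512] + b"\xff" + good[513:]  # one corrupted byte in flash

    wlc_version, wlc_digest = "8.3.102.0", digest(good)
    ap = FlashImage("8.3.102.0", corrupt)        # same version, bad contents

    print(needs_download_version_only(ap, wlc_version))          # False: keeps the corrupt image
    print(needs_download_verified(ap, wlc_version, wlc_digest))  # True: corruption caught
```

A version string alone can't reveal corrupted bytes, which is consistent with the workarounds in this thread all involving the console (fsck/format) or a forced reinstall.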

Is the NVRAM issue the one where the AP config goes missing and the AP comes 
back with an empty config?  We see that, too (and some other more local 
institutions have hit it as well).  Can't seem to see the bug details in Bug 
Search (unexpected error occurred, please try again).

We're getting a custom engineering release cut at the moment so we'd like to 
get as many fixes (if they're available) in as possible. This'll be an MR 
escalation image on 8.3.



> On 13 Dec 2017, at 12:00 am, Jan Freerk Popma <j.f.po...@utwente.nl> wrote:
> 
> Hi all,
>  
> We have also had this problem for about a year now, exclusively on 3600s, 
> although the 2600 and 3700 are not beyond suspicion; our 702, 1140, 1810 and 
> 2700s seem to be fine.
> It also looked as if we were the only ones with this problem, but there are 
> more.
> So get on to your supplier and Cisco and tell them this is a serious issue 
> that needs fixing.
>  
> It seems to be present in at least all 8.2 and 8.3 releases.
> We have TAC case SR 682811103 open for this, and we are currently running an 
> 8.2.166.0-based debug version to test a possible fix.
>  
> What seems to be the case is that the flash file system gets corrupted.
> Not surprisingly, when the AP needs to reboot it runs into all kinds of 
> problems, such as a non-working boot image, radio firmware not loading, or a 
> corrupt config. The AP drops to the boot ROM or gets into a boot loop.
> The only remedy is to run fsck on, or format, the flash from the console and 
> then reload either the current image or the recovery image from a TFTP server.
>  
> The problem is not easy to debug: a running AP gives no indication that it 
> is corrupt, and the trigger is as yet unknown. It is, however, detectable 
> remotely.
> We have developed a script that checks the APs and, using some hidden 
> features, reinstalls the image if it is corrupted.
> Of our 400+ AP3600s, about 10 fail a week; leave longer between checks 
> and the numbers go up.
> This script catches most corrupt APs before they break on a reboot. It is 
> highly tailored, so it won’t easily translate to a different environment, and 
> of course it is not a fix.
>  
> 
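Jan's script isn't shared and its detection method ("hidden features") isn't described, so what follows is only a hedged sketch of the general shape such a periodic check could take: sweep every AP, flag the ones whose flash looks corrupt, and reinstall the image on those before their next reboot. `check_flash` and `reinstall_image` are hypothetical callables you would have to write for your own environment; nothing here is a real Cisco or WLC API.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class SweepResult:
    checked: int = 0
    repaired: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)


def sweep(
    aps: Iterable[str],
    check_flash: Callable[[str], bool],      # hypothetical: True if the AP's flash looks corrupt
    reinstall_image: Callable[[str], bool],  # hypothetical: True if the reinstall succeeded
) -> SweepResult:
    """One pass over all APs: reinstall the image wherever corruption is detected."""
    result = SweepResult()
    for ap in aps:
        result.checked += 1
        if not check_flash(ap):
            continue  # flash looks healthy, nothing to do
        if reinstall_image(ap):
            result.repaired.append(ap)
        else:
            result.failed.append(ap)  # still needs a console visit (fsck/format)
    return result


if __name__ == "__main__":
    corrupt = {"ap-0042"}  # hypothetical AP names
    r = sweep(
        ["ap-0001", "ap-0042", "ap-0100"],
        check_flash=lambda ap: ap in corrupt,
        reinstall_image=lambda ap: True,
    )
    print(r.checked, r.repaired, r.failed)  # 3 ['ap-0042'] []
```

Keeping detection and repair injectable reflects Jan's point that his version is highly tailored; APs that end up in `failed` would still need the console fsck/format route.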

Same issue here!  This sounds fairly severe - and I'm surprised I haven't heard 
more about this issue.

Keen to know how you've done this, as it looks fairly easy to implement on 
our end as well and could save us a world of pain. We're equally worried 
about performing an upgrade and having to send more contractors up scaffolding 
in lecture theatres over the Christmas break to replace/recover APs.

Would you be able to share the process (either on the list or privately)?

Cheers,
Tristan
-- 
TRISTAN GULYAS
Senior Network Engineer

Technology Services, eSolutions
Monash University
738 Blackburn Road
Clayton 3168
Australia

T: +61 3 9902 9092
M: +61 (0)403 224 484
E: tristan.gul...@monash.edu
monash.edu
> 
> 
> On Tue, Dec 12, 2017 at 8:00 AM, Jan Freerk Popma <j.f.po...@utwente.nl> wrote:
> 
> Regards,
> 
> Jan Freerk Popma | ICT Service Center, Networkmanagement | University of 
> Twente, Enschede, Netherlands
> 
> Building Citadel room 219 | T: +31 53 489 4321 | j.f.po...@utwente.nl | 
> www.utwente.nl
>  
> 
>  
> 
> From: The EDUCAUSE Wireless Issues Constituent Group Listserv 
> [mailto:WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU] On Behalf Of Tristan Gulyas
> Sent: Tuesday, 12 December 2017 07:18
> To: WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU
> Subject: Re: [WIRELESS-LAN] Cisco AP 'flash' bug
> 
>  
> 
> Hi all,
> 
>  
> 
> I was under the impression that we were the only customer hitting this. 
> (We're on the 8.3.112.7 engineering release.)
> 
>  
> 
> We've seen it across most platforms. It's fixed on the 702W in our current 
> release (we believe), but we're seeing it on the 1532, 3502, 3602, 2702 and 
> 3702. It's not present on the 3800/1562 from what we've seen.
> 
>  
> 
> One catalyst for this has been AP reboots.  Has anyone else been hit by this 
> bug or been provided with a fix?
> 
>  
> 
> Cheers,
> 
> Tristan
> 
>  
> 
> On 20 Jan 2017, at 7:46 am, McClintic, Thomas <thomas.mcclin...@uth.tmc.edu> wrote:
> 
>  
> 
> Next time you have this issue, try connecting a console to the AP and running 
> the following:
> 
> ap: fsck flash:
> Are you sure you want to fsck "flash:" (could take some time) (y/n)?y
> flashfs[0]: ……………
> ap: boot
> 
> This works for us on APs that failed to reload properly.
> 
>  
> 
> From: The EDUCAUSE Wireless Issues Constituent Group Listserv 
> [mailto:WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU] On Behalf Of Garret Peirce
> Sent: Thursday, January 19, 2017 10:44 AM
> To: WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU
> Subject: Re: [WIRELESS-LAN] Cisco AP 'flash' bug
> 
>  
> 
> Ian, thanks for the response.
> 
> To commiserate: it does feel as if the wireless ecosystem has been hit by a 
> larger bloom of bugs over the last year or so.
> 
> Some of that may be due to enhanced vigilance and our tracking them down to 
> root causes, but whatever the case, in aggregate it's a concern here as well.
> 
>  
> 
> Another related statistic about this issue:
> 
> With ~7000 APs potentially affected, we're seeing an incidence rate below 
> 1%, which, although low, is felt more when you're making fire-fighting 
> trips to visit/replace affected APs.
> 
>  
> 
>  
> 
> On Thu, Jan 19, 2017 at 10:28 AM, Ian Lyons <ily...@rollins.edu> wrote:
> 
> Yes, we own that bug too. Pretty much we have every bug, and have been 
> patching like madmen since July.
> 
>  
> 
> From: The EDUCAUSE Wireless Issues Constituent Group Listserv 
> [mailto:WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU] On Behalf Of Garret Peirce
> Sent: Thursday, January 19, 2017 10:27 AM
> To: WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU
> Subject: [WIRELESS-LAN] Cisco AP 'flash' bug
> 
>  
> 
> Over the last few months we've run into/discovered a Cisco bug, and I was 
> curious whether anyone in this community has been seeing it as well.
> 
>  
> 
> In a nutshell, it appears the flash is being corrupted and the AP then enters 
> a boot loop or fails to boot at all. We are apparently seeing a failure 
> rate of roughly 10 APs per month. My engineer's summary is below.
> 
>  
> 
> =================
> 
>  
> 
> The CSCvc74528 description is below, but it doesn't account for the fact that 
> occasionally the boot loop doesn't happen and the AP just crashes on boot 
> or fails to boot at all. We're working with them to add some details to the 
> description.
> 
>  
> 
> "APs go into boot cycle due to corrupt image, do not download new image from 
> WLC
> 
> CSCvc74528
> 
> Description
> 
> Symptom:
> 
> APs reboot, and when they boot back up the image is corrupted. The AP checks 
> the WLC, sees that it has the same image in flash, and does not download the 
> WLC image. The image on the AP is corrupt, so the AP continuously reboots into 
> the corrupted image.
> 
>  
> 
> Conditions:
> 
> 2702I, 3602I and 3702I APs on an 8540 WLC running 8.2.141.0 or 8.3.102.0 code 
> do not download the WLC code because the same image is already on flash.
> 
>  
> 
> Bad flash in APs
> 
>  
> 
> Workaround:
> 
> Format APs via console with a new image; this holds for a few reboots."
> 
>  
> 
>  
> 
>  
> 
> ********** Participation and subscription information for this EDUCAUSE 
> Constituent Group discussion list can be found at 
> http://www.educause.edu/discuss.
> 
> 
> 
> 
> 
> -- 
> Garry Peirce
> Network Architect
> Networkmaine, University of Maine System US:IT
> 207-581-3539
> 
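The failure figures quoted in this thread use different fleet sizes and periods: Jan reports about 10 failures a week across 400+ AP3600s, while Garret's original post reports roughly 10 APs a month out of ~7000 potentially affected APs. A rough normalization to failures per 1,000 APs per month is sketched below, assuming an average of 52/12 weeks per month and taking "400+" and "~7000" at face value. The two counts also measure different things (Jan's script catches corrupt APs before they break on a reboot, while Garret's are APs that actually failed), so treat the comparison as indicative only.

```python
WEEKS_PER_MONTH = 52 / 12  # assumption: average number of weeks in a month


def per_1000_per_month(failures: float, fleet: int, period_weeks: float) -> float:
    """Failures per 1,000 APs per month, given a failure count over a period in weeks."""
    return failures / fleet * 1000 / period_weeks * WEEKS_PER_MONTH


if __name__ == "__main__":
    jan = per_1000_per_month(10, 400, period_weeks=1)                    # ~10/week, 400+ AP3600s
    garret = per_1000_per_month(10, 7000, period_weeks=WEEKS_PER_MONTH)  # ~10/month, ~7000 APs
    print(f"{jan:.1f} vs {garret:.2f} per 1,000 APs per month")
    # → 108.3 vs 1.43 per 1,000 APs per month
```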

