On Fri, Nov 13, 2015 at 12:36:40PM +1000, David Gwynne wrote:
>
> > On 13 Nov 2015, at 12:16, Ryan Freeman <r...@slipgate.org> wrote:
> >
> > On Tue, Nov 10, 2015 at 08:27:36PM +1000, David Gwynne wrote:
> >> any joy? i mean, failure?
> >
> > Well I got something different.  I've noticed the failures only seem to
> > happen when my roommates arrive home.  I can use my stuff remotely all day
> > from work without a hitch, roommates come home and usually within an hr
> > there is an internet complaint.
> >
> > Since I started using the little scripts to detect connection failure
> > and down/up the iface in question, things had been pretty good simply in the
> > fact that nobody could really notice before it fixed itself.
> >
> > Today the machine dropped to ddb>!  of course i couldn't remember a damn
> > thing to type :(  i got trace, terribly sorry it wasn't more...
> >
> > ddb> trace
> > extent_free(400012600c0, 0, 0, 0, 1fe0000f078, 800012fa00000000) at extent_free+0x174
> > iommu_dvmamap_unload(40001266300, 0, 4000129f080, 0, 0, 2) at iommu_dvmamap_unload+0x74
> > gem_rint(400014ac000, 40016ff0000, 7fff0000, e0017c48, 4000000000000000, 80000000) at gem_rint+0x160
> > gem_intr(400014ac000, c00ca000, 2000, 0, 0, 80000000) at gem_intr+0x154
> > intr_handler(e0017ec8, 4000117ae00, 4bca3020, 0, 800, 2) at intr_handler+0xc
> > sparc_interrupt(0, 400014b0000, 80206910, 400171b7c60, 40009ec0810, 0) at sparc_interrupt+0x298
> > gem_ioctl(400014ac048, 400014ac000, 400171b7c60, 400171b7c60, 0, 40009b73c10) at gem_ioctl+0x19c
> > ifioctl(0, 80206910, 400171b7c60, 40009b73c10, 1012d74, 0) at ifioctl+0x38c
> > sys_ioctl(0, 400171b7db8, 400171b7df8, 0, 0, 14b) at sys_ioctl+0x190
> > syscall(400171b7ed0, 436, bec8920888, bec892088c, 0, 0) at syscall+0x3c4
> > softtrap(3, 80206910, fffffffffffe3018, 0, 0, 1ff7fff6df8) at softtrap+0x19c
> > ddb>
>
> that is interesting. if you're still in ddb, can you go sh panic?
>
> if not, not biggy.
Sadly, I am not.  as it is my router, I had to reboot to get back online to
send the mail.  If it triggers again I will make sure I include that.

> my gut feeling is our ring accounting is wonky. mpi@ and jmatthew@ have
> tweaks to gem(4) for mpsafety which might fix this. ill poke them to see if
> they would share.

I am willing to try anything! :)  I will reiterate that I am just running
5.8 stable (with mtier binpatches for errata); if it requires me to bump
up to -current, no biggie :)

>
> dlg
>
> >
> >>
> >>> On 9 Nov 2015, at 10:40 AM, Ryan Freeman <r...@slipgate.org> wrote:
> >>>
> >>> On Mon, Nov 09, 2015 at 10:07:31AM +1000, David Gwynne wrote:
> >>>> can you get the ifconfig output when its locked up? and a copy of what
> >>>> systat mb is showing?
> >>>>
> >>>> cheers,
> >>>> dlg
> >>>
> >>> Thanks David,
> >>>
> >>> I have setup a script to try and capture this immediately when it happens.
> >>>
> >>> FWIW here is the output as it is now, working:
> >>>
> >>> 16:35 ryan@void:~$ ifconfig
> >>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 32768
> >>>         priority: 0
> >>>         groups: lo
> >>>         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
> >>>         inet6 ::1 prefixlen 128
> >>>         inet 127.0.0.1 netmask 0xff000000
> >>> gem0: flags=8867<UP,BROADCAST,DEBUG,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> >>>         lladdr 00:03:ba:2b:47:70
> >>>         priority: 0
> >>>         groups: egress
> >>>         media: Ethernet autoselect (100baseTX full-duplex)
> >>>         status: active
> >>>         inet 96.54.13.103 netmask 0xfffffc00 broadcast 96.54.15.255
> >>> gem1: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
> >>>         lladdr 00:03:ba:2b:47:71
> >>>         priority: 0
> >>>         media: Ethernet autoselect (100baseTX full-duplex)
> >>>         status: active
> >>>         inet 10.16.1.30 netmask 0xffffffe0 broadcast 10.16.1.31
> >>>         inet6 fe80::203:baff:fe2b:4771%gem1 prefixlen 64 scopeid 0x2
> >>>         inet6 2001:470:b:6cf::1 prefixlen 64
> >>> enc0: flags=0<>
> >>>         priority: 0
> >>>         groups: enc
> >>>         status: active
> >>> vlan100: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> >>>         lladdr 00:03:ba:2b:47:71
> >>>         description: servers
> >>>         priority: 0
> >>>         vlan: 100 parent interface: gem1
> >>>         groups: vlan
> >>>         status: active
> >>>         inet 10.21.1.30 netmask 0xffffffe0 broadcast 10.21.1.31
> >>>         inet6 fe80::203:baff:fe2b:4771%vlan100 prefixlen 64 scopeid 0x5
> >>>         inet6 2001:470:eac8:666::1 prefixlen 64
> >>> vlan101: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> >>>         lladdr 00:03:ba:2b:47:71
> >>>         description: workstations
> >>>         priority: 0
> >>>         vlan: 101 parent interface: gem1
> >>>         groups: vlan
> >>>         status: active
> >>>         inet 10.21.8.254 netmask 0xffffff80 broadcast 10.21.8.255
> >>>         inet6 fe80::203:baff:fe2b:4771%vlan101 prefixlen 64 scopeid 0x6
> >>>         inet6 2001:470:eac8:a::1 prefixlen 64
> >>> vlan102: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> >>>         lladdr 00:03:ba:2b:47:71
> >>>         description: wireless
> >>>         priority: 0
> >>>         vlan: 102 parent interface: gem1
> >>>         groups: vlan
> >>>         status: active
> >>>         inet 10.21.9.254 netmask 0xffffff80 broadcast 10.21.9.255
> >>>         inet6 fe80::203:baff:fe2b:4771%vlan102 prefixlen 64 scopeid 0x7
> >>>         inet6 2001:470:eac8:b::1 prefixlen 64
> >>> vlan2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> >>>         lladdr 00:03:ba:2b:47:71
> >>>         description: transit
> >>>         priority: 0
> >>>         vlan: 2 parent interface: gem1
> >>>         groups: vlan
> >>>         status: active
> >>>         inet 172.21.1.2 netmask 0xfffffffc broadcast 172.21.1.3
> >>> tun0: flags=51<UP,POINTOPOINT,RUNNING> mtu 1500
> >>>         priority: 0
> >>>         groups: tun
> >>>         status: down
> >>>         inet 10.21.2.1 --> 10.21.2.2 netmask 0xfffffffc
> >>> gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1280
> >>>         priority: 0
> >>>         groups: gif egress
> >>>         tunnel: inet 96.54.13.103 -> 216.218.226.238
> >>>         inet6 fe80::203:baff:fe2b:4770%gif0 -> prefixlen 64 scopeid 0xa
> >>>         inet6 2001:470:a:6cf::2 -> 2001:470:a:6cf::1 prefixlen 128
> >>> pflow0: flags=41<UP,RUNNING> mtu 1492
> >>>         priority: 0
> >>>         pflow: sender: 127.0.0.1 receiver: 127.0.0.1:9995 version: 5
> >>>         groups: pflow
> >>> pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33144
> >>>         priority: 0
> >>>         groups: pflog
> >>>
> >>> 16:36 ryan@void:~$ systat -b mb
> >>>    8 users    Load 0.21 0.25 0.26    Sun Nov  8 16:37:12 2015
> >>>
> >>> IFACE    LIVELOCKS  SIZE  ALIVE  LWM   HWM  CWM
> >>> System           0   256     48        129
> >>>                     2048     24       1025
> >>> lo0
> >>> gem0                2048     11    4   124   11
> >>> gem1                2048     12    4   124   12
> >>> enc0
> >>> vlan100
> >>> vlan101
> >>> vlan102
> >>> vlan2
> >>> tun0
> >>> gif0
> >>> pflow0
> >>> pflog0
> >>>
> >>>>
> >>>>> On 9 Nov 2015, at 09:36, Ryan Freeman <r...@slipgate.org> wrote:
> >>>>>
> >>>>> Hey tech@,
> >>>>>
> >>>>> At my wits end here, I recently got a sunfire v120 from work for pretty
> >>>>> cheap.  Quite excited to have some non x86 hardware, I set it up as a
> >>>>> router.
> >>>>>
> >>>>> However, for some reason after sometimes mere hours -- othertimes days
> >>>>> at a time, the gem0 interface needs to be cycled:
> >>>>>
> >>>>> ifconfig gem0 down
> >>>>> ifconfig gem0 up
> >>>>> dhclient gem0
> >>>>>
> >>>>> no packets pass until that has been done.  At first I have been placing
> >>>>> the blame squarely on the Hitron modem we have in the house from shaw
> >>>>> cable, but now I've noticed the issue happen twice on the internal
> >>>>> interface as well, gem1.  All VLANs I have setup stop responding until
> >>>>> gem1 is cycled.
> >>>>>
> >>>>> gem1 is just used by a collection of vlan(4) interfaces, so traffic
> >>>>> resumes immediately after interface gem1 down/up.
> >>>>>
> >>>>> I've tried to turn on ifconfig gem0 debug to catch anything wierd, but
> >>>>> there has been nothing of interest there.
> >>>>> Dmesg attached, starting to wonder if this machine is at its EOL and
> >>>>> the network ports are dying :(
> >>>>>
> >>>>> This issue occurred with the 5.7 release as well.
> >>>>>
> >>>>> dmesg:
> >>>>> console is /pci@1f,0/pci@1,1/isa@7/serial@0,3f8
> >>>>> Copyright (c) 1982, 1986, 1989, 1991, 1993
> >>>>>         The Regents of the University of California.  All rights reserved.
> >>>>> Copyright (c) 1995-2015 OpenBSD. All rights reserved.  http://www.OpenBSD.org
> >>>>>
> >>>>> OpenBSD 5.8 (GENERIC) #0: Thu Oct 22 00:24:09 PDT 2015
> >>>>>     r...@void.inter.lan:/usr/src/sys/arch/sparc64/compile/GENERIC
> >>>>> real mem = 1073741824 (1024MB)
> >>>>> avail mem = 1039228928 (991MB)
> >>>>> mpath0 at root
> >>>>> scsibus0 at mpath0: 256 targets
> >>>>> mainbus0 at root: Sun Fire V120 (UltraSPARC-IIe 648MHz)
> >>>>> cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 3.3) @ 648 MHz
> >>>>> cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 512K external (64 b/l)
> >>>>> psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
> >>>>> psycho0: bus range 0-2, PCI bus 0
> >>>>> psycho0: dvma map c0000000-dfffffff
> >>>>> pci0 at psycho0
> >>>>> ppb0 at pci0 dev 1 function 1 "Sun Simba" rev 0x13
> >>>>> pci1 at ppb0 bus 1
> >>>>> ebus0 at pci1 dev 12 function 0 "Sun RIO EBus" rev 0x01
> >>>>> "flashprom" at ebus0 addr 0-fffff not configured
> >>>>> clock1 at ebus0 addr 0-1fff: mk48t59
> >>>>> lom0 at ebus0 addr 200000-200003 ivec 0x2a: LOMlite2 rev 3.12
> >>>>> alipm0 at pci1 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz clock
> >>>>> iic0 at alipm0
> >>>>> "max1617" at alipm0 addr 0x18 skipped due to alipm0 bugs
> >>>>> spdmem0 at iic0 addr 0x54: 512MB SDRAM registered ECC PC133CL2
> >>>>> spdmem1 at iic0 addr 0x55: 512MB SDRAM registered ECC PC133CL2
> >>>>> ebus1 at pci1 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00
> >>>>> power0 at ebus1 addr 2000-2007 ivec 0x25
> >>>>> com0 at ebus1 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
> >>>>> com0: console
> >>>>> com1 at ebus1 addr 2e8-2ef ivec 0x2b: ns16550a, 16 byte fifo
> >>>>> gem0 at pci1 dev 12 function 1 "Sun ERI Ether" rev 0x01: ivec 0x7c6, address 00:03:ba:2b:47:70
> >>>>> ukphy0 at gem0 phy 1: Generic IEEE 802.3u media interface, rev. 1: OUI 0x0010dd, model 0x0002
> >>>>> ohci0 at pci1 dev 12 function 3 "Sun USB" rev 0x01: ivec 0x7e4, version 1.0, legacy support
> >>>>> pciide0 at pci1 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
> >>>>> pciide0: using ivec 0x7cc for native-PCI interrupt
> >>>>> atapiscsi0 at pciide0 channel 0 drive 0
> >>>>> scsibus1 at atapiscsi0: 2 targets
> >>>>> cd0 at scsibus1 targ 0 lun 0: <TEAC, CD-224E, P.9A> ATAPI 5/cdrom removable
> >>>>> cd0(pciide0:0:0): using PIO mode 4, DMA mode 2
> >>>>> pciide0: channel 1 disabled (no drives)
> >>>>> gem1 at pci1 dev 5 function 1 "Sun ERI Ether" rev 0x01: ivec 0x7dc, address 00:03:ba:2b:47:71
> >>>>> ukphy1 at gem1 phy 1: Generic IEEE 802.3u media interface, rev. 1: OUI 0x0010dd, model 0x0002
> >>>>> ohci1 at pci1 dev 5 function 3 "Sun USB" rev 0x01: ivec 0x7e6, version 1.0, legacy support
> >>>>> usb0 at ohci0: USB revision 1.0
> >>>>> uhub0 at usb0 "Sun OHCI root hub" rev 1.00/1.00 addr 1
> >>>>> usb1 at ohci1: USB revision 1.0
> >>>>> uhub1 at usb1 "Sun OHCI root hub" rev 1.00/1.00 addr 1
> >>>>> ppb1 at pci0 dev 1 function 0 "Sun Simba" rev 0x13
> >>>>> pci2 at ppb1 bus 2
> >>>>> siop0 at pci2 dev 8 function 0 "Symbios Logic 53c896" rev 0x07: ivec 0x7e0, using 8K of on-board RAM
> >>>>> scsibus2 at siop0: 16 targets, initiator 7
> >>>>> sym0 at scsibus2 targ 0 lun 0: <SEAGATE, ST336607LSUN36G, 0207> SCSI3 0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0DGN80000731804D9
> >>>>> sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST336607LSUN36G, 0207> SCSI3 0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0DGN80000731804D9
> >>>>> sd0: 34732MB, 512 bytes/sector, 71132959 sectors
> >>>>> probe(siop0:1:0): Check Condition (error 0x70) on opcode 0x0
> >>>>>     SENSE KEY: Hardware Error
> >>>>>     ASC/ASCQ: Defect List Error
> >>>>>     FRU CODE: 0x7
> >>>>> sym1 at scsibus2 targ 1 lun 0: <SEAGATE, ST336607LSUN36G, 0207> SCSI3 0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0BZL100002316NCUL
> >>>>> sd1 at scsibus0 targ 1 lun 0: <SEAGATE, ST336607LSUN36G, 0207> SCSI3 0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0BZL100002316NCUL
> >>>>> siop1 at pci2 dev 8 function 1 "Symbios Logic 53c896" rev 0x07: ivec 0x7e0, using 8K of on-board RAM
> >>>>> scsibus3 at siop1: 16 targets, initiator 7
> >>>>> siop0: target 0 now using tagged 16 bit 40.0 MHz 31 REQ/ACK offset xfers
> >>>>> vscsi0 at root
> >>>>> scsibus4 at vscsi0: 256 targets
> >>>>> softraid0 at root
> >>>>> scsibus5 at softraid0: 256 targets
> >>>>> siop0: target 1 now using tagged 16 bit 40.0 MHz 31 REQ/ACK offset xfers
> >>>>> bootpath: /pci@1f,0/pci@1,0/scsi@8,0/disk@0,0
> >>>>> root on sd0a (dd2dc38974492ea6.a) swap on sd0b dump on sd0b