> On 13 Nov 2015, at 12:16, Ryan Freeman <r...@slipgate.org> wrote:
> 
> On Tue, Nov 10, 2015 at 08:27:36PM +1000, David Gwynne wrote:
>> any joy? i mean, failure?
> 
> Well I got something different.  I've noticed the failures only seem to happen
> when my roommates arrive home.  I can use my stuff remotely all day from work
> without a hitch, roommates come home and usually within an hr there is an
> internet complaint.
> 
> Since I started using the little scripts to detect connection failure
> and down/up the iface in question, things had been pretty good simply in the
> fact that nobody could really notice before it fixed itself.
> 
> Today the machine dropped to ddb>!  of course i couldn't remember a damn
> thing to type :(  i got trace, terribly sorry it wasn't more...
> 
> ddb> trace
> extent_free(400012600c0, 0, 0, 0, 1fe0000f078, 800012fa00000000) at extent_free+0x174
> iommu_dvmamap_unload(40001266300, 0, 4000129f080, 0, 0, 2) at iommu_dvmamap_unload+0x74
> gem_rint(400014ac000, 40016ff0000, 7fff0000, e0017c48, 4000000000000000, 80000000) at gem_rint+0x160
> gem_intr(400014ac000, c00ca000, 2000, 0, 0, 80000000) at gem_intr+0x154
> intr_handler(e0017ec8, 4000117ae00, 4bca3020, 0, 800, 2) at intr_handler+0xc
> sparc_interrupt(0, 400014b0000, 80206910, 400171b7c60, 40009ec0810, 0) at sparc_interrupt+0x298
> gem_ioctl(400014ac048, 400014ac000, 400171b7c60, 400171b7c60, 0, 40009b73c10) at gem_ioctl+0x19c
> ifioctl(0, 80206910, 400171b7c60, 40009b73c10, 1012d74, 0) at ifioctl+0x38c
> sys_ioctl(0, 400171b7db8, 400171b7df8, 0, 0, 14b) at sys_ioctl+0x190
> syscall(400171b7ed0, 436, bec8920888, bec892088c, 0, 0) at syscall+0x3c4
> softtrap(3, 80206910, fffffffffffe3018, 0, 0, 1ff7fff6df8) at softtrap+0x19c
> ddb>

that is interesting. if you're still in ddb, can you go sh panic?

if not, no biggie.
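
for next time it drops to ddb>, a rough list of what is usually worth collecting before rebooting. these are standard ddb(4) commands, nothing specific to this bug:

```
ddb> show panic
ddb> trace
ddb> ps
ddb> show registers
```

trace and ps are the ones most often asked for; show panic prints the panic message if there was one.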

my gut feeling is our ring accounting is wonky. mpi@ and jmatthew@ have tweaks 
to gem(4) for mpsafety which might fix this. i'll poke them to see if they would 
share.

dlg
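
For reference, the kind of watchdog Ryan describes (detect the connection failure, then down/up the interface) might look roughly like the sketch below. It is not his actual script: the function names, the ifwatch log tag, the 30-second interval and the 192.0.2.1 placeholder target are all assumptions made for illustration. The three recovery commands are the ones from the original report.

```shell
#!/bin/sh
# Sketch of a link watchdog; every name and value here is an assumption.
IFACE=${IFACE:-gem0}          # interface to watch
TARGET=${TARGET:-192.0.2.1}   # placeholder address; use your upstream gateway
INTERVAL=${INTERVAL:-30}      # seconds between checks
RUN=${RUN:-}                  # set RUN=echo to print commands instead of running them

# Succeeds if ping reports success for TARGET (three probes, short wait).
link_ok() {
	ping -c 3 -w 2 "$TARGET" >/dev/null 2>&1
}

# Bounce the interface and renew the DHCP lease, as in the original report.
cycle_iface() {
	$RUN ifconfig "$IFACE" down
	$RUN ifconfig "$IFACE" up
	$RUN dhclient "$IFACE"
}

# One pass: do nothing while the link is fine, otherwise log and cycle.
check_once() {
	link_ok && return 0
	$RUN logger -t ifwatch "$IFACE: $TARGET unreachable, cycling"
	cycle_iface
}

# Loop forever only when asked to, so the functions can be sourced and tested.
if [ "${1:-}" = "run" ]; then
	while :; do
		check_once
		sleep "$INTERVAL"
	done
fi
```

Started as root with the argument run, it loops until killed; with RUN=echo it only prints what it would do. A script like this hides the symptom rather than fixing the driver, so the ifconfig and systat mb output from a locked-up interface is still worth capturing before the cycle happens.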

> 
> 
> 
>> 
>>> On 9 Nov 2015, at 10:40 AM, Ryan Freeman <r...@slipgate.org> wrote:
>>> 
>>> On Mon, Nov 09, 2015 at 10:07:31AM +1000, David Gwynne wrote:
>>>> can you get the ifconfig output when its locked up? and a copy of what 
>>>> systat mb is showing?
>>>> 
>>>> cheers,
>>>> dlg
>>> 
>>> Thanks David,
>>> 
>>> I have setup a script to try and capture this immediately when it happens.
>>> 
>>> FWIW here is the output as it is now, working:
>>> 
>>> 16:35 ryan@void:~$ ifconfig
>>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 32768
>>>       priority: 0
>>>       groups: lo
>>>       inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
>>>       inet6 ::1 prefixlen 128
>>>       inet 127.0.0.1 netmask 0xff000000
>>> gem0: flags=8867<UP,BROADCAST,DEBUG,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>>       lladdr 00:03:ba:2b:47:70
>>>       priority: 0
>>>       groups: egress
>>>       media: Ethernet autoselect (100baseTX full-duplex)
>>>       status: active
>>>       inet 96.54.13.103 netmask 0xfffffc00 broadcast 96.54.15.255
>>> gem1: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
>>>       lladdr 00:03:ba:2b:47:71
>>>       priority: 0
>>>       media: Ethernet autoselect (100baseTX full-duplex)
>>>       status: active
>>>       inet 10.16.1.30 netmask 0xffffffe0 broadcast 10.16.1.31
>>>       inet6 fe80::203:baff:fe2b:4771%gem1 prefixlen 64 scopeid 0x2
>>>       inet6 2001:470:b:6cf::1 prefixlen 64
>>> enc0: flags=0<>
>>>       priority: 0
>>>       groups: enc
>>>       status: active
>>> vlan100: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>>       lladdr 00:03:ba:2b:47:71
>>>       description: servers
>>>       priority: 0
>>>       vlan: 100 parent interface: gem1
>>>       groups: vlan
>>>       status: active
>>>       inet 10.21.1.30 netmask 0xffffffe0 broadcast 10.21.1.31
>>>       inet6 fe80::203:baff:fe2b:4771%vlan100 prefixlen 64 scopeid 0x5
>>>       inet6 2001:470:eac8:666::1 prefixlen 64
>>> vlan101: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>>       lladdr 00:03:ba:2b:47:71
>>>       description: workstations
>>>       priority: 0
>>>       vlan: 101 parent interface: gem1
>>>       groups: vlan
>>>       status: active
>>>       inet 10.21.8.254 netmask 0xffffff80 broadcast 10.21.8.255
>>>       inet6 fe80::203:baff:fe2b:4771%vlan101 prefixlen 64 scopeid 0x6
>>>       inet6 2001:470:eac8:a::1 prefixlen 64
>>> vlan102: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>>       lladdr 00:03:ba:2b:47:71
>>>       description: wireless
>>>       priority: 0
>>>       vlan: 102 parent interface: gem1
>>>       groups: vlan
>>>       status: active
>>>       inet 10.21.9.254 netmask 0xffffff80 broadcast 10.21.9.255
>>>       inet6 fe80::203:baff:fe2b:4771%vlan102 prefixlen 64 scopeid 0x7
>>>       inet6 2001:470:eac8:b::1 prefixlen 64
>>> vlan2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>>       lladdr 00:03:ba:2b:47:71
>>>       description: transit
>>>       priority: 0
>>>       vlan: 2 parent interface: gem1
>>>       groups: vlan
>>>       status: active
>>>       inet 172.21.1.2 netmask 0xfffffffc broadcast 172.21.1.3
>>> tun0: flags=51<UP,POINTOPOINT,RUNNING> mtu 1500
>>>       priority: 0
>>>       groups: tun
>>>       status: down
>>>       inet 10.21.2.1 --> 10.21.2.2 netmask 0xfffffffc
>>> gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1280
>>>       priority: 0
>>>       groups: gif egress
>>>       tunnel: inet 96.54.13.103 -> 216.218.226.238
>>>       inet6 fe80::203:baff:fe2b:4770%gif0 ->  prefixlen 64 scopeid 0xa
>>>       inet6 2001:470:a:6cf::2 -> 2001:470:a:6cf::1 prefixlen 128
>>> pflow0: flags=41<UP,RUNNING> mtu 1492
>>>       priority: 0
>>>       pflow: sender: 127.0.0.1 receiver: 127.0.0.1:9995 version: 5
>>>       groups: pflow
>>> pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33144
>>>       priority: 0
>>>       groups: pflog
>>> 
>>> 16:36 ryan@void:~$ systat -b mb
>>>   8 users    Load 0.21 0.25 0.26                     Sun Nov  8 16:37:12 2015
>>> 
>>> IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
>>> System                    0   256    48         129
>>>                            2048    24        1025
>>> lo0
>>> gem0                         2048    11     4   124    11
>>> gem1                         2048    12     4   124    12
>>> enc0
>>> vlan100
>>> vlan101
>>> vlan102
>>> vlan2
>>> tun0
>>> gif0
>>> pflow0
>>> pflog0
>>> 
>>>> 
>>>>> On 9 Nov 2015, at 09:36, Ryan Freeman <r...@slipgate.org> wrote:
>>>>> 
>>>>> Hey tech@,
>>>>> 
>>>>> At my wits' end here. I recently got a Sun Fire V120 from work for pretty
>>>>> cheap. Quite excited to have some non-x86 hardware, I set it up as a router.
>>>>> 
>>>>> However, for some reason, after sometimes mere hours and other times days
>>>>> at a time, the gem0 interface needs to be cycled:
>>>>> 
>>>>> ifconfig gem0 down
>>>>> ifconfig gem0 up
>>>>> dhclient gem0
>>>>> 
>>>>> No packets pass until that has been done. At first I placed the blame
>>>>> squarely on the Hitron modem we have in the house from Shaw Cable, but
>>>>> now I've noticed the issue happen twice on the internal interface, gem1,
>>>>> as well. All the VLANs I have set up stop responding until gem1 is cycled.
>>>>> 
>>>>> gem1 is just used by a collection of vlan(4) interfaces, so traffic
>>>>> resumes immediately after an ifconfig gem1 down/up.
>>>>> 
>>>>> I've tried turning on ifconfig gem0 debug to catch anything weird, but
>>>>> there has been nothing of interest there. Dmesg attached; I'm starting to
>>>>> wonder if this machine is at its EOL and the network ports are dying :(
>>>>> 
>>>>> This issue occurred with the 5.7 release as well.
>>>>> 
>>>>> dmesg:
>>>>> console is /pci@1f,0/pci@1,1/isa@7/serial@0,3f8
>>>>> Copyright (c) 1982, 1986, 1989, 1991, 1993
>>>>>      The Regents of the University of California.  All rights reserved.
>>>>> Copyright (c) 1995-2015 OpenBSD. All rights reserved.  http://www.OpenBSD.org
>>>>> 
>>>>> OpenBSD 5.8 (GENERIC) #0: Thu Oct 22 00:24:09 PDT 2015
>>>>>  r...@void.inter.lan:/usr/src/sys/arch/sparc64/compile/GENERIC
>>>>> real mem = 1073741824 (1024MB)
>>>>> avail mem = 1039228928 (991MB)
>>>>> mpath0 at root
>>>>> scsibus0 at mpath0: 256 targets
>>>>> mainbus0 at root: Sun Fire V120 (UltraSPARC-IIe 648MHz)
>>>>> cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 3.3) @ 648 MHz
>>>>> cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 512K external (64 b/l)
>>>>> psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
>>>>> psycho0: bus range 0-2, PCI bus 0
>>>>> psycho0: dvma map c0000000-dfffffff
>>>>> pci0 at psycho0
>>>>> ppb0 at pci0 dev 1 function 1 "Sun Simba" rev 0x13
>>>>> pci1 at ppb0 bus 1
>>>>> ebus0 at pci1 dev 12 function 0 "Sun RIO EBus" rev 0x01
>>>>> "flashprom" at ebus0 addr 0-fffff not configured
>>>>> clock1 at ebus0 addr 0-1fff: mk48t59
>>>>> lom0 at ebus0 addr 200000-200003 ivec 0x2a: LOMlite2 rev 3.12
>>>>> alipm0 at pci1 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz clock
>>>>> iic0 at alipm0
>>>>> "max1617" at alipm0 addr 0x18 skipped due to alipm0 bugs
>>>>> spdmem0 at iic0 addr 0x54: 512MB SDRAM registered ECC PC133CL2
>>>>> spdmem1 at iic0 addr 0x55: 512MB SDRAM registered ECC PC133CL2
>>>>> ebus1 at pci1 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00
>>>>> power0 at ebus1 addr 2000-2007 ivec 0x25
>>>>> com0 at ebus1 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
>>>>> com0: console
>>>>> com1 at ebus1 addr 2e8-2ef ivec 0x2b: ns16550a, 16 byte fifo
>>>>> gem0 at pci1 dev 12 function 1 "Sun ERI Ether" rev 0x01: ivec 0x7c6, address 00:03:ba:2b:47:70
>>>>> ukphy0 at gem0 phy 1: Generic IEEE 802.3u media interface, rev. 1: OUI 0x0010dd, model 0x0002
>>>>> ohci0 at pci1 dev 12 function 3 "Sun USB" rev 0x01: ivec 0x7e4, version 1.0, legacy support
>>>>> pciide0 at pci1 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
>>>>> pciide0: using ivec 0x7cc for native-PCI interrupt
>>>>> atapiscsi0 at pciide0 channel 0 drive 0
>>>>> scsibus1 at atapiscsi0: 2 targets
>>>>> cd0 at scsibus1 targ 0 lun 0: <TEAC, CD-224E, P.9A> ATAPI 5/cdrom removable
>>>>> cd0(pciide0:0:0): using PIO mode 4, DMA mode 2
>>>>> pciide0: channel 1 disabled (no drives)
>>>>> gem1 at pci1 dev 5 function 1 "Sun ERI Ether" rev 0x01: ivec 0x7dc, address 00:03:ba:2b:47:71
>>>>> ukphy1 at gem1 phy 1: Generic IEEE 802.3u media interface, rev. 1: OUI 0x0010dd, model 0x0002
>>>>> ohci1 at pci1 dev 5 function 3 "Sun USB" rev 0x01: ivec 0x7e6, version 1.0, legacy support
>>>>> usb0 at ohci0: USB revision 1.0
>>>>> uhub0 at usb0 "Sun OHCI root hub" rev 1.00/1.00 addr 1
>>>>> usb1 at ohci1: USB revision 1.0
>>>>> uhub1 at usb1 "Sun OHCI root hub" rev 1.00/1.00 addr 1
>>>>> ppb1 at pci0 dev 1 function 0 "Sun Simba" rev 0x13
>>>>> pci2 at ppb1 bus 2
>>>>> siop0 at pci2 dev 8 function 0 "Symbios Logic 53c896" rev 0x07: ivec 0x7e0, using 8K of on-board RAM
>>>>> scsibus2 at siop0: 16 targets, initiator 7
>>>>> sym0 at scsibus2 targ 0 lun 0: <SEAGATE, ST336607LSUN36G, 0207> SCSI3 0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0DGN80000731804D9
>>>>> sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST336607LSUN36G, 0207> SCSI3 0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0DGN80000731804D9
>>>>> sd0: 34732MB, 512 bytes/sector, 71132959 sectors
>>>>> probe(siop0:1:0): Check Condition (error 0x70) on opcode 0x0
>>>>>  SENSE KEY: Hardware Error
>>>>>   ASC/ASCQ: Defect List Error
>>>>>   FRU CODE: 0x7
>>>>> sym1 at scsibus2 targ 1 lun 0: <SEAGATE, ST336607LSUN36G, 0207> SCSI3 0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0BZL100002316NCUL
>>>>> sd1 at scsibus0 targ 1 lun 0: <SEAGATE, ST336607LSUN36G, 0207> SCSI3 0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0BZL100002316NCUL
>>>>> siop1 at pci2 dev 8 function 1 "Symbios Logic 53c896" rev 0x07: ivec 0x7e0, using 8K of on-board RAM
>>>>> scsibus3 at siop1: 16 targets, initiator 7
>>>>> siop0: target 0 now using tagged 16 bit 40.0 MHz 31 REQ/ACK offset xfers
>>>>> vscsi0 at root
>>>>> scsibus4 at vscsi0: 256 targets
>>>>> softraid0 at root
>>>>> scsibus5 at softraid0: 256 targets
>>>>> siop0: target 1 now using tagged 16 bit 40.0 MHz 31 REQ/ACK offset xfers
>>>>> bootpath: /pci@1f,0/pci@1,0/scsi@8,0/disk@0,0
>>>>> root on sd0a (dd2dc38974492ea6.a) swap on sd0b dump on sd0b
>>>>> 
>>>> 
>> 
