Hello, I have been doing some tests with kernel 4.14 on some MT7621-based devices (ZBT WG2926 and WG3526). Sometimes, seemingly out of the blue, I get the following oops:
[ 95.073332] ------------[ cut here ]------------
[ 95.077988] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x1ac/0x324
[ 95.086240] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
[ 95.093166] Modules linked in: rt2800pci rt2800mmio rt2800lib qcserial ppp_async option usb_wwan rt2x00pci rt2x00mmio rt2x00lib rndis_host qmi_wwan ppp_generic nf_nat_pptp nf_conntrack_pptp nf_conntrack_ipv6 mt76x2e mt7603e mt76 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE huawei_cdc_ncm cfg80211 cdc_ncm cdc_ether ax88179_178a asix xt_time xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_quota xt_policy xt_pkttype xt_owner xt_nfacct xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_esp xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connlabel xt_connbytes xt_comment xt_addrtype xt_TCPMSS xt_REDIRECT xt_NFQUEUE xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY usbserial usbnet ts_fsm ts_bm slhc nfnetlink_queue nfnetlink_acct nf_reject_ipv4 nf_nat_tftp
[ 95.164255]  nf_nat_snmp_basic nf_nat_sip nf_nat_redirect nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc nf_conntrack_ipv4 nf_nat_ipv4 nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_tftp nf_conntrack_snmp nf_conntrack_sip nf_conntrack_rtcache nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack_broadcast ts_kmp nf_conntrack_amanda iptable_raw iptable_mangle iptable_filter ipt_ah ipt_ECN ip6table_raw ip_tables crc_itu_t crc_ccitt compat_xtables compat cdc_wdm cdc_acm br_netfilter sch_cake act_connmark nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress ledtrig_usbport xt_set ip_set_list_set
[ 95.235207]  ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb ipcomp6 xfrm6_tunnel xfrm6_mode_tunnel xfrm6_mode_transport xfrm6_mode_beet esp6 ah6 ipcomp xfrm4_tunnel xfrm4_mode_tunnel xfrm4_mode_transport xfrm4_mode_beet esp4 ah4 tunnel6 tunnel4 tun af_key xfrm_user xfrm_ipcomp xfrm_algo eeprom_93cx6 sha256_generic sha1_generic jitterentropy_rng drbg md5 hmac echainiv des_generic cmac cbc authenc leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd uhci_hcd ohci_platform
[ 95.306669]  ohci_hcd ehci_platform sd_mod scsi_mod ehci_hcd gpio_button_hotplug usbcore nls_base usb_common mii
[ 95.316933] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.34 #0
[ 95.322940] Stack : 00000000 00000000 00000000 00000000 805c7ada 00000034 00000000 8052c54c
[ 95.331300]         8fc441f4 805668e7 804f5a2c 00000001 00000000 00000001 8fc0dd68 00000007
[ 95.339655]         00000000 00000000 805c0000 00007cd0 00000000 00000165 00000007 00000000
[ 95.348003]         00000000 80570000 0004d605 00000000 00000000 00000000 80590000 8034d5a8
[ 95.356355]         00000009 00000140 00000001 8ff15940 00000003 8027c368 00000004 805c0004
[ 95.364707]         ...
[ 95.367150] Call Trace:
[ 95.369622] [<800103e8>] show_stack+0x58/0x100
[ 95.374080] [<8043bfcc>] dump_stack+0x9c/0xe0
[ 95.378428] [<8002e170>] __warn+0xe0/0x114
[ 95.382511] [<8002e1d4>] warn_slowpath_fmt+0x30/0x3c
[ 95.387476] [<8034d5a8>] dev_watchdog+0x1ac/0x324
[ 95.392184] [<80085b94>] call_timer_fn.isra.3+0x24/0x84
[ 95.397395] [<80085dac>] run_timer_softirq+0x1b8/0x244
[ 95.402543] [<804590d0>] __do_softirq+0x128/0x2ec
[ 95.407241] [<80032870>] irq_exit+0x98/0xcc
[ 95.411434] [<8023944c>] plat_irq_dispatch+0xfc/0x138
[ 95.416477] [<8000b508>] except_vec_vi_end+0xb8/0xc4
[ 95.421427] [<8000ced0>] r4k_wait_irqoff+0x1c/0x24
[ 95.426230] [<80065f3c>] do_idle+0xe4/0x168
[ 95.430402] [<800661b8>] cpu_startup_entry+0x24/0x2c
[ 95.435451] ---[ end trace b6fed008f9c0705a ]---
[ 95.440096] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[ 95.446363] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
[ 95.452456] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0e840000, max=0, ctx=1241, dtx=1164, fdx=1164, next=1241
[ 95.463414] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0e7c0000, max=0, calc=1472, drx=1473
[ 95.733338] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[ 95.745171] mtk_soc_eth 1e100000.ethernet eth0: port 4 link down
[ 95.886802] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5160030c, 0x10c = 0x80818
[ 95.908574] mtk_soc_eth 1e100000.ethernet: PPE started
[ 99.110085] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[ 99.472001] mtk_soc_eth 1e100000.ethernet eth0: port 4 link up

After this oops, TX on the switch is dead. If I try to send a packet on, for example, the WAN interface, I can see the packet with tcpdump, but nothing is sent over the wire (verified with a network tap). I saw the same error with kernel 4.9 (see the mailing list thread here: https://www.mail-archive.com/lede-dev@lists.infradead.org/msg08327.html), but back then the error went away when I disabled flow control.
Also, with kernel 4.9, taking the ports down and back up always recovered the switch. With kernel 4.14, restarting the ports no longer seems to have any effect (i.e., TX is still broken). The "... link up" messages in the log above come from my script bringing the ports back up. The only way I have found to recover the switch is to reboot the router.

Does anyone know what could be wrong, how to work around this issue, or where to look?

Thanks in advance for any help.

BR,
Kristian

_______________________________________________
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev