Hey,

In two different networks, one Juniper, one Cisco, I've seen a router silently mangle packets in transit, calculate a correct Ethernet FCS on the broken packet and forward it.
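To make the failure concrete, here is a minimal Python sketch. It is my own illustration, not vendor code. It assumes zlib.crc32 as a stand-in for the Ethernet FCS (same CRC-32 polynomial) and a hand-built 20B IPv4 header using documentation addresses; the helper names fcs, ipv4_checksum and flip_bit, and the packet contents, are made up for the example.

```python
import struct
import zlib


def fcs(frame: bytes) -> int:
    """CRC-32 over the whole frame, standing in for the Ethernet FCS."""
    return zlib.crc32(frame) & 0xFFFFFFFF


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement sum of the header's 16-bit words (RFC 791)."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with one bit inverted."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


# Hypothetical packet: 20B IPv4 header (checksum filled in) plus payload.
payload = b"customer data crossing the core"
hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(payload), 0x1234, 0,
                  64, 17, 0, bytes([192, 0, 2, 1]), bytes([198, 51, 100, 1]))
hdr = hdr[:10] + struct.pack("!H", ipv4_checksum(hdr)) + hdr[12:]
packet = hdr + payload
ingress_fcs = fcs(packet)

# Corruption inside the router, after the ingress FCS was verified.
payload_hit = flip_bit(packet, 20 * 8 + 5)  # bit in the payload
header_hit = flip_bit(packet, 8 * 8)        # bit in the TTL byte

# Egress computes a fresh FCS over whatever it is about to send,
# so the next hop's FCS check passes on the broken packet.
egress_fcs = fcs(payload_hit)
next_hop_accepts = fcs(payload_hit) == egress_fcs

# A header checksum over a valid header evaluates to 0.
payload_hit_detected = ipv4_checksum(payload_hit[:20]) != 0
header_hit_detected = ipv4_checksum(header_hit[:20]) != 0

print(next_hop_accepts, egress_fcs != ingress_fcs,
      payload_hit_detected, header_hit_detected)
```

The fresh egress FCS matches the mangled packet, so the link-level check passes even though it differs from the FCS seen on ingress. Of the two flips, only the one inside the 20B header trips the IPv4 header checksum.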
In an MPLS network this means you'll only occasionally learn about the problem, when the egress PE router notices an IP checksum error. That happens only when the corruption lands in the 20B of the IP header. When the corruption lands anywhere else, or the packet is IPv6, you won't know about it at all.

If this is a common failure mode, vendors can likely address it by calculating an internal FCS over the unchanging parts of the data at the ingress PHY and re-verifying it at the egress PHY, which would give us good confidence that we'll catch broken HW.

This problem does not exist on-link or on L2. In both of those cases we're protected by an FCS over the whole data, which is a very strong statistical guarantee that we're not breaking packets. The problem only impacts L3, and MPLS particularly badly, as it makes finding the culprit very hard.

If you are running an MPLS labeled IPv4 network, I'd like you to check whether you're getting IP checksum errors from your core side:

JunOS Trio:

  JunOS> show interfaces et-0/1/0 extensive |match "(index|incompletes)"
    Interface index: 183, SNMP ifIndex: 662, Generation: 186
      Errors: 646537, Drops: 0, Framing errors: 0, Runts: 0, Policed discards: 0, L3 incompletes: 646537,
  JunOS> start shell pfe network fpc0
  RMPC0(JunOS vty)# show jnh ifd 183 stream
      checksum: 0000000003450169 pkts, 0000002655295885 bytes

ASR9k:

  IOSXR#show controllers np counters all | i "(Node|NP[0-9]|CHECKSUM)"
                  Node: 0/0/CPU0:
  Show global stats counters for NP0, revision v2
  Show global stats counters for NP1, revision v2
  Show global stats counters for NP2, revision v2
  Show global stats counters for NP3, revision v2
  Show global stats counters for NP4, revision v2
  Show global stats counters for NP5, revision v2
  Show global stats counters for NP6, revision v2
   142  PARSE_DROP_IPV4_CHECKSUM_ERROR    24168    0
  Show global stats counters for NP7, revision v2
   142  PARSE_DROP_IPV4_CHECKSUM_ERROR    34       0
                  Node: 0/1/CPU0:
  Show global stats counters for NP0, revision v2
  Show global stats counters for NP1, revision v2
  Show global stats counters for NP2, revision v2
  Show global stats counters for NP3, revision v2
  Show global stats counters for NP4, revision v2
  Show global stats counters for NP5, revision v2
                  Node: 0/3/CPU0:
  Show global stats counters for NP0, revision v3
  Show global stats counters for NP1, revision v3
  Show global stats counters for NP2, revision v3
  Show global stats counters for NP3, revision v3

  IOSXR#show controllers np ports np6 location 0/0/CPU0
                  Node: 0/0/CPU0:
  ----------------------------------------------------------------
  NP Bridge Fia                       Ports
  -- ------ --- ---------------------------------------------------
  6  --     3   TenGigE0/0/0/18 - TenGigE0/0/0/20
  IOSXR#

On the ASR9k you won't know which of those ports is the culprit, but hopefully there are few enough options to decide that the errors are coming from an MPLS labeled interface.

If you want, you can further capture the broken packets:
  https://gist.github.com/ytti/2323b019152eca6e05718bccd855566e
  https://gist.github.com/ytti/436fe3b602a963acf21e

Blog I wrote when I originally saw this in a Juniper network:
  http://blog.ip.fi/2014/02/junos-l3-incompletes-what-and-why.html

Security implication this may have:
  http://dinaburg.org/bitsquatting.html

If you do see the problem, I'd be very happy to talk to you and help you with the issue.

Thanks!
--
  ++ytti
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp