Sorry, early morning WISPAMERICA brain. What I meant to ask, Nathan, was how you captured both sides of PDN/EPC traffic isolated from the CPE7000? The ingress I can understand, but the egress would be S1 encapsulated, no?
This could make a great forum post -- I can imagine needing it myself someday. Thanks again.

On Thu, Mar 16, 2017 at 9:01 AM, Jeremy Austin <jhaus...@gmail.com> wrote:

> Nathan, thanks for the clarification.
>
> On Thu, Mar 16, 2017 at 8:54 AM Nathan Anderson <nath...@fsr.com> wrote:
>
>> Just an update to this: at the direction of Telrad support, I ran 2 simultaneous packet captures during a download where corruption occurred: one right at the point of ingress at the EPC, and one right at the point of egress.
>>
>> It turns out that I was WRONG about part of this. The EPC is definitely corrupting traffic in the newer firmwares we have been given, as the captures demonstrate, but it is NOT also regenerating the TCP payload checksums on every packet that flows through it, thank goodness. No, it turns out that the reason these payloads are making it all the way to the user is because the CPE7000's NAT engine is the one completely recomputing the checksums, instead of properly modifying them to only reflect the changes that it makes to the headers (see https://www.ietf.org/rfc/rfc1631.txt section 3.3). So this is a two-parter: the EPC is corrupting bits, and the CPE7000 is responsible for covering up the corruption.
>>
>> I tested with a CPE8000, and its NAT engine is doing the right thing. Thus, the corrupt packets make it to the client, which sees the invalid checksum, and which tosses the packet, triggering retransmit.
>>
>> The EPC firmware we have been using is a development build, and the corruption bug appears to be unique to that. But the CPE7000 firmware we used for testing was the latest public release (116).
>>
>> -- Nathan
>>
>> *From:* telrad-boun...@wispa.org [mailto:telrad-boun...@wispa.org] *On Behalf Of* Nathan Anderson
>> *Sent:* Wednesday, March 15, 2017 1:47 AM
>> *To:* telrad@wispa.org
>> *Subject:* Re: [Telrad] Uplink throughput again
>>
>> This is exactly it. We didn't have the visibility into things to see what was causing the poor throughput at first (yet another one of our longstanding frustrations with the platform), but this is the problem that Jeremy and I were referring to.
>>
>> I'm glad to say that we have not (knowingly) experienced the CPU usage fluctuations on our EPCs.
>>
>> As far as the data corruption one, you likely will not have run up against it unless you are running a preproduction release of 6.7. The symptoms are that we will see clusters of 4 consecutive bytes that have various bits flipped (usually what happens is that bytes 1 and 2 are zeroed out, and bytes 3 and 4 are completely different than what they would normally be, but the pattern of what exactly is changed is not clear to us yet). We see on average between 12 and 60 bytes per 100MB transferred per user in this state. The VERY BAD and VERY SCARY part is that if you do a packet capture, you will see that exactly zero TCP packets have a checksum that does not validate. So it's not like data is getting corrupted, and a lot of packets are being thrown out because the checksum doesn't compute/match, but a small percentage or handful get through. No, every single packet has a valid checksum, even the ones with corrupt data in them. What this means is that 1) HTTPS transfers just stop and die when the corruption occurs, and 2) HTTP/FTP/other unencrypted transfers introduce silent data corruption into the download that you won't discover until it is too late.
>>
>> That all packets have a checksum that validates would seem to suggest that the EPC is ingesting TCP packets from the PDN interface, throwing out the original TCP checksum (as a shortcut, or...? what valid reasons would you possibly have for doing this?), doing something internally that causes random corruption, and then recomputing a new checksum from scratch before sending it onto the target user over S1-U. That a bug like this is even *possible* BLOWS MY MIND. If you're going to ignore the original checksum that the packet arrives with, what's the point of the checksum in the first place? How can I ever trust the data flowing through this device again knowing that it is working around and subverting a key component that helps to ensure and preserve data integrity?
>>
>> -- Nathan
>>
>> *From:* telrad-boun...@wispa.org [mailto:telrad-boun...@wispa.org <telrad-boun...@wispa.org>] *On Behalf Of* Adam Moffett
>> *Sent:* Tuesday, March 14, 2017 8:34 PM
>> *To:* telrad@wispa.org; telrad@wispa.org
>> *Subject:* Re: [Telrad] Uplink throughput again
>>
>> * UE getting stuck at MCS4....apparently until an S1 reset. This may or may not be the same throughput issue that you guys were talking about earlier in the thread.
>> _______________________________________________
>> Telrad mailing list
>> Telrad@wispa.org
>> http://lists.wispa.org/mailman/listinfo/telrad
>

--
Jeremy Austin
(907) 895-2311
(907) 803-5422
jhaus...@gmail.com

Heritage NetWorks
Whitestone Power & Communications
Vertical Broadband, LLC

Schedule a meeting: http://doodle.com/jermudgeon
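
For anyone following along, here is a rough sketch of the difference Nathan describes between a NAT engine that recomputes a checksum from scratch and one that only adjusts it for the header fields it changed. It is a toy model, not the CPE7000's or CPE8000's actual code: the "segment" is just a 16-bit port field followed by a payload (a real TCP checksum also covers a pseudo-header), the port values and function names are made up for illustration, and the adjustment uses the one's-complement update formula from RFC 1624 (HC' = ~(~HC + ~m + m')) rather than the exact routine printed in RFC 1631 section 3.3.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data (zero-padded to an even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def full_checksum(data: bytes) -> int:
    """Internet checksum computed from scratch over data."""
    return ~ones_complement_sum(data) & 0xFFFF


def incremental_update(old_checksum: int, old_word: int, new_word: int) -> int:
    """Adjust a checksum for one changed 16-bit word (RFC 1624, eqn. 3)."""
    total = (~old_checksum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical values: the port before and after translation.
OLD_PORT, NEW_PORT = 0x1F90, 0xC350
payload = b"example payload data"

# Sender builds the segment and computes the checksum over clean data.
segment = struct.pack("!H", OLD_PORT) + payload
sender_checksum = full_checksum(segment)

# Something upstream of the NAT flips bits in the payload.
corrupted = bytearray(segment)
corrupted[6] ^= 0xFF

# The NAT rewrites the port field of the (already corrupted) segment.
translated = struct.pack("!H", NEW_PORT) + bytes(corrupted[2:])

# Option A: adjust the sender's checksum only for the port change.
incremental = incremental_update(sender_checksum, OLD_PORT, NEW_PORT)
# Option B: throw the sender's checksum away and recompute over what we hold.
recomputed = full_checksum(translated)

# What the receiver sees when it validates the translated segment.
detected_with_incremental = full_checksum(translated) != incremental
detected_with_recompute = full_checksum(translated) != recomputed

print("corruption detected (incremental update):", detected_with_incremental)
print("corruption detected (full recompute):   ", detected_with_recompute)
```

With the incremental update, the checksum still reflects the sender's original payload, so the receiver notices the mismatch and drops the segment. With a full recompute, the checksum is derived from the corrupted bytes themselves and validates trivially, which matches the "every packet has a valid checksum" symptom described earlier in the thread.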