This is exactly it.  At first we didn't have the visibility to see what was 
causing the poor throughput (yet another of our longstanding frustrations with 
the platform), but this is the problem that Jeremy and I were referring to.

I'm glad to say that we have not (knowingly) experienced the CPU usage 
fluctuations on our EPCs.

As for the data corruption issue, you likely won't have run into it unless you 
are running a preproduction release of 6.7.  The symptom is clusters of 4 
consecutive bytes with various bits flipped (usually bytes 1 and 2 are zeroed 
out and bytes 3 and 4 are completely different from what they should be, but 
the exact pattern of what changes is not yet clear to us).  On average we see 
between 12 and 60 corrupted bytes per 100MB transferred per user in this state.  
The VERY BAD and VERY SCARY part is that if you do a packet capture, you will 
see that exactly zero TCP packets have a checksum that fails to validate.  So 
it's not that data is getting corrupted, most of the bad packets are being 
thrown out because the checksum doesn't match, and a small handful slip 
through.  No, every single packet has a valid checksum, even the ones with 
corrupt data in them.  This means that 1) HTTPS transfers just stop and die 
when the corruption occurs (TLS's own integrity check catches the damage and 
aborts the connection), and 2) HTTP, FTP, and other unencrypted transfers end 
up with silent data corruption in the download that you won't discover until 
it is too late.
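If you have a known-good copy of a file, you can locate these clusters by 
diffing it byte-for-byte against what actually arrived.  A minimal sketch, 
assuming the corruption is in-place (file lengths match); the function name 
find_corrupt_clusters and the max_gap merge threshold are hypothetical choices 
for illustration, not part of any vendor tooling:

```python
def find_corrupt_clusters(good: bytes, received: bytes, max_gap: int = 3):
    """Return a list of (offset, length) runs where received differs from good.

    Differing bytes no more than max_gap apart are merged into one run, so a
    4-byte cluster whose middle byte happens to be unchanged is still reported
    as a single run.  Assumes in-place corruption (same length on both sides).
    """
    if len(good) != len(received):
        raise ValueError("lengths differ; this sketch only handles in-place corruption")
    clusters = []
    start = last = None
    for i, (a, b) in enumerate(zip(good, received)):
        if a != b:
            if start is None:
                start = i
            elif i - last > max_gap:
                # Gap too large: close the previous run and start a new one
                clusters.append((start, last - start + 1))
                start = i
            last = i
    if start is not None:
        clusters.append((start, last - start + 1))
    return clusters


# Usage: simulate one cluster like the ones described above
good = bytes(range(16))
bad = bytearray(good)
bad[4:8] = b"\x00\x00\xAA\xBB"  # bytes 1-2 zeroed, bytes 3-4 changed
print(find_corrupt_clusters(good, bytes(bad)))  # [(4, 4)]
```

This is pure Python and will be slow on a 100MB file, but it is enough to 
confirm the 4-byte pattern on a smaller test transfer.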

The fact that every packet carries a valid checksum would seem to suggest that 
the EPC is ingesting TCP packets from the PDN interface, throwing out the 
original TCP checksum (as a shortcut, or...? what valid reason could you 
possibly have for doing this?), doing something internally that randomly 
corrupts the data, and then recomputing a new checksum from scratch before 
sending the packet on to the target user over S1-U.  That a bug like this is 
even *possible* BLOWS MY MIND.  If you're going to ignore the checksum that the 
packet arrives with, what's the point of the checksum in the first place?  How 
can I ever trust the data flowing through this device again, knowing that it 
works around and subverts a key mechanism for ensuring data integrity?
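To illustrate why a recomputed checksum hides the damage, here is a minimal 
sketch of the RFC 1071 Internet checksum (the 16-bit one's-complement sum that 
TCP uses).  It is simplified to cover only a payload, not the full TCP 
pseudo-header and header, and the function names and sample payload are 
hypothetical:

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (RFC 1071), payload only (simplified)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def verifies(data: bytes, checksum: int) -> bool:
    """Receiver-side check: summing data plus checksum gives a result of 0."""
    if len(data) % 2:
        data += b"\x00"  # keep the checksum word 16-bit aligned
    return internet_checksum(data + checksum.to_bytes(2, "big")) == 0


original = b"example download payload"  # hypothetical payload, 24 bytes
csum = internet_checksum(original)       # checksum computed at the sender

corrupted = bytearray(original)
corrupted[4:8] = b"\x00\x00\x13\x37"     # 4-byte cluster, first two bytes zeroed
corrupted = bytes(corrupted)

print(verifies(original, csum))                           # True
print(verifies(corrupted, csum))                          # False: original checksum catches it
print(verifies(corrupted, internet_checksum(corrupted)))  # True: recomputed checksum hides it
```

Real TCP also covers a pseudo-header and the TCP header, but the principle is 
the same: a device that recomputes the checksum from scratch after touching the 
payload emits a packet that validates regardless of what happened to the data 
in between.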

-- Nathan

From: telrad-boun...@wispa.org [mailto:telrad-boun...@wispa.org] On Behalf Of 
Adam Moffett
Sent: Tuesday, March 14, 2017 8:34 PM
To: telrad@wispa.org; telrad@wispa.org
Subject: Re: [Telrad] Uplink throughput again

* UE getting stuck at MCS4... apparently until an S1 reset.  This may or may 
not be the same throughput issue that you guys were talking about earlier in 
the thread.
_______________________________________________
Telrad mailing list
Telrad@wispa.org
http://lists.wispa.org/mailman/listinfo/telrad
