On 2018-07-19 02:29, Paul Koning wrote:


On Jul 18, 2018, at 8:22 PM, Johnny Billquist <[email protected]> wrote:

On 2018-07-19 02:07, Paul Koning wrote:
On Jul 18, 2018, at 7:18 PM, Johnny Billquist <[email protected]> wrote:

...

It's probably worth pointing out that the reason I implemented that was not 
because of hardware problems, but because of software problems. DECnet can 
degenerate pretty badly when packets are lost. And if you shove packets fast 
enough at the interface, the interface will (obviously) eventually run out of 
buffers, at which point packets will be dropped.
This is especially noticeable in DECnet/RSX at least. I think I know how to 
improve that software, but I have not had enough time to actually try fixing 
it. And it is especially noticeable when doing file transfers over DECnet.
All ARQ protocols suffer dramatically with packet loss.  The other day I was reading a 
recent paper about high speed long distance TCP.  It showed a graph of throughput vs. 
packet loss rate.  I forgot the exact numbers, but it was something like 0.01% packet 
loss rate causes a 90% throughput drop.  Compare that with the old (1970s) ARPAnet rule 
of thumb that 1% packet loss means 90% loss of throughput.  Those both make sense; the 
old one was for "high speed" links running at 56 kbps, rather than the 
multi-Gbps of current links.
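A rough way to see why tiny loss rates hurt so much is the well-known Mathis approximation for steady-state TCP throughput, BW ~ MSS / (RTT * sqrt(p)). A quick sketch (the formula is standard; the MSS and RTT numbers are illustrative, not from the paper mentioned above):

```python
from math import sqrt

def tcp_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Mathis et al. approximation: BW ~ (MSS * 8) / (RTT * sqrt(p))."""
    return (mss_bytes * 8) / (rtt_s * sqrt(loss_rate))

# Illustrative numbers: 1460-byte MSS, 100 ms RTT.
# A 100x increase in loss rate (0.01% -> 1%) cuts throughput by 10x,
# since throughput scales with 1/sqrt(p).
bw_low_loss = tcp_throughput_bps(1460, 0.1, 0.0001)
bw_high_loss = tcp_throughput_bps(1460, 0.1, 0.01)
```

The square-root dependence is why the rule of thumb keeps reappearing at every link speed: the absolute numbers change, but small loss rates always translate into large throughput drops.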
The other issue with nontrivial packet loss is that in any protocol whose 
congestion control is triggered by packet loss (such as recent versions of 
DECnet), the flow control machinery will severely throttle the link under 
such conditions.
So yes, anything you can do in the infrastructure to keep the packet loss well 
under 1% is going to be very helpful indeed.

Right. That said, TCP behaves much better than DECnet here. At least if we 
talk about TCP with the ability to deal with out of order packets (which most 
implementations have) versus DECnet under RSX. The problem with DECnet under 
RSX is that recovering from a packet lost to congestion essentially guarantees 
that congestion will happen again, while TCP pretty quickly settles into a 
steady working state.

Out of order packet handling isn't involved in that.  Congestion doesn't 
reorder packets.  If you drop a packet, TCP and DECnet both force the 
retransmission of all packets starting with the dropped one.  (At least, I 
don't think selective ACK is used in TCP.)  DECnet describes out of order 
packet caching for the same reason TCP does: to work efficiently in network 
topologies that have multiple paths in which the routers do equal cost path 
splitting.  In DECnet, that support is optional; it's not in DECnet/E and I 
wouldn't expect it in other 16-bit platforms either.

This is maybe getting too technical, so let me know if we should take this off list.

Yes, congestion does not reorder packets. However, if you cannot handle out of order packets, you have to retransmit everything from the point where a packet was lost. If you can deal with packets out of order, you can keep the packets you received, even though there is a hole, and once that hole is plugged, you can ACK everything. And this is pretty normal in TCP, even without selective ACK.
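That cumulative-ACK jump can be sketched as a toy receiver (hypothetical class and names; real TCP stacks are of course far more involved):

```python
class OoOReceiver:
    """Toy receiver: buffers out-of-order segments and ACKs
    cumulatively. Illustrative sketch, not any real implementation."""
    def __init__(self):
        self.next_expected = 0   # lowest sequence number not yet received
        self.buffered = {}       # out-of-order segments held for later

    def receive(self, seq, data):
        self.buffered[seq] = data
        # Advance the cumulative ACK point past any hole just plugged.
        while self.next_expected in self.buffered:
            del self.buffered[self.next_expected]  # delivered to application
            self.next_expected += 1
        return self.next_expected  # cumulative ACK value

r = OoOReceiver()
r.receive(0, "a")          # in order: ACK advances to 1
r.receive(2, "c")          # hole at 1: segment kept, ACK stays at 1
ack = r.receive(1, "b")    # hole plugged: ACK jumps straight to 3
```

The key line is the `while` loop: filling one hole releases everything buffered behind it in a single step, so only the missing segment itself ever needs retransmission.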

So, in TCP, what normally happens is that a node is spraying packets as fast as it can. Some packets are lost, but not all of them, leaving holes in the sequence of received packets. After some time, or based on other heuristics, TCP starts retransmitting from the point where packets were lost, and as soon as the receiving end has plugged the hole, it jumps forward with the ACKs, meaning the sender does not need to retransmit everything. Even better, if the sender does retransmit everything, losing some of those retransmitted packets does not matter, since the receiver already has them anyway. At some point you get to a state where the receiver has no window open, so the transmitter is blocked, and every time the receiver opens up a window, which is usually just a packet or two in size, the transmitter can send that much data. That much data is usually less than the number of buffers the hardware has, so there is no problem receiving those packets, and TCP settles into a steady state where the transmitter sends packets as fast as the receiver can consume them. Apart from a few lost packets in the early stages, no packets are lost.

DECnet (at least in RSX), on the other hand, will transmit a whole bunch of packets. The first few will get through, but at some point one or several are lost. After some time, DECnet decides that packets were lost, backs up, and starts transmitting again from the point where the packets were lost. Once more it will soon blast out more packets than the receiver can process, and you get another timeout. DECnet backs off on the timeout every time this happens, and soon you are at a horrendous 127 s timeout for pretty much every other packet sent, meaning in effect you only manage to send one packet every 127 seconds. This is worsened, I think, by something that looks like a bug in the NFT/FAL code in RSX, where the code assumes it is faster than the packet transfer rate and can manage to do a few things before two packets have been received. How much is to blame on DECnet in general, and how much on NFT/FAL, I'm not entirely clear on. Like I said, I have not had time to really test this. But the problem is very easy to demonstrate: just set up an old PDP-11 and a simh (or similar) machine on the same DECnet, try to transfer a larger file to the real PDP-11, check the network counters, and observe how things immediately grind to a standstill.
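The backoff spiral described above can be sketched in a few lines (the doubling is my assumption about the mechanism; the 127-second cap is the value observed in RSX):

```python
def next_timeout(current_s, cap_s=127):
    """Double the retransmission timeout on each loss, capped at the
    127-second ceiling RSX DECnet reportedly ends up at."""
    return min(current_s * 2, cap_s)

t = 1
timeouts = []
for _ in range(10):          # ten consecutive loss/timeout rounds
    t = next_timeout(t)
    timeouts.append(t)
# timeouts: 2, 4, 8, 16, 32, 64, 127, 127, 127, 127
```

After a handful of consecutive losses the timer is pinned at the cap, which is why the transfer degrades to roughly one packet per 127 seconds rather than recovering.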

Which is why I implemented the throttling in the bridge, which Mark mentioned.
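That kind of throttling can be approximated by a token bucket; this is purely an illustrative sketch, not the bridge's actual code:

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter: admit a packet only if a
    token is available; tokens refill at `rate` per second, up to
    `burst`. Illustrative sketch only."""
    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()

    def admit(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # hold or drop instead of flooding the slow host

tb = TokenBucket(rate=0, burst=3)      # rate=0 just to show the burst limit
results = [tb.admit() for _ in range(4)]   # three admitted, fourth refused
```

Pacing at the bridge keeps the slow host's receive buffers from overflowing in the first place, trading a short queueing delay for avoiding the timeout spiral above.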

As far as path splitting goes, it is implemented in RSX-11M-PLUS, but disabled. I tried enabling it once, but the system crashed. The manuals have it documented, but I'm wondering if DEC never actually completed the work.

I have not analyzed other DECnet implementation enough to tell for sure if they 
also exhibit the same problem.

Another consideration is that TCP has seen another 20 years of work on 
congestion control since DECnet Phase IV.  But in any case, it may well be that 
VMS handles these things better.  It's also possible that DECnet/OSI does, 
since it is newer and was designed right around the time that DEC very 
seriously got into congestion control algorithm research.  Phase IV isn't so 
well developed; it largely predates that work.

Well, this isn't really about congestion control so much as just being able to handle out of order packets. Although congestion control could certainly also be applied to alleviate the problem.

I know that OSI originally made the same basic assumption DECnet does: links are 100% reliable and never drop or reorder packets. A very bad assumption to build protocols on, and OSI eventually also defined links and operations for technologies where these assumptions were not true. So I would hope/assume that DECnet/OSI eventually got better, but I strongly suspect that was not the case from the start.

  Johnny

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: [email protected]             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol
_______________________________________________
Simh mailing list
[email protected]
http://mailman.trailing-edge.com/mailman/listinfo/simh