On 4/22/2019 2:51 PM, Steve Freyder via Xenomai wrote:
On 4/22/2019 1:45 AM, Jan Kiszka wrote:
On 22.04.19 08:40, C Smith via Xenomai wrote:
Thanks for your insight, Steve. I didn't realize rt_dev_write() doesn't
actually stall until it has been called many times and the 4K TX buffer gets
full. (Is that right, Jan?)
If that is the case, sure, I could find a way to check the TX buffer fill
level to prevent my app from stalling.

I rewrote the xeno_16550A driver RTSER_RTIOC_GET_STATUS ioctl to return to
userspace the contents of the IIR and the IER too.
I'm getting IIR = 0b 0001 0100, so the source of the latest interrupt is an
RX (not surprising, as I'm doing full duplex) and there is no THRE
interrupt pending.
So regardless of the ultimate cause, this state will never empty the TX
buffer.
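
For reference, the IIR value quoted above can be decoded by hand. The sketch below assumes the standard 16550 IIR layout (bit 0 clear means an interrupt is pending, bits 3:1 identify the highest-priority source). The enum and function names are made up for illustration and are not part of xeno_16550A.

```c
#include <stdint.h>

/* Standard 16550 IIR fields. Bits above bit 3 are ignored in this sketch. */
#define IIR_NO_INT   0x01   /* bit 0 set: no interrupt pending */
#define IIR_ID_MASK  0x0E   /* bits 3:1: interrupt source ID */

/* Hypothetical names, for illustration only. */
enum iir_source {
    IIR_SRC_NONE,
    IIR_SRC_MODEM_STATUS,
    IIR_SRC_THRE,
    IIR_SRC_RX_DATA,
    IIR_SRC_LINE_STATUS,
    IIR_SRC_RX_TIMEOUT,
    IIR_SRC_UNKNOWN
};

/* Map a raw IIR byte to the interrupt source it reports. */
enum iir_source iir_source(uint8_t iir)
{
    if (iir & IIR_NO_INT)
        return IIR_SRC_NONE;

    switch (iir & IIR_ID_MASK) {
    case 0x00: return IIR_SRC_MODEM_STATUS;
    case 0x02: return IIR_SRC_THRE;         /* transmit holding register empty */
    case 0x04: return IIR_SRC_RX_DATA;      /* received data available */
    case 0x06: return IIR_SRC_LINE_STATUS;  /* overrun, parity, framing, break */
    case 0x0C: return IIR_SRC_RX_TIMEOUT;   /* FIFO character timeout */
    default:   return IIR_SRC_UNKNOWN;
    }
}
```

With the value from the report, iir_source(0x14) yields IIR_SRC_RX_DATA, which matches the reading that an RX interrupt is pending and a THRE interrupt is not.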

I think my only choice is to try something I had to do once before on a
similarly misbehaving serial port: I'll rewrite the xeno_16550A interrupt
handlers to redundantly check for data pending in the TX buffer whenever
any interrupt, such as an RX interrupt, happens. I do have bidirectional
traffic after all, so the driver will wake up frequently and keep the TX
data transmitting.
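
The workaround described above can be sketched as a small user-space simulation. This is not the xeno_16550A code: struct fake_uart, struct tx_queue, tx_kick() and on_rx_interrupt() are hypothetical names, and the only hardware facts assumed are the standard LSR THRE bit (0x20) and the 16-byte TX FIFO of a 16550A.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LSR_THRE      0x20   /* standard LSR bit: transmitter can accept data */
#define TX_FIFO_SIZE  16     /* 16550A hardware TX FIFO depth */
#define TXQ_SIZE      4096   /* software TX buffer size, as in the thread */

/* Simulated hardware state (hypothetical, for illustration). */
struct fake_uart {
    uint8_t lsr;         /* line status register */
    size_t  bytes_sent;  /* bytes handed to the TX FIFO so far */
};

/* Simulated software TX ring buffer (hypothetical, for illustration). */
struct tx_queue {
    uint8_t buf[TXQ_SIZE];
    size_t  head;   /* index of the next byte to send */
    size_t  count;  /* bytes waiting to be sent */
};

/* Queue one byte; returns 0 on success, -1 if the buffer is full. */
int txq_push(struct tx_queue *q, uint8_t b)
{
    assert(q->count <= TXQ_SIZE);
    if (q->count == TXQ_SIZE)
        return -1;
    q->buf[(q->head + q->count) % TXQ_SIZE] = b;
    q->count++;
    return 0;
}

/* If data is pending and the transmitter is idle, move up to one
 * FIFO's worth of bytes to the device. Returns the bytes moved. */
size_t tx_kick(struct fake_uart *u, struct tx_queue *q)
{
    size_t n = 0;

    if (!(u->lsr & LSR_THRE) || q->count == 0)
        return 0;

    while (q->count > 0 && n < TX_FIFO_SIZE) {
        /* A real driver would write q->buf[q->head] to the THR here. */
        q->head = (q->head + 1) % TXQ_SIZE;
        q->count--;
        u->bytes_sent++;
        n++;
    }
    u->lsr &= ~LSR_THRE;  /* FIFO holds data until the device drains it */
    return n;
}

/* Handler for an RX interrupt: after the normal RX work (omitted),
 * redundantly check whether TX data is stuck and restart it. */
size_t on_rx_interrupt(struct fake_uart *u, struct tx_queue *q)
{
    return tx_kick(u, q);
}
```

In this model a lost THRE interrupt no longer stalls the queue forever: the next RX interrupt sees pending data with THRE set and restarts transmission. It papers over the lost event rather than explaining it.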

Interestingly enough, the stall problem did not occur when I used the sample
serial code provided by Xenomai: cross-link.c. I also rewrote cross-link.c
to send a 72 byte packet and receive on the same port (I installed a
physical loopback device on the serial port). No stalls for 12+ hours with
packets streaming at 100 Hz.
The only difference in the serial configuration between that cross-link.c
app and my app was:
struct rtser_config :
    .rx_timeout = RTSER_DEF_TIMEOUT // infinite, no stall for many hours in cross-link.c
versus:
    .rx_timeout = 500000 // 500us, stalls within an hour in my app
I don't know why an RX setting affects TX behavior. I also can't use
RTSER_DEF_TIMEOUT in my application or it dies when it starts up - no clue
why. But I did try setting
    .rx_timeout = 5000000 // 5 ms, my app doesn't stall for several hours
and though the serial port did not stall in my app during several
hours of testing, it is just open-loop finger-crossing, and not a real
solution.
I need the TX interrupts to fire reliably. So I think I must rewrite that
interrupt handler, as above.


I think we have a race between rt_16550_write filling the software queue that
the tx interrupt is supposed to write out and the latter already firing,
consuming that event without seeing the queue filled. I'll think about a
better algorithm tomorrow, one that can possibly get rid of some interrupt
events as well.

Jan

Greetings again,

If cross-link.c is not stalling, but the CSmith application hangs on
startup when using similar settings to what cross-link.c is using, it
tells me that understanding why this "hang on startup" is happening
would be a good idea.  I know this has happened to me when I got an
event from a UART that my code did not handle, and because I did not
handle it, the event continued to fire over and over - a hang. I
theorized that perhaps there's an issue with there being stale data
or a data overrun condition that exists when the app starts up that's
causing this hang.  In either case, it sounds as though the difference
in settings between the CSmith app and cross-link.c might be a key factor.

I went back to the previous email trail, and if I interpreted it
correctly, the overall data rate is only about 80% of 115Kbaud. This
suggests that every time there is a write, the 4K software buffer in
the driver should be completely empty - as should the TX FIFO. The
only time that won't be true is when the transmit processing got
stalled (by loss of interrupt, or whatever).
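
The timing behind that argument can be checked with a little arithmetic. The sketch below assumes 8N1 framing (10 bits on the wire per byte) at 115200 baud, and uses the 72-byte, 100 Hz figures from the loopback test quoted earlier in this thread. Those are not necessarily the application's real traffic numbers, so the result need not match the 80% estimate. The function names are made up for illustration.

```c
/* Assumptions: 115200 baud, 8N1 framing (start + 8 data + stop bits). */
#define BAUD           115200u
#define BITS_PER_BYTE  10u

/* Time on the wire, in milliseconds, for nbytes at the assumed rate. */
double tx_time_ms(unsigned nbytes)
{
    return nbytes * BITS_PER_BYTE * 1000.0 / BAUD;
}

/* Fraction of the line capacity used by periodic packets. */
double line_utilization(unsigned packet_bytes, unsigned rate_hz)
{
    return (double)packet_bytes * BITS_PER_BYTE * rate_hz / BAUD;
}
```

Under these assumptions a 72-byte packet takes 6.25 ms to send, which fits inside the 10 ms period of a 100 Hz stream (62.5% utilization), so the software buffer should indeed be empty at each write. A full 4096-byte buffer would take roughly 356 ms to drain, so it only fills after many consecutive writes with no transmit progress.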

I would be interested to see what happens if the CSmith app
were to be modified to write one byte at a time, with no delay
between rt_dev_write() calls.

Finally, some searching shows that back when the original National
Semiconductor 16550[A] UARTs were first being "cloned" by other
vendors, National created a program called "COMTEST" that was
designed to reveal the "misbehaviour" of those competing chips by
doing extensive testing of the timing and other characteristics and
how it deviated from "the real thing".  I wonder if anyone in this
group knows where a copy of that program (or a more modern version)
might exist?

Regards,
Steve


Apologies, I said "hangs on startup" but the original statement was
"dies on startup".  So the theory was that if that were fixed, and
the timeout was RTSER_DEF_TIMEOUT like it is in cross-link.c, this
might solve the problem.

