On 26.04.19 02:59, C Smith wrote:
> On Thu, Apr 25, 2019 at 1:23 AM Jan Kiszka <jan.kis...@siemens.com> wrote:
> 
>     On 25.04.19 09:15, C Smith wrote:
>      > Hi Jan,
>      >
>      > Your patch worked somewhat but not completely. It prevents my app from
>      > stalling forever, but I caught the serial transmission itself stalling
>      > on the oscilloscope for quite a long time. My 72 byte TX packet from
>      > the xenomai periodic task gets cut in half and there is no
>      > transmission for 7msec, then the transmission resumes. (I'll send you
>      > a screenshot)
> 
>     What is driver and application state during that phase? Who is waiting
>     on what? This will be the key to resolve that issue as I'm not yet
>     seeing another mistake in the driver.
> 
> 
> I don't think there is a bug in the serial driver, per se, but my strange
> UART requires more from a driver to prevent stalls.
> This is a BCM corp 'BCM87Q' industrial motherboard. They are still sold, not
> yet EOL.
> 
> We do know a lot about the state the serial driver is in: It is just waiting,
> thinking it doesn't have any more bytes to transmit. Remember in previous
> tests the IIR indicated no pending bytes in the THR. I've demonstrated how to
> get past this state with my TX "polling patch".  I ran my latest test for 12+
> hours where I was using your patch plus my polling patch and there were no
> stalls whatsoever of the serial driver, as verified by an Oscilloscope which
> triggers on a TX stall. The maximum inter-packet jitter of my TX packet was
> also fairly low, at <= 450us. In my polling patch, during a RX interrupt, the
> code redundantly checks the high level transmit buffer to see if
> rt_16550_tx_fill() should be called. Sure, this workaround only helps when
> you have full-duplex communications, it would not help during simplex
> communications.
> 
> Since a device driver can't be reliably polled, I'd prefer some
> self-correcting mechanism in the driver which sets a callback when it thinks
> it has transmitted the last byte, and wakes up and checks one more time about
> 100us later to see if it needs to transmit anything else.

I'd prefer not to install any watchdog for potential hardware issues until
we really know they aren't software races.

Is there a chance to either break a trace or record the full run when
the issue happens? Then you could try this instrumentation, together with
ftrace (trace-cmd record -e "cobalt*"):

diff --git a/kernel/drivers/serial/16550A.c b/kernel/drivers/serial/16550A.c
index 81acc6344e..504d85ccbe 100644
--- a/kernel/drivers/serial/16550A.c
+++ b/kernel/drivers/serial/16550A.c
@@ -197,6 +197,7 @@ static void rt_16550_tx_fill(struct rt_16550_context *ctx)
        unsigned long base = ctx->base_addr;
        int mode = rt_16550_io_mode_from_ctx(ctx);
 
+       trace_printk("tx_fill, out_npend: %ld", ctx->out_npend);
 /*     if (uart->modem & MSR_CTS)*/
        {
                for (count = ctx->tx_fifo;
@@ -239,6 +240,7 @@ static int rt_16550_interrupt(rtdm_irq_t * irq_context)
 
        while (1) {
                iir = rt_16550_reg_in(mode, base, IIR) & IIR_MASK;
+               trace_printk("IIR: 0x%x", iir);
                if (iir & IIR_PIRQ)
                        break;
 
@@ -284,6 +286,7 @@ static int rt_16550_interrupt(rtdm_irq_t * irq_context)
        }
 
        if ((ctx->ier_status & IER_TX) && (ctx->out_npend == 0)) {
+               trace_printk("IER_TX off");
                /* mask transmitter empty interrupt */
                ctx->ier_status &= ~IER_TX;
 
@@ -1030,10 +1033,12 @@ ssize_t rt_16550_write(struct rtdm_fd *fd, const void *buf, size_t nbyte)
 
                        lsr = rt_16550_reg_in(rt_16550_io_mode_from_ctx(ctx),
                                              ctx->base_addr, LSR);
+                       trace_printk("LSR: 0x%x", lsr);
                        if (lsr & RTSER_LSR_THR_EMTPY)
                                rt_16550_tx_fill(ctx);
 
                        if (ctx->out_npend > 0 && !(ctx->ier_status & IER_TX)) {
+                               trace_printk("IER_TX on, out_npend: %lu", ctx->out_npend);
                                /* unmask tx interrupt */
                                ctx->ier_status |= IER_TX;
                                rt_16550_reg_out(rt_16550_io_mode_from_ctx(ctx),
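
To help read the resulting trace, here is a rough user-space model of the
handshake those trace points sit on. It is only a sketch under assumptions:
the struct, the FIFO depth of 16, and all model_* names are made up for
illustration and are not driver code. In this idealized model the sequence is
always "IER_TX on", a few "tx_fill" lines with shrinking out_npend, then
"IER_TX off" once out_npend reaches 0. A real trace that stops in the middle
of that sequence would show where the stall begins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define FIFO_DEPTH 16   /* assumed TX FIFO size, for illustration only */

/* Hypothetical stand-in for the driver context; not the real struct. */
struct model_ctx {
    size_t out_npend;   /* bytes queued in the software TX buffer */
    size_t fifo_level;  /* bytes sitting in the modelled hardware FIFO */
    size_t sent;        /* bytes that have left the modelled UART */
    bool ier_tx;        /* true while the TX-empty interrupt is unmasked */
};

/* Move up to FIFO_DEPTH pending bytes into the (empty) FIFO. */
static void model_tx_fill(struct model_ctx *c)
{
    size_t n = c->out_npend < FIFO_DEPTH ? c->out_npend : FIFO_DEPTH;

    printf("tx_fill, out_npend: %zu\n", c->out_npend);
    c->fifo_level += n;
    c->out_npend -= n;
}

/* Write path: queue bytes, kick the FIFO if it is empty, unmask TX irq. */
static void model_write(struct model_ctx *c, size_t nbyte)
{
    c->out_npend += nbyte;
    if (c->fifo_level == 0)     /* models the LSR "THR empty" check */
        model_tx_fill(c);
    if (c->out_npend > 0 && !c->ier_tx) {
        printf("IER_TX on, out_npend: %zu\n", c->out_npend);
        c->ier_tx = true;
    }
}

/* Hardware drains the FIFO; returns true if a TX-empty irq would fire. */
static bool model_hw_drain(struct model_ctx *c)
{
    c->sent += c->fifo_level;
    c->fifo_level = 0;
    return c->ier_tx;
}

/* Interrupt path: refill, then mask the TX irq once nothing is pending. */
static void model_interrupt(struct model_ctx *c)
{
    if (c->out_npend > 0)
        model_tx_fill(c);
    if (c->ier_tx && c->out_npend == 0) {
        printf("IER_TX off\n");
        c->ier_tx = false;
    }
}

/* Push one packet through the model; returns the total bytes sent. */
size_t model_send_packet(struct model_ctx *c, size_t nbyte)
{
    model_write(c, nbyte);
    while (model_hw_drain(c))
        model_interrupt(c);
    return c->sent;
}
```

Calling model_send_packet() on a zero-initialized struct model_ctx with
nbyte = 72 walks through one packet of the size reported above.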


Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
