Hi Günter, [email protected] wrote: > Hi Wolfgang, > > Wolfgang Grandegger wrote: >> [email protected] wrote: >>> Problem is: >>> MSCAN has three Tx-Buffers. But only one buffer-interface. >>> So to select one buffer you must use a buffer selection (cantbsel) >>> register. >>> >>> Your driver implementation selects the buffer in the >>> mscan_start_xmit() function and then fills the buffer with payload. >>> >>> But in the ISR mscan_isr() the buffer is also selected >>> to maintain the tx-stats. >>> >>> Race-condition occurs if mscan_isr() interrupts mscan_start_xmit() >>> after it selects a buffer by writing to the tbsel register. >>> After ISR returns mscan_start_xmit will continue to write payload >>> into the wrong buffer. >> TX interrupts are disabled in mscan_start_xmit() while accessing the >> tbsel register and writing the payload: >> >> static netdev_tx_t mscan_start_xmit(struct sk_buff *skb, ...) { >> out_8(®s->cantier, 0); >> ... >> out_8(®s->cantier, priv->tx_active); >> >> return NETDEV_TX_OK; >> } > > Of course, you are right. You are using cantier to disable TX interrupts. > I overlooked this code - because I use a different, more obstrusive technique: > In eCos a DSR (deferred service routine - called by dispatcher shortly after > ISR) > can be blocked by simply locking/unlocking the scheduler. > > I used a "global" lock instead of cantier because I am not perfectly sure > how this register really works. Weird thing with this register is that it > is also used as a kind of acknowledgement register - the first thing i tried > when fixing my driver was to treat cantier like a normal interrupt enable > register: > I only enabled it once during booting - and only used cantflg in ISR to know > which buffer was transmitted. However this did not work -> cantier bits also > needs > to be cleared in ISR. So I suspect cantier to be a kind of acknowledgement > register > and therfore its perhaps more secure to only delete bits inside the ISR and > only set bits inside the transmit functions. > > The problem with my driver only occured when sending a lot of > messages (~4000/second). > >>> A second race condition occurs with the cantier register. >>> The documentation says its an interrupt-enable register. >>> But in-fact it is a sort of acknowledgement register which >>> is accessed in both ISR and xmit function with load-and-store >>> operations. >> I don't see that problem yet in the MSCAN Socketcan driver. > > ACK. > >>> Perhaps the following lines could be written simpler >>> in function mscan_isr(): >>> >>> cantier = in_8(®s->cantier) & MSCAN_TXE; >>> cantflg = in_8(®s->cantflg) & cantier; >>> >>> if (cantier && cantflg) { >>> ... >>> Instead of this i would write (its the same): >>> if (cantflg) { >> This will break the synchronization with mscan_start_xmit(), because >> the ISR can be entered by to RX or error interrupts as well. > > Here I disagree: It right that you must also look at cantier because > ISR can also be entered for RX interrupt, but look how you initialize > cantflg: > > cantflg = in_8(®s->cantflg) & cantier; > ^^^^^^ > It is already masked by cantier! > > So 'if (cantier && cantflg)' is just redundant code. > > It would be better to just write 'if (cantier)' or second possibility is > to do not mask cantflg with cantier in intitialization but mask it later > in the if-condition.
OK, I see. This should be corrected. Anyway, your review re-triggered myself thinking about this type of locking with cantier. While it's OK if the ISR is executed in the interrupt context, it may fail with threaded IRQs (like used by -rt). It think it should be replaced with a more robust locking. e.g. spin_lock_irqsave/restore. Will prepare a patch a.s.a.p. Thanks, Wolfgang. _______________________________________________ Socketcan-core mailing list [email protected] https://lists.berlios.de/mailman/listinfo/socketcan-core
