On Thu, Aug 4, 2016 at 4:34 PM, Joe Hershberger <joe.hershber...@gmail.com> wrote: > On Thu, Aug 4, 2016 at 4:22 PM, Joe Hershberger > <joe.hershber...@gmail.com> wrote: >> Hi Stephen, >> >> On Tue, Aug 2, 2016 at 11:56 AM, Stephen Warren <swar...@wwwdotorg.org> >> wrote: >>> On 08/01/2016 09:20 PM, Peter Chubb wrote: >>>> >>>> >>>> Hi Folks, >>>> Since patch 96350f729c42 "dm: tegra: net: Convert tegra boards to >>>> driver model for Ethernet" booting via dhcp has been broken on the >>>> Jetson TK1. >>> >>> >>> Best to Cc the net and DM maintainers (Joe Hershberger and Simon Glass) on >>> this to make sure they see the issue. I've done so here. >>> >>>> I tried applying "net: Probe PCI before looking for ethernet >>>> devices"; this `works' in that the ethernet device is detected and >>>> works, but I end up with huge numbers of >>>> CACHE: Misaligned operation at range [fffb8c00, fffb8c2e] >>>> messages on the serial console. >>> >>>> >>>> >>>> These come from the flush_cache() calls in net/rtl8169.c. I >>>> suggest the attached patch (or something like it): >>> >>> >>> Interesting; in my automated testing system, I do see these cache error >>> messages, yet ping and TFTP work for me. Admittedly my test setup uses a >>> static IP configuration rather than DHCP. >>> >>>> diff --git a/drivers/net/rtl8169.c b/drivers/net/rtl8169.c >>> >>> >>>> static void rtl_flush_rx_desc(struct RxDesc *desc) >>>> { >>>> #ifndef CONFIG_SYS_NONCACHED_MEMORY >>>> - flush_cache((unsigned long)desc, sizeof(*desc)); >>>> + unsigned long start = (unsigned long)desc & ~(ARCH_DMA_MINALIGN - >>>> 1); >>>> + unsigned long size = ALIGN(sizeof(*desc), ARCH_DMA_MINALIGN); >>>> + >>>> + flush_cache(start, size); >>>> #endif >>>> } >>>> >>>> @@ -493,21 +496,28 @@ static void rtl_inval_tx_desc(struct TxDesc *desc) >>>> static void rtl_flush_tx_desc(struct TxDesc *desc) >>>> { >>>> #ifndef CONFIG_SYS_NONCACHED_MEMORY >>>> - flush_cache((unsigned long)desc, sizeof(*desc)); >>>> + unsigned long start = (unsigned long)desc & ~(ARCH_DMA_MINALIGN - >>>> 1); >>>> + unsigned long sz = ALIGN(sizeof *desc, ARCH_DMA_MINALIGN); >>>> + >>>> + flush_cache(start, sz); >>>> #endif >>>> } >>> >>> >>> Those two are wrong. Hopefully neither of those changes do anything on >>> Jetson TK1, since CONFIG_SYS_NONCACHED_MEMORY is enabled there. >>> >>> The cache line is likely larger than the individual descriptor size, so >>> rounding up the flush length to a whole cache line will likely end up >>> flushing more descriptors than you want. This will eventually result in SW >>> over-writing a HW update to another descriptor, and so at least lose >>> packets. For this reason, CONFIG_SYS_NONCACHED_MEMORY must be set if the >>> descriptor and cache line sizes don't match, except for systems where IO is >>> fully cache-coherent and hence the cache operations are no-ops. >> >> I agree... it this should (and does) require >> CONFIG_SYS_NONCACHED_MEMORY in that case. This is in the top of this >> driver: >> >> /* >> * Warn if the cache-line size is larger than the descriptor size. In such >> * cases the driver will likely fail because the CPU needs to flush the cache >> * when requeuing RX buffers, therefore descriptors written by the hardware >> * may be discarded. >> * >> * This can be fixed by defining CONFIG_SYS_NONCACHED_MEMORY which will cause >> * the driver to allocate descriptors from a pool of non-cached memory. >> */ >> #if RTL8169_DESC_SIZE < ARCH_DMA_MINALIGN >> #if !defined(CONFIG_SYS_NONCACHED_MEMORY) && \ >> !defined(CONFIG_SYS_DCACHE_OFF) && !defined(CONFIG_X86) >> #warning cache-line size is larger than descriptor size >> #endif >> #endif >> >>>> static void rtl_inval_buffer(void *buf, size_t size) >>>> { >>>> - unsigned long start = (unsigned long)buf & ~(ARCH_DMA_MINALIGN - >>>> 1); >>>> - unsigned long end = ALIGN(start + size, ARCH_DMA_MINALIGN); >>>> + unsigned long end = ALIGN((unsigned long)buf + size, >>>> ARCH_DMA_MINALIGN); >>>> >>>> - invalidate_dcache_range(start, end); >>>> + /* buf is aligned to RTL8169_ALIGN, >>>> + * which is a multiple of ARCH_DMA_ALIGN >>>> + */ >>>> + invalidate_dcache_range((unsigned long)buf, end); >>>> } >>>> >>>> static void rtl_flush_buffer(void *buf, size_t size) >>>> { >>>> - flush_cache((unsigned long)buf, size); >>>> + unsigned long sz = ALIGN(size, ARCH_DMA_MINALIGN); >>>> + >>>> + flush_cache((unsigned long)buf, sz); >>>> } >>> >>> >>> I believe the correct approach is for the caller (network core code) to >>> provide cache-aligned buffers, rather than for each driver to align the >>> start/size when performing cache operations. Again, this is to ensure that >>> cache operations don't affect any other data adjacent to the buffer. Can you >>> see why the network core code isn't using cache-aligned buffers when DM_ETH >>> is enabled? Perhaps DM_ETH isn't the trigger, but just changed something >>> about the memory layout that exposed some other pre-existing issue. >> >> This should already be the case. The transmit buffer is setup like this: >> >> net_tx_packet = &net_pkt_buf[0] + (PKTALIGN - 1); >> net_tx_packet -= (ulong)net_tx_packet % PKTALIGN; >> >> in net/net.c:net_init(). >> >> The recv buffers are defined by the drivers and passed back to the >> core network code. In this case, the misalignment is caused by the >> rtl8169 driver... >> >> rtl8169_init_ring() does this: >> >> tpc->RxBufferRing[i] = &rxb[i * RX_BUF_SIZE]; >> >> ...but the size breaks alignment for all but the first entry: >> >> #define RX_BUF_SIZE 1536 /* Rx Buffer size */ >> >> This should be fixed by defining this instead: >> >> #define RX_BUF_SIZE ALIGN(1536, RTL8169_ALIGN) /* Rx >> Buffer size */ >> >> Also, there is an extra buffer that is memcpy'ed to, and then that is >> passed back to the core net code instead of the actual buffer that was >> recv'd into; I don't know why: >> >> static unsigned char rxdata[RX_BUF_LEN]; > > I also noticed that the code to setup the tx ring buffers are > completely wrong. It's a good thing that NUM_TX_DESC is defined to be > 1. From rtl8169_init_ring(): > > for (i = 0; i < NUM_TX_DESC; i++) { > tpc->Tx_skbuff[i] = &txb[i]; > } > > That would set the buffer not only to be unaligned after the index 0, > but also to be one byte after the previous, thus they would all stomp > on each other. > > It should be: > > tpc->Tx_skbuff[i] = &txb[i * RX_BUF_SIZE]; > > It's probably also worth making another define for TX_BUF_SIZE even if > it is just defined to RX_BUF_SIZE to keep it from looking like a bug.
Essentially this patch was incomplete; only ensuring that the first of the buffers in each ring were aligned: https://patchwork.ozlabs.org/patch/419406/ Cheers, -Joe _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot