On 08/18/2014 02:00 AM, Thierry Reding wrote:
From: Thierry Reding <tred...@nvidia.com>

This series attempts to fix a long-standing problem in the rtl8169 driver
(though the same problem may exist in other drivers as well). Let me first
explain what exactly the issue is:

The rtl8169 driver provides a set of RX and TX descriptors for the device to
use. Once they're set up, the device is told about their location so that it
can fetch the descriptors using DMA. The device will also write packet state
back into these descriptors using DMA. For this to work properly, whenever the
driver needs to read these descriptors it must first invalidate the D-cache
line(s) associated with them. Similarly, when changes to a descriptor have
been made by the driver, the cache lines need to be flushed to make sure the
changes are visible to the device.
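Cache maintenance works on whole cache lines, not on individual bytes, so flushing or invalidating one descriptor actually affects every line its address range touches. Below is a minimal C sketch of that rounding. It assumes a power-of-two line size, and `cache_lines_touched()` and `struct line_range` are hypothetical names, not U-Boot API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the byte range actually affected by a cache operation. */
struct line_range {
	uintptr_t start; /* first byte of the first affected line */
	uintptr_t end;   /* one past the last byte of the last affected line */
};

/* A flush or invalidate of [start, start + len) operates on every cache
 * line the range overlaps. line_size must be a power of two. */
static struct line_range cache_lines_touched(uintptr_t start, size_t len,
					     size_t line_size)
{
	uintptr_t mask = (uintptr_t)line_size - 1;
	struct line_range r;

	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
	assert(len > 0);
	r.start = start & ~mask;                 /* round down to a line */
	r.end = (start + len + mask) & ~mask;    /* round up to a line */
	return r;
}
```

Take a 16-byte descriptor at 0x1010. With 16-byte lines the affected range is exactly 0x1010-0x1020. With 64-byte lines it grows to 0x1000-0x1040, which in a contiguous ring starting at 0x1000 also contains three other descriptors.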

The descriptors are 16 bytes in size. This causes problems when used on CPUs
that have a cache-line size that is larger than 16 bytes. One example is the
NVIDIA Tegra124 which has 64-byte cache-lines. That means that 4 descriptors
fit into a single cache-line. So whenever the driver flushes a cache-line it
has the potential to discard changes made to another descriptor by the DMA
device. One typical symptom is that large TFTP transfers often fail to
complete and hang midway: the device marks a packet as received, but the
driver then flushes a cache-line shared with that descriptor, overwriting the
device's update so that the packet is lost.
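A tiny single-threaded simulation can reproduce the lost update. One 64-byte line is modelled as two arrays: `ram`, which the device writes directly, and `cache`, which is the CPU's copy. The descriptor layout, the `DESC_OWN` bit and the status value are hypothetical and only serve as illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64
#define DESC_SIZE 16
#define DESCS_PER_LINE (LINE_SIZE / DESC_SIZE) /* 4 with 64-byte lines */
#define DESC_OWN 0x80000000u /* hypothetical "owned by device" bit */

/* Hypothetical 16-byte descriptor layout, for illustration only. */
struct desc {
	uint32_t status;
	uint32_t opts;
	uint64_t buf_addr;
};
_Static_assert(sizeof(struct desc) == DESC_SIZE, "descriptor must be 16 bytes");

/* One cache line's worth of descriptors: in RAM and in the CPU's D-cache. */
static struct desc ram[DESCS_PER_LINE];
static struct desc cache[DESCS_PER_LINE];

static void invalidate_line(void) { memcpy(cache, ram, sizeof(cache)); }
static void flush_line(void) { memcpy(ram, cache, sizeof(ram)); }

/* Returns the status of descriptor 1 in RAM after the race. */
static uint32_t simulate_lost_update(void)
{
	memset(ram, 0, sizeof(ram));
	ram[1].status = DESC_OWN;   /* descriptor 1 is owned by the device */

	invalidate_line();          /* driver pulls the line into its cache */

	ram[1].status = 0x1234;     /* device DMAs a (hypothetical) "received"
				     * status into descriptor 1, clearing OWN */

	cache[0].status = DESC_OWN; /* driver hands descriptor 0 to the device */
	flush_line();               /* ...and writes back the whole line */

	return ram[1].status;       /* stale DESC_OWN: the DMA write is gone */
}
```

`simulate_lost_update()` returns `DESC_OWN`. Flushing descriptor 0 also wrote back the stale copy of descriptor 1, so the driver keeps waiting for a packet that the device has already delivered.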

Since the descriptors need to be consecutive in memory, I don't see a way to
fix this other than to use uncached memory. Therefore the solution proposed
in this patch series is to introduce a mechanism in U-Boot to allow a driver
to allocate from a pool of uncached memory. Currently an implementation is
provided only for ARM v7. The idea is that a region of user-definable size,
placed immediately below the malloc() area (subject to architecture-specific
alignment restrictions), is mapped uncacheable in the MMU. A driver can use
the new noncached_alloc() function to dynamically allocate a chunk of memory
from this pool for buffers that must be shared with DMA devices but on which
it can't, or doesn't want to, perform explicit cache maintenance.
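A pool like this can be as simple as a bump allocator over a fixed address range. The sketch below works under that assumption. `struct nc_pool`, `nc_pool_init()` and `nc_alloc()` are hypothetical names, and they do not necessarily match the signature of the proposed noncached_alloc():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state for a fixed noncached pool [start, end). */
struct nc_pool {
	uintptr_t start;
	uintptr_t end;
	uintptr_t next; /* first unallocated byte */
};

static void nc_pool_init(struct nc_pool *p, uintptr_t start, size_t size)
{
	p->start = start;
	p->end = start + size;
	p->next = start;
}

/* Returns a chunk of `size` bytes aligned to `align` (a power of two),
 * or 0 if the pool is exhausted. Chunks are never freed. */
static uintptr_t nc_alloc(struct nc_pool *p, size_t size, size_t align)
{
	uintptr_t mask = (uintptr_t)align - 1;
	uintptr_t addr;

	assert(align != 0 && (align & (align - 1)) == 0);
	addr = (p->next + mask) & ~mask;
	/* Reject wrap-around, alignment past the end, or too little space. */
	if (addr < p->next || addr > p->end || p->end - addr < size)
		return 0;
	p->next = addr + size;
	return addr;
}
```

Here is an example with arbitrary numbers. In a 1 MiB pool at 0x100000, a 4 KiB ring aligned to 256 bytes lands at 0x100000, and the next 100-byte, 64-byte-aligned chunk lands at 0x101000. Because nothing is ever freed, the allocator stays trivial, which seems a reasonable trade-off for a small pool used by a few drivers during boot.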

Patches 1-3 are minor preparatory work. Patch 1 cleans up some coding style
issues in the ARM v7 cache code and patch 2 uses more future-proof types for
the mmu_set_region_dcache_behaviour() function arguments. Patch 3 is purely
for debugging purposes. It will print out the region used by malloc() when
DEBUG is enabled. This can be useful to see where the malloc() region is in
the memory map (compared to the noncached region introduced in a later patch
for example).

Patch 4 implements the noncached API for ARM v7. It obtains the start of the
malloc() area and places the noncached region immediately below it so that
noncached_alloc() can allocate from it. During boot, the noncached area is
set up immediately after malloc() has been initialized.
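To place the region "immediately below" malloc() while respecting MMU granularity, two roundings are needed. First, round the malloc() start down to a granule boundary so the two regions cannot overlap. Second, round the requested size up to whole granules. Here is a sketch using the hypothetical helper `place_noncached()`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct nc_region {
	uintptr_t start; /* lowest address of the noncached region */
	uintptr_t end;   /* one past the highest address; <= malloc_start */
};

/* Hypothetical: place a region of at least `size` bytes directly below
 * `malloc_start`, with both ends on `granule` boundaries (the MMU's
 * mapping granularity, e.g. 1 MiB sections on ARM v7). */
static struct nc_region place_noncached(uintptr_t malloc_start, size_t size,
					size_t granule)
{
	uintptr_t mask = (uintptr_t)granule - 1;
	struct nc_region r;

	assert(granule != 0 && (granule & (granule - 1)) == 0);
	/* Round down so the region never overlaps the malloc() area. */
	r.end = malloc_start & ~mask;
	/* Round the size up to whole granules and grow downwards. */
	r.start = r.end - (((uintptr_t)size + mask) & ~mask);
	assert(r.start < r.end); /* catches size 0 and wrap-around */
	return r;
}
```

Suppose the granule is 1 MiB and malloc() starts at 0x8F8A4000. A 1 MiB request then yields 0x8F700000-0x8F800000, while a 1.5 MiB request is rounded up to 2 MiB and starts at 0x8F600000. With LPAE's 2 MiB granularity, even the 1 MiB request would start at 0x8F600000.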

Patch 5 enables noncached memory for all Tegra boards. It uses a 1 MiB chunk
which should be plenty (it's also the minimum on ARM v7 because it matches
the MMU section size and therefore the granularity at which U-Boot can set
the cacheable attributes).
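Section granularity shows up directly in the ARM v7 short-descriptor translation table. Its first level has 4096 entries, each mapping 1 MiB, so cacheability can only change per megabyte. The sketch below marks a region uncached in a simulated table by clearing the C and B bits of every section it overlaps. It deliberately leaves out the real attribute encoding (TEX bits, and whatever values U-Boot itself writes), and `ttb` and `mark_uncached()` are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SHIFT 20                  /* 1 MiB sections */
#define SECTION_SIZE (1u << SECTION_SHIFT)
#define NUM_SECTIONS 4096                 /* 4 GiB / 1 MiB */
#define SECT_B (1u << 2)                  /* bufferable bit */
#define SECT_C (1u << 3)                  /* cacheable bit */

static uint32_t ttb[NUM_SECTIONS];        /* simulated first-level table */

/* Simplified model: clear C and B in every 1 MiB section overlapping
 * [start, start + size). Returns the number of entries changed. */
static unsigned int mark_uncached(uint32_t start, uint32_t size)
{
	uint64_t first = start >> SECTION_SHIFT;
	uint64_t end_idx =
		((uint64_t)start + size + SECTION_SIZE - 1) >> SECTION_SHIFT;
	uint64_t i;

	assert(size > 0 && end_idx <= NUM_SECTIONS);
	for (i = first; i < end_idx; i++)
		ttb[i] &= ~(SECT_B | SECT_C);
	return (unsigned int)(end_idx - first);
}
```

An 8 KiB buffer that straddles a 1 MiB boundary already costs two whole sections. This illustrates why one dedicated, section-aligned region is simpler than remapping individual buffers.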

If LPAE were to be enabled, the minimum would be 2 MiB, but I suppose we can deal with that if/when the time comes.
_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot
