On Wednesday, September 01, 2021 10:24 CEST, François Legal via Xenomai <[email protected]> wrote:
> On Tuesday, August 31, 2021 19:37 CEST, Philippe Gerum <[email protected]> wrote:
>
> > François Legal <[email protected]> writes:
> >
> > > On Friday, August 27, 2021 16:36 CEST, Philippe Gerum <[email protected]>
> > > wrote:
> > >
> > >> François Legal <[email protected]> writes:
> > >>
> > >> > On Friday, August 27, 2021 15:54 CEST, Philippe Gerum
> > >> > <[email protected]> wrote:
> > >> >
> > >> >> François Legal <[email protected]> writes:
> > >> >>
> > >> >> > On Friday, August 27, 2021 15:01 CEST, Philippe Gerum
> > >> >> > <[email protected]> wrote:
> > >> >> >
> > >> >> >> François Legal via Xenomai <[email protected]> writes:
> > >> >> >>
> > >> >> >> > Hello,
> > >> >> >> >
> > >> >> >> > working on a Zynq-7000 target (ARM Cortex-A9), we have a
> > >> >> >> > peripheral that generates loads of data (many kbytes per ms).
> > >> >> >> >
> > >> >> >> > We would like to move that data directly from the peripheral
> > >> >> >> > memory (the OCM of the SoC) to our RT application's user
> > >> >> >> > memory using DMA.
> > >> >> >> >
> > >> >> >> > For one part of the data, we would like the DMA to de-interlace
> > >> >> >> > it while moving it. We figured out that the PL330 peripheral
> > >> >> >> > on the SoC should be able to do this; however, we would like,
> > >> >> >> > as much as possible, to retain one or two channels of the
> > >> >> >> > PL330 for plain non-RT Linux use (via dmaengine).
> > >> >> >> >
> > >> >> >> > My first attempt would be to extend the dmaengine API with an
> > >> >> >> > RT API, then implement the RT API calls in the PL330 driver.
> > >> >> >> >
> > >> >> >> > What do you think of this approach, and is it achievable at all
> > >> >> >> > (DMA directly to user land memory, and/or having some DMA
> > >> >> >> > channels used by Xenomai and others by Linux)?
> > >> >> >> >
> > >> >> >> > Thanks in advance
> > >> >> >> >
> > >> >> >> > François
> > >> >> >>
> > >> >> >> As a starting point, you may want to have a look at this document:
> > >> >> >> https://evlproject.org/core/oob-drivers/dma/
> > >> >> >>
> > >> >> >> This is part of the EVL core documentation, but this is actually
> > >> >> >> a Dovetail feature.
> > >> >> >>
> > >> >> >
> > >> >> > Well, that is pretty much what I want to do, so it is very good
> > >> >> > news that it is already available for the future. However, I need
> > >> >> > it through the I-pipe right now, but I guess the process stays
> > >> >> > the same (patching the dmaengine API and the DMA engine driver).
> > >> >> >
> > >> >> > I would guess the modifications to the DMA engine driver would
> > >> >> > then be easily ported to Dovetail?
> > >> >>
> > >> >> Since they should follow the same pattern used for the controllers
> > >> >> Dovetail currently supports, I think so. You should actually be
> > >> >> able to simplify the code when porting it to Dovetail.
> > >> >>
> > >> >
> > >> > That's what I thought. Thanks a lot.
> > >> >
> > >> > So now, regarding the "to userland memory" aspect: I guess that to
> > >> > make this happen I will somehow have to change the PTE flags to make
> > >> > these pages non-cacheable (using dma_map_page maybe), but I wonder
> > >> > if I have to map the userland pages to kernel space, and whether or
> > >> > not I have to pin the userland pages in memory (I believe mlockall
> > >> > in the userland process does that already)?
> > >>
> > >> The out-of-band SPI support available from EVL illustrates a possible
> > >> implementation. This code [2] implements what is described in this
> > >> page [1].
> > >>
> > >
> > > Thanks for the example. I think what I'm trying to do is a little
> > > different from this, however.
> > > For the record, this is what I do (and it seems to be working):
> > > - as soon as the user land buffers are allocated, tell the driver to
> > > pin the user land buffer pages in memory (with get_user_pages_fast).
> > > I'm not sure this is required, as I think mlockall in the app would
> > > already take care of that.
> > > - whenever I need to transfer data to the user land buffer, instruct
> > > the driver to DMA-map those user land pages (with dma_map_page), then
> > > give the DMA controller the physical address of these pages.
> > > Et voilà.
> > >
> > > This seems to work correctly and repeatedly so far.
> > >
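To make the quoted sequence concrete, this is roughly what the driver-side path looks like. A trimmed sketch only: the struct and function names are made up for illustration, error unwinding is omitted, and the page/handle arrays are assumed to be pre-allocated.

/* Sketch of the pin + map sequence described above; not the actual
 * driver code. */

#include <linux/mm.h>
#include <linux/dma-mapping.h>

struct rt_dma_buf {
        struct page **pages;    /* pinned user pages */
        dma_addr_t *handles;    /* one bus address per page */
        int nr_pages;
};

/* Step 1, at buffer allocation time: pin the user pages so they cannot
 * be migrated or reclaimed while the DMA controller may target them. */
static int rt_buf_pin(struct rt_dma_buf *b, unsigned long uaddr, int nr_pages)
{
        int n;

        /* on older kernels the third argument is a write flag, not FOLL_* */
        n = get_user_pages_fast(uaddr, nr_pages, FOLL_WRITE, b->pages);
        if (n != nr_pages)
                return -EFAULT; /* should also put_page() whatever was pinned */

        b->nr_pages = n;
        return 0;
}

/* Step 2, before each transfer: map the pages for device-to-memory DMA;
 * the returned handles are what the PL330 gets programmed with. */
static int rt_buf_map(struct device *dev, struct rt_dma_buf *b)
{
        int i;

        for (i = 0; i < b->nr_pages; i++) {
                b->handles[i] = dma_map_page(dev, b->pages[i], 0, PAGE_SIZE,
                                             DMA_FROM_DEVICE);
                if (dma_mapping_error(dev, b->handles[i]))
                        return -EIO;
        }
        return 0;
}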
> > Are transfers controlled from the real-time stage, and if so, how do
> > you deal with cache maintenance between transfers?
>
> That is my next problem to fix. It seems that as long as I run the test
> program in the debugger, displaying the buffer filled by the DMA in GDB,
> everything is fine. When GDB gets out of the way, I seem to read data
> that got into the D-cache before the DMA did the transfer.
> I tried adding a flush_dcache_range before triggering the DMA, but it
> did not help.
>
> Any suggestion?
>
> Thanks
>
> François

So I dug deep into the kernel cache management code for my (ARMv7) arch,
but could not find an answer nor a solution. I now wonder whether this
(DMA to user land memory) is possible on this arch at all, because of what
is suggested in [1], even if that's a bit old.

I saw that flush_dcache_range on ARMv7 is pretty much a no-op. I tried
dmac_flush_range (which does the real thing with CP15), passing either
the user land virtual address directly, or first getting a kernel mapping
with kmap_atomic, but that did not change anything. I still, most of the
time, get the first 2 cache lines of data wrong in the user land
application after the DMA transfer is done.

I'm not sure where to look next.
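For completeness, this is the bracketing I understand the streaming DMA API expects around one device-to-memory transfer; a sketch under that assumption, not my actual driver code ('dev' stands for the PL330's struct device):

#include <linux/dma-mapping.h>

static void transfer_once(struct device *dev, struct page *pg, size_t len)
{
        dma_addr_t h;

        /* CPU -> device handover: performs the cache maintenance so no
         * dirty line can be evicted over the DMA'd data later */
        h = dma_map_page(dev, pg, 0, len, DMA_FROM_DEVICE);
        if (dma_mapping_error(dev, h))
                return;

        /* ... program the DMA controller with 'h', wait for completion ... */

        /* device -> CPU handover: invalidates the lines again, since a
         * Cortex-A9 may speculatively refill them while the DMA is in
         * flight; that could explain the stale leading cache lines */
        dma_unmap_page(dev, h, len, DMA_FROM_DEVICE);

        /* only now should user space read the buffer */
}

If the mapping is kept across transfers instead of being redone each time, dma_sync_single_for_device() before and dma_sync_single_for_cpu() after the transfer provide the same bracketing without dropping the mapping.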
François

> > --
> > Philippe.

[1] https://groups.google.com/g/linux.kernel/c/QONWGX6WJaE