On Wednesday, September 1, 2021 10:24 CEST, François Legal via Xenomai 
<[email protected]> wrote:

> On Tuesday, August 31, 2021 19:37 CEST, Philippe Gerum <[email protected]> wrote:
>
> >
> > François Legal <[email protected]> writes:
> >
> > > On Friday, August 27, 2021 16:36 CEST, Philippe Gerum <[email protected]> 
> > > wrote:
> > >
> > >>
> > >> François Legal <[email protected]> writes:
> > >>
> > >> > On Friday, August 27, 2021 15:54 CEST, Philippe Gerum 
> > >> > <[email protected]> wrote:
> > >> >
> > >> >>
> > >> >> François Legal <[email protected]> writes:
> > >> >>
> > >> >> > On Friday, August 27, 2021 15:01 CEST, Philippe Gerum 
> > >> >> > <[email protected]> wrote:
> > >> >> >
> > >> >> >>
> > >> >> >> François Legal via Xenomai <[email protected]> writes:
> > >> >> >>
> > >> >> >> > Hello,
> > >> >> >> >
> > >> >> >> > Working on a Zynq-7000 target (ARM Cortex-A9), we have a 
> > >> >> >> > peripheral that generates a lot of data (many kilobytes per ms).
> > >> >> >> >
> > >> >> >> > We would like to move that data directly from the peripheral 
> > >> >> >> > memory (the OCM of the SoC) to our RT application's user 
> > >> >> >> > memory using DMA.
> > >> >> >> >
> > >> >> >> > For one part of the data, we would like the DMA to de-interlace 
> > >> >> >> > it while moving it. We figured out that the PL330 peripheral 
> > >> >> >> > on the SoC should be able to do this; however, we would like, 
> > >> >> >> > as much as possible, to keep one or two channels of the PL330 
> > >> >> >> > for plain Linux non-RT use (via dmaengine).
> > >> >> >> >
> > >> >> >> > My first attempt would be to extend the dmaengine API with an 
> > >> >> >> > RT API, then implement the RT API calls in the PL330 driver.
> > >> >> >> >
> > >> >> >> > What do you think of this approach, and is it achievable at all 
> > >> >> >> > (DMA directly to user-land memory, and/or having some DMA 
> > >> >> >> > channels driven by Xenomai and others by Linux)?
> > >> >> >> >
> > >> >> >> > Thanks in advance
> > >> >> >> >
> > >> >> >> > François
> > >> >> >>
> > >> >> >> As a starting point, you may want to have a look at this document:
> > >> >> >> https://evlproject.org/core/oob-drivers/dma/
> > >> >> >>
> > >> >> >> This is part of the EVL core documentation, but this is actually a
> > >> >> >> Dovetail feature.
> > >> >> >>
> > >> >> >
> > >> >> > Well, that's pretty much what I want to do, so it is very good 
> > >> >> > news that it will be available in the future. However, I need it 
> > >> >> > through the I-pipe right now, but I guess the process stays the 
> > >> >> > same (patching the dmaengine API and the DMA engine driver).
> > >> >> >
> > >> >> > I would guess the modifications to the DMA engine driver would 
> > >> >> > then be easily ported to Dovetail?
> > >> >> >
> > >> >>
> > >> >> Since they should follow the same pattern used for the controllers 
> > >> >> Dovetail currently supports, I think so. You should actually be 
> > >> >> able to simplify the code when porting it to Dovetail.
> > >> >>
> > >> >
> > >> > That's what I thought. Thanks a lot.
> > >> >
> > >> > So now, regarding the "to user-land memory" aspect: I guess that, to 
> > >> > make this happen, I will somehow have to change the PTE flags to make 
> > >> > these pages non-cacheable (using dma_map_page maybe), but I wonder 
> > >> > whether I have to map the user-land pages into kernel space, and 
> > >> > whether or not I have to pin the user-land pages in memory (I believe 
> > >> > mlockall in the user-land process already does that)?
> > >> >
> > >>
> > >> The out-of-band SPI support available from EVL illustrates a possible
> > >> implementation. This code [2] implements what is described in this page
> > >> [1].
> > >>
> > >
> > > Thanks for the example. I think what I'm trying to do is a little 
> > > different from this, however.
> > > For the record, this is what I do (and it seems to be working):
> > > - as soon as the user-land buffers are allocated, tell the driver to 
> > > pin the user-land buffer pages in memory (with get_user_pages_fast). 
> > > I'm not sure this is required, as I think mlockall in the app already 
> > > takes care of that.
> > > - whenever I need to transfer data to the user-land buffer, instruct 
> > > the driver to DMA-map those user-land pages (with dma_map_page), then 
> > > give the DMA controller the physical address of these pages.
> > > Et voilà.
> > >
> > > This seems to work correctly and repeatedly so far.
> > >
> >
> > Are transfers controlled from the real-time stage, and if so, how do you
> > deal with cache maintenance between transfers?
>
> That is my next problem to fix. As long as I run the test program in the 
> debugger, displaying the buffer filled by the DMA in GDB, everything is 
> fine. When GDB gets out of the way, I seem to read data that got into the 
> D-cache before the DMA did the transfer.
> I tried adding a flush_dcache_range before triggering the DMA, but it did 
> not help.
>
> Any suggestion ?
>
> Thanks
>
> François
>

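For context, the pin-and-map recipe quoted above also has a user-land half: the buffer handed to the driver should be page-aligned and pinned in RAM. Below is a minimal, hypothetical sketch of that user-land side (the function name alloc_locked_dma_buffer is mine, not from the thread); it uses a per-buffer mlock instead of the process-wide mlockall mentioned above.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate a page-aligned buffer and pin it in RAM so a driver can
 * safely DMA into it. mlockall(MCL_CURRENT | MCL_FUTURE) in the RT
 * application has the same effect process-wide; this is the
 * per-buffer variant. Returns NULL on failure. */
void *alloc_locked_dma_buffer(size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	void *buf;

	/* Page alignment lets the driver DMA-map whole pages. */
	if (posix_memalign(&buf, (size_t)page, len))
		return NULL;

	/* Fault every page in before locking, so no demand paging can
	 * happen once the DMA engine owns the buffer. */
	memset(buf, 0, len);

	if (mlock(buf, len)) {
		free(buf);
		return NULL;
	}
	return buf;
}
```

Whether the driver-side get_user_pages_fast is still needed on top of this is exactly the open question in the thread: mlock prevents swap-out, but the driver-side pin is what guarantees the underlying pages stay put while the DMA engine uses them.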
So I dug deep into the kernel cache-management code for my (ARMv7) arch, but 
could not find an answer or a solution.
I now wonder whether this (DMA to user-land memory) is possible on this arch 
at all, because of what is suggested in [1], even if that thread is a bit old.

I saw that flush_dcache_range on ARMv7 is pretty much a no-op. I tried 
dmac_flush_range (which does the real work through CP15), passing either the 
user-land virtual address directly or first getting a kernel mapping with 
kmap_atomic, but that did not change anything. I still, most of the time, get 
the first 2 cache lines of data wrong in the user-land application after the 
DMA transfer is done.

I'm not sure where to look next.

François

> >
> > --
> > Philippe.
>
>

 [1] https://groups.google.com/g/linux.kernel/c/QONWGX6WJaE

