Re: [PATCH v5 10/13] video: Only dcache flush damaged lines

Alexander Graf Mon, 21 Aug 2023 13:00:10 -0700


On 21.08.23 21:11, Simon Glass wrote:

Hi Alper,

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiya...@gmail.com> wrote:

From: Alexander Graf <ag...@csgraf.de>

Now that we have a damage area tells us which parts of the frame buffer
actually need updating, let's only dcache flush those on video_sync()
calls. With this optimization in place, frame buffer updates - especially
on large screen such as 4k displays - speed up significantly.

Signed-off-by: Alexander Graf <ag...@csgraf.de>
Reported-by: Da Xue <da@libre.computer>
[Alper: Use damage.xstart/yend, IS_ENABLED()]
Co-developed-by: Alper Nebi Yasak <alpernebiya...@gmail.com>
Signed-off-by: Alper Nebi Yasak <alpernebiya...@gmail.com>
---

Changes in v5:
- Use xstart, ystart, xend, yend as names for damage region
- Use IS_ENABLED() instead of CONFIG_IS_ENABLED()

Changes in v2:
- Fix dcache range; we were flushing too much before
- Remove ifdefs

  drivers/video/video-uclass.c | 41 +++++++++++++++++++++++++++++++-----
  1 file changed, 36 insertions(+), 5 deletions(-)

This is a little strange, since flushing the whole cache will only
actually write out data that was actually written (to the display). Is
there a benefit to this patch, in terms of performance?

I'm happy to see you go through the same thought process I went throughwhen writing these: "This surely can't be the problem, can it?". Theanswer is "simple" in hindsight:

Have a look at the ARMv8 cache flush function. It does the only "safe"thing you can expect it to do: Clean+Invalidate to POC because we use itfor multiple things, clearing modified code among others:


ENTRY(__asm_flush_dcache_range)
        mrs     x3, ctr_el0
        ubfx    x3, x3, #16, #4
        mov     x2, #4
        lsl     x2, x2, x3              /* cache line size */

        /* x2 <- minimal cache line size in cache system */
        sub     x3, x2, #1
        bic     x0, x0, x3

1: dc civac, x0 /* clean & invalidate data or unifiedcache */

        add     x0, x0, x2
        cmp     x0, x1
        b.lo    1b
        dsb     sy
        ret
ENDPROC(__asm_flush_dcache_range)

Looking at the "dc civac" call, we find this documentation page fromARM:https://developer.arm.com/documentation/ddi0601/2022-03/AArch64-Instructions/DC-CIVAC--Data-or-unified-Cache-line-Clean-and-Invalidate-by-VA-to-PoC

This says we're writing any dirtyness of this cache line up to the POCand then invalidate (remove the cache line) also up to POC. That meanswhen you look at a typical SBC, this will either be L2 or system levelcache. Every read afterwards needs to go and pull it all the way back toL1 to modify it (or not) on the next character write and then flush itagain.

Even worse: Because of the invalidate, we may even evict it from cachesthat the display controller uses to read the frame buffer. So dependingon the SoC's cache topology and implementation, we may force the displaycontroller to refetch the full FB content on its next screen refresh cycle.

I faintly remember that I tried to experiment with CVAC instead to onlyflush without invalidating. I don't fully remember the results anymorethough. I believe CVAC just behaved identical to CIVAC on the A53platform I was working on. And then I looked at Cortex-A53 errata like[1] and just accepted that doing anything but restricting the flushingrange is a waste of time :)



Alex

[1]https://patchwork.kernel.org/project/xen-devel/patch/1462466065-30212-14-git-send-email-julien.gr...@arm.com/

Re: [PATCH v5 10/13] video: Only dcache flush damaged lines

Reply via email to