Hi Alex, On Wed, 23 Aug 2023 at 02:56, Alexander Graf <ag...@csgraf.de> wrote: > > Hey Simon, > > On 22.08.23 20:56, Simon Glass wrote: > > Hi Alex, > > > > On Tue, 22 Aug 2023 at 01:47, Alexander Graf <ag...@csgraf.de> wrote: > >> > >> On 22.08.23 01:03, Simon Glass wrote: > >>> Hi Alex, > >>> > >>> On Mon, 21 Aug 2023 at 16:40, Alexander Graf <ag...@csgraf.de> wrote: > >>>> On 22.08.23 00:10, Simon Glass wrote: > >>>>> Hi Alex, > >>>>> > >>>>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <ag...@csgraf.de> wrote: > >>>>>> On 21.08.23 21:57, Simon Glass wrote: > >>>>>>> Hi Alex, > >>>>>>> > >>>>>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <ag...@csgraf.de> wrote: > >>>>>>>> On 21.08.23 21:11, Simon Glass wrote: > >>>>>>>>> Hi Alper, > >>>>>>>>> > >>>>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak > >>>>>>>>> <alpernebiya...@gmail.com> wrote: > >>>>>>>>>> This is a rebase of Alexander Graf's video damage tracking series, > >>>>>>>>>> with > >>>>>>>>>> some tests and other changes. The original cover letter is as > >>>>>>>>>> follows: > >>>>>>>>>> > >>>>>>>>>>> This patch set speeds up graphics output on ARM by a factor of > >>>>>>>>>>> 60x. > >>>>>>>>>>> > >>>>>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as > >>>>>>>>>>> cached, > >>>>>>>>>>> but need it accessible by the display controller which reads > >>>>>>>>>>> directly > >>>>>>>>>>> from a later point of consistency. Hence, we flush the frame > >>>>>>>>>>> buffer to > >>>>>>>>>>> DRAM on every change. The full frame buffer. > >>>>>>>>> It should not, see below. > >>>>>>>>> > >>>>>>>>>>> Unfortunately, with the advent of 4k displays, we are seeing > >>>>>>>>>>> frame buffers > >>>>>>>>>>> that can take a while to flush out. This was reported by Da Xue > >>>>>>>>>>> with grub, > >>>>>>>>>>> which happily print 1000s of spaces on the screen to draw a menu. > >>>>>>>>>>> Every > >>>>>>>>>>> printed space triggers a cache flush. > >>>>>>>>> That is a bug somewhere in EFI. > >>>>>>>> Unfortunately not :). You may call it a bug in grub: It literally > >>>>>>>> prints > >>>>>>>> over space characters for every character in its menu that it wants > >>>>>>>> cleared. On every text screen draw. > >>>>>>>> > >>>>>>>> This wouldn't be a big issue if we only flush the reactangle that > >>>>>>>> gets > >>>>>>>> modified. But without this patch set, we're flushing the full DRAM > >>>>>>>> buffer on every u-boot text console character write, which means for > >>>>>>>> every character (as that's the only API UEFI has). > >>>>>>>> > >>>>>>>> As a nice side effect, we speed up the normal U-Boot text console as > >>>>>>>> well with this patch set, because even "normal" text prints that > >>>>>>>> write > >>>>>>>> for example a single line of text on the screen today flush the full > >>>>>>>> frame buffer to DRAM. > >>>>>>> No, I mean that it is a bug that U-Boot (apparently) flushes the cache > >>>>>>> after every character. It doesn't do that for normal character output > >>>>>>> and I don't think it makes sense to do it for EFI either. > >>>>>> I see. Let's trace the calls: > >>>>>> > >>>>>> efi_cout_output_string() > >>>>>> -> fputs() > >>>>>> -> vidconsole_puts() > >>>>>> -> video_sync() > >>>>>> -> flush_dcache_range() > >>>>>> > >>>>>> Unfortunately grub abstracts character backends down to the "print > >>>>>> character" level, so it calls UEFI's sopisticated "output_string" > >>>>>> callback with single characters at a time, which means we do a full > >>>>>> dcache flush for every character that we print: > >>>>>> > >>>>>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165 > >>>>>> > >>>>>> > >>>>>>>>>>> This patch set implements the easiest mitigation against this > >>>>>>>>>>> problem: > >>>>>>>>>>> Damage tracking. We remember the lowest common denominator region > >>>>>>>>>>> that was > >>>>>>>>>>> touched since the last video_sync() call and only flush that. The > >>>>>>>>>>> most > >>>>>>>>>>> typical writer to the frame buffer is the video console, which > >>>>>>>>>>> always > >>>>>>>>>>> writes rectangles of characters on the screen and syncs > >>>>>>>>>>> afterwards. > >>>>>>>>>>> > >>>>>>>>>>> With this patch set applied, we reduce drawing a large grub menu > >>>>>>>>>>> (with > >>>>>>>>>>> serial console attached for size information) on an RK3399-ROC > >>>>>>>>>>> system > >>>>>>>>>>> at 1440p from 55 seconds to less than 1 second. > >>>>>>>>>>> > >>>>>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism, > >>>>>>>>>>> reducing its > >>>>>>>>>>> overhead compared to before as well. So even x86 systems should > >>>>>>>>>>> be faster > >>>>>>>>>>> with this now :). > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Alternatives considered: > >>>>>>>>>>> > >>>>>>>>>>> 1) Lazy sync - Sandbox does this. It only calls > >>>>>>>>>>> video_sync(true) ever > >>>>>>>>>>> so often. We are missing timers to do this generically. > >>>>>>>>>>> > >>>>>>>>>>> 2) Double buffering - We could try to identify whether > >>>>>>>>>>> anything changed > >>>>>>>>>>> at all and only draw to the FB if it did. That would > >>>>>>>>>>> require > >>>>>>>>>>> maintaining a second buffer that we need to scan. > >>>>>>>>>>> > >>>>>>>>>>> 3) Text buffer - Maintain a buffer of all text printed on > >>>>>>>>>>> the screen with > >>>>>>>>>>> respective location. Don't write if the old and new > >>>>>>>>>>> character are > >>>>>>>>>>> identical. This would limit applicability to text only > >>>>>>>>>>> and is an > >>>>>>>>>>> optimization on top of this patch set. > >>>>>>>>>>> > >>>>>>>>>>> 4) Hash screen lines - Create a hash (sha256?) over every > >>>>>>>>>>> line when it > >>>>>>>>>>> changes. Only flush when it does. I'm not sure if this > >>>>>>>>>>> would waste > >>>>>>>>>>> more time, memory and cache than the current approach. > >>>>>>>>>>> It would make > >>>>>>>>>>> full screen updates much more expensive. > >>>>>>>>> 5) Fix the bug mentioned above? > >>>>>>>>> > >>>>>>>>>> Changes in v5: > >>>>>>>>>> - Add patch "video: test: Split copy frame buffer check into a > >>>>>>>>>> function" > >>>>>>>>>> - Add patch "video: test: Support checking copy frame buffer > >>>>>>>>>> contents" > >>>>>>>>>> - Add patch "video: test: Test partial updates of hardware frame > >>>>>>>>>> buffer" > >>>>>>>>>> - Use xstart, ystart, xend, yend as names for damage region > >>>>>>>>>> - Document damage struct and fields in struct video_priv comment > >>>>>>>>>> - Return void from video_damage() > >>>>>>>>>> - Fix undeclared priv error in video_sync() > >>>>>>>>>> - Drop unused headers from video-uclass.c > >>>>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED() > >>>>>>>>>> - Call video_damage() also in video_fill_part() > >>>>>>>>>> - Use met->baseline instead of priv->baseline > >>>>>>>>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH > >>>>>>>>>> - Update console_rotate.c video_damage() calls to pass video tests > >>>>>>>>>> - Remove mention about not having minimal damage for > >>>>>>>>>> console_rotate.c > >>>>>>>>>> - Add patch "video: test: Test video damage tracking via > >>>>>>>>>> vidconsole" > >>>>>>>>>> - Document new vdev field in struct efi_gop_obj comment > >>>>>>>>>> - Remove video_sync_copy() also from video_fill(), > >>>>>>>>>> video_fill_part() > >>>>>>>>>> - Fix memmove() calls by removing the extra dev argument > >>>>>>>>>> - Call video_sync() before checking copy_fb in video tests > >>>>>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it > >>>>>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS > >>>>>>>>>> > >>>>>>>>>> v4: > >>>>>>>>>> https://lore.kernel.org/all/20230103215004.22646-1-ag...@csgraf.de/ > >>>>>>>>>> > >>>>>>>>>> Changes in v4: > >>>>>>>>>> - Move damage clear to patch "dm: video: Add damage tracking API" > >>>>>>>>>> - Simplify first damage logic > >>>>>>>>>> - Remove VIDEO_DAMAGE default for ARM > >>>>>>>>>> - Skip damage on EfiBltVideoToBltBuffer > >>>>>>>>>> - Add patch "video: Always compile cache flushing code" > >>>>>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it" > >>>>>>>>>> > >>>>>>>>>> v3: > >>>>>>>>>> https://lore.kernel.org/all/20221230195828.88134-1-ag...@csgraf.de/ > >>>>>>>>>> > >>>>>>>>>> Changes in v3: > >>>>>>>>>> - Adapt to always assume DM is used > >>>>>>>>>> - Adapt to always assume DM is used > >>>>>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE > >>>>>>>>>> > >>>>>>>>>> v2: > >>>>>>>>>> https://lore.kernel.org/all/20220609225921.62462-1-ag...@csgraf.de/ > >>>>>>>>>> > >>>>>>>>>> Changes in v2: > >>>>>>>>>> - Remove ifdefs > >>>>>>>>>> - Fix ranges in truetype target > >>>>>>>>>> - Limit rotate to necessary damage > >>>>>>>>>> - Remove ifdefs from gop > >>>>>>>>>> - Fix dcache range; we were flushing too much before > >>>>>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY" > >>>>>>>>>> > >>>>>>>>>> v1: > >>>>>>>>>> https://lore.kernel.org/all/20220606234336.5021-1-ag...@csgraf.de/ > >>>>>>>>>> > >>>>>>>>>> Alexander Graf (9): > >>>>>>>>>> dm: video: Add damage tracking API > >>>>>>>>>> dm: video: Add damage notification on display fills > >>>>>>>>>> vidconsole: Add damage notifications to all vidconsole > >>>>>>>>>> drivers > >>>>>>>>>> video: Add damage notification on bmp display > >>>>>>>>>> efi_loader: GOP: Add damage notification on BLT > >>>>>>>>>> video: Only dcache flush damaged lines > >>>>>>>>>> video: Use VIDEO_DAMAGE for VIDEO_COPY > >>>>>>>>>> video: Always compile cache flushing code > >>>>>>>>>> video: Enable VIDEO_DAMAGE for drivers that need it > >>>>>>>>>> > >>>>>>>>>> Alper Nebi Yasak (4): > >>>>>>>>>> video: test: Split copy frame buffer check into a function > >>>>>>>>>> video: test: Support checking copy frame buffer contents > >>>>>>>>>> video: test: Test partial updates of hardware frame buffer > >>>>>>>>>> video: test: Test video damage tracking via vidconsole > >>>>>>>>>> > >>>>>>>>>> arch/arm/mach-omap2/omap3/Kconfig | 1 + > >>>>>>>>>> arch/arm/mach-sunxi/Kconfig | 1 + > >>>>>>>>>> drivers/video/Kconfig | 26 +++ > >>>>>>>>>> drivers/video/console_normal.c | 27 ++-- > >>>>>>>>>> drivers/video/console_rotate.c | 94 +++++++---- > >>>>>>>>>> drivers/video/console_truetype.c | 37 +++-- > >>>>>>>>>> drivers/video/exynos/Kconfig | 1 + > >>>>>>>>>> drivers/video/imx/Kconfig | 1 + > >>>>>>>>>> drivers/video/meson/Kconfig | 1 + > >>>>>>>>>> drivers/video/rockchip/Kconfig | 1 + > >>>>>>>>>> drivers/video/stm32/Kconfig | 1 + > >>>>>>>>>> drivers/video/tegra20/Kconfig | 1 + > >>>>>>>>>> drivers/video/tidss/Kconfig | 1 + > >>>>>>>>>> drivers/video/vidconsole-uclass.c | 16 -- > >>>>>>>>>> drivers/video/video-uclass.c | 190 > >>>>>>>>>> ++++++++++++---------- > >>>>>>>>>> drivers/video/video_bmp.c | 7 +- > >>>>>>>>>> include/video.h | 59 +++---- > >>>>>>>>>> include/video_console.h | 52 ------ > >>>>>>>>>> lib/efi_loader/efi_gop.c | 7 + > >>>>>>>>>> test/dm/video.c | 256 > >>>>>>>>>> ++++++++++++++++++++++++------ > >>>>>>>>>> 20 files changed, 483 insertions(+), 297 deletions(-) > >>>>>>>>> It is good to see this tidied up into something that can be applied! > >>>>>>>>> > >>>>>>>>> I am unsure what is going on with the EFI performance, though. It > >>>>>>>>> should not flush the cache after every character, only after a new > >>>>>>>>> line. Is there something wrong in here? If so, we should fix that > >>>>>>>>> bug > >>>>>>>>> first and it should be patch 1 of this series. > >>>>>>>> Before I came up with this series, I was trying to identify the UEFI > >>>>>>>> bug > >>>>>>>> in question as well, because intuition told me surely this is a bug > >>>>>>>> in > >>>>>>>> UEFI :). Turns out it really isn't this time around. > >>>>>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI > >>>>>>> implementation. Where did you look for the bug? > >>>>>> The "real" bug is in grub. But given that it's reasonably simple to > >>>>>> work > >>>>>> around in U-Boot and even with it "fixed" in grub we would still see > >>>>>> performance benefits from flushing only parts of the screen, I think > >>>>>> it's worth living with the grub deficiency. > >>>>> OK thanks for digging into it. I suggest we add a param to > >>>>> vidconsole_puts() to tell it whether to sync or not, then the EFI code > >>>>> can indicate this and try to be a bit smarter about it. > >>>> It doesn't know when to sync either. From its point of view, any > >>>> "console output" could be the last one. There is no API in UEFI that > >>>> says "please flush console output now". > >>> Yes, I understand. I was not suggesting we were missing an API. But > >>> some sort of heuristic would do, e.g. only flush on a newline, flush > >>> every 50 chars, etc. > >> I can't think of any heuristic that would reliably work. Relevant for > >> this conversation, UEFI provides 2 calls: > >> > >> * Write string to screen (efi_cout_output_string) > >> * Set text cursor position to X, Y (efi_cout_set_cursor_position) > >> > >> It's perfectly legal for a UEFI application to do something like > >> > >> efi_cout_set_cursor_position(10, 10); > >> efi_cout_output_string("f"); > >> efi_cout_output_string("o"); > >> efi_cout_output_string("o") ; > >> > >> to update contents of a virtual text box on the screen. Where in this > >> chain of events would we call video_sync(), but on every call to > >> efi_cout_output_string()? > > Actually U-Boot has the same problem, but we have managed to work out > > something. > > > U-Boot as a code base has a much easier stance: It can add APIs when it > needs them in places that require them. With UEFI (as well as the U-Boot > native API), we're stuck with what's there. > > I also don't understand what you mean by "we have managed to work out > something". This patch set is not a UEFI fix - it fixes generic U-Boot > behavior and speeds up non-UEFI boots as well. The improvement there is > just not as impressive as with grub :).
We are still not quite on the same page... U-Boot does have video_sync() but it doesn't know when to call it. If it does not call it, then any amount of single-threaded code can run after that, which may update the framebuffer. In other words, U-Boot is in exactly the same boat as UEFI. It has to decide whether to call video_sync() based on some sort of heuristic. That is the only point I am trying to make here. Does that make sense? > > > > I do think it is still to flush the cache on every char. I suspect you > > will find that even a simple heuristic like I mentioned would be good > > enough. > > > > Also I notice that EFI calls notify? all the time, so U-Boot probably > > does have the ability to sync the video every 10ms if we wanted to. > > > I fail to see how invalidating the frame buffer for the screen every > 10ms is any better than doing flush+invalidate operations only on screen > areas that changed? It's more fragile, more difficult to understand and > also significantly more expensive given that most of the time very > little on the screen actually changes. I am not suggesting it is better, at all. I am just trying to explain that the U-Boot EFI implementation should not be calling video_sync() after every character, before or after this series. > > > > > > It seems from this discussion that we have made great the enemy of the good. > > > I agree. Damage tracking in this patch set is elegant, simple, > predictable, low overhead and basically just abstracts the video copy > code path to a generic solution. All while it pretty much solves the > issue for good. I don't understand what's not to like about it :) I think it is a really nice feature and I hope it can be applied soon. Regards, Simon