On Fri, Jun 26, 2020 at 07:42:50AM -0700, jo...@armadilloaerospace.com wrote:
> Optimized 32 bit character rendering with unrolled rows and pairwise
> foreground / background pixel rendering.
>
> If it weren't for the 5x8 font, I would have just assumed everything
> was an even width and made the fallback path also pairwise.
>
> In isolation, the 16x32 character case got 2x faster, but that wasn't
> a huge real world speedup where the space rendering that was already
> at memory bandwidth limits accounted for most of the character
> rendering time. However, in combination with the previous fast
> conditional console scrolling that removes most of the space rendering,
> it becomes significant.
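
For anyone following along, here is a rough sketch of what I understand
"pairwise" rendering to mean at 32bpp: two bits of the glyph row select one
of four precomputed two-pixel combinations, so each pair of pixels goes out
as a single 8-byte copy. This is not the code from the diff. The function
name put_row_pairwise, its parameters, and the MSB-first bit layout are
assumptions made for illustration only.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the diff: draw one glyph row of
 * `width` pixels (assumed even) into a 32bpp destination.  `bits` holds
 * the row's font bits MSB-first, left-aligned in a 32-bit word.
 */
static void
put_row_pairwise(uint32_t *dst, uint32_t bits, int width,
    uint32_t fg, uint32_t bg)
{
	/* Index bit 1 is the left pixel, index bit 0 is the right pixel. */
	uint32_t pair[4][2] = {
		{ bg, bg }, { bg, fg }, { fg, bg }, { fg, fg }
	};
	int x;

	for (x = 0; x < width; x += 2) {
		/* The top two bits of the row select the pixel pair. */
		unsigned int idx = (bits >> 30) & 3;

		/* One 8-byte copy writes both pixels. */
		memcpy(dst + x, pair[idx], sizeof(pair[idx]));
		bits <<= 2;
	}
}
```

An odd-width font such as 5x8 cannot be handled this way without a trailing
single-pixel case, which is presumably why the fallback path stays
per-pixel.
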

On my Ryzen desktop with radeondrm, I don't see any improvement. The
rasops_vcons_copyrows() optimizations seem to have made character
plotting fast enough that it is no longer a bottleneck, which is
definitely great.

cpu0: AMD Ryzen 7 2700 Eight-Core Processor, 3394.18 MHz, 17-08-02
radeondrm0 at pci8 dev 0 function 0 "ATI Radeon HD 6450" rev 0x00
radeondrm0: 1920x1080, 32bpp

On my T450, however, this diff makes cat'ing my usual test file [1] up
to 20% faster with the default 12x24 font on the built-in 1600x900
screen, which I think is significant enough for the diff to go in.

cpu0: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, 2095.47 MHz, 06-3d-04
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 5500" rev 0x09
drm0 at inteldrm0
inteldrm0: 1600x900, 32bpp

On my Cubieboard2 (armv7), I didn't notice any meaningful difference,
which I assume is to be expected on a 32-bit platform.

I suppose it's also reasonable to assume that other 32-bit platforms
(i386, hppa, macppc) will not see any regression beyond noise level?

Anyone willing to OK this diff?

[1] https://norvig.com/big.txt