On Wed, 20 Jul 2016 21:49:57 +0200 Christer Weinigel <chris...@weinigel.se> said:
> Hi,
>
> On 07/20/2016 01:04 AM, Carsten Haitzler (The Rasterman) wrote:
> >> With this I managed to get a desktop and was able to start
> >> weston-terminal. Redrawing of the graphics felt fairly snappy, but the
> >> lag from pressing a key on the keyboard until a character showed up in
> >> the terminal was slow, probably between a quarter to half a second.
> >>
> >> So my question is if this is the performance I should expect with weston
> >> on a 400MHz ARM9 and a dumb framebuffer? Have I done something stupid
> >> and there are easy ways to speed it up?
> >
> > when you say redraw is snappy... that implies that output is fast. so time
> > from deciding to render and update and it appearing is very short. but you
> > seem to have serious input lag which implies to me that it has nothing to
> > do with your cpu speed and is something else deeper and more involved. time
> > to trace things and see how they go.
>
> I put up a short video here:
>
> http://zoo.weinigel.se/misc/2016-07-20-213549.webm

that's not snappy. :) startup takes quite a while. but after that, moving the terminal window around is maybe getting you 6-7fps or so.

> On the framebuffer I don't perceive any lag at all between a keypress
> and the character appearing on the screen.
>
> With weston-terminal running I can drag the window around and even
> though it's not very fast and there's a bit of tearing it isn't too bad.
> The response when dragging feels ok. Keypresses feel laggy even though
> mouse motion doesn't, but I'm not sure if that's because I don't notice
> the lag when moving the mouse or if it is a real difference.

well, they are done by different things. the move is done directly by weston itself: the client asks weston to begin a window move, and weston then just does it itself and renders the changes. key events take a different path: they go to the client, the client handles them and draws a new frame, then weston has to update the screen with that new frame.
my guess is that weston-terminal is just slow at drawing here and thus ends up taking a while to draw; add another 200ms or so for weston itself and that's probably what's going on.

weston reads input and sends 1 or more key events to the client. the client gets the input and does some updates/rendering (let's say that takes 200ms, assuming weston-terminal is slow-ish at rendering). then the client sends the updated buffer to weston. weston gets it and spends 200ms rendering, then reads buffered input and sends it back to the client (it may have sent it before). but weston will be either rendering a frame (which takes 200ms or a bit less) or sending events, not both. that means at least some events could take 600ms to make it back to the screen, because weston got blocked, then the client renders, then it comes back to the screen. so maybe 500ms on average. half a second.

i think rendering is slow, and due to the above it just adds latency to the point where you see it easily. you only have a single cpu. any cpu time used up in one place cannot be used elsewhere. no multiple cores. :)

that's my guess. weston is either reading input + sending, or drawing, and the big blobs of time spent drawing mean it's not reading and sending. so that adds UP to ~200ms. THEN the client gets these events. the client may still be drawing a previous frame, so it doesn't respond for a little bit, let's say 100ms. then the client draws, let's say 100ms, then the client sends the new frame over to the compositor. the compositor gets the frame and begins drawing: 200ms more. NOW you see what you just typed, 600ms later, more or less, which is about what it looks like.

when moving a window, weston gets mouse events, weston redraws, repeat. so 200ms lag.

speed up the drawing or allow drawing to happen in parallel and you're good. remember weston is the SAMPLE compositor. it will not have been tuned to run ultra-fast on your setup.
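the per-stage guesses above (the second breakdown: 200 + 100 + 100 + 200ms) can be written down as a tiny back-of-the-envelope model. this is a sketch, not a measurement: the stage names and millisecond values are just the guesses from the text, and the `input_read_in_parallel` switch is a hypothetical "what if weston could read and forward input while it is drawing" (e.g. on another core or thread), added only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stage:
    what: str
    ms: int                   # guessed cost in milliseconds, not measured
    input_wait: bool = False  # time spent waiting for weston to get to the input


# key press: every stage runs back to back on the single cpu
KEY_PRESS_PATH = [
    Stage("weston busy drawing, not reading input yet", 200, input_wait=True),
    Stage("client still drawing its previous frame", 100),
    Stage("client draws the new character", 100),
    Stage("weston composites the new client buffer", 200),
]

# window move: weston handles the drag itself, no client round trip
WINDOW_MOVE_PATH = [
    Stage("weston redraws the moved window", 200),
]


def latency_ms(path: Sequence[Stage], input_read_in_parallel: bool = False) -> int:
    """with one core nothing overlaps, so stage costs simply add up.

    if input_read_in_parallel is set (hypothetical), time spent waiting for
    weston to finish drawing before it reads input is dropped.
    """
    return sum(
        s.ms for s in path if not (input_read_in_parallel and s.input_wait)
    )


if __name__ == "__main__":
    print("key press:", latency_ms(KEY_PRESS_PATH), "ms")
    print("window move:", latency_ms(WINDOW_MOVE_PATH), "ms")
    print("key press, input not blocked by drawing:",
          latency_ms(KEY_PRESS_PATH, input_read_in_parallel=True), "ms")
```

run as-is it prints 600ms for the key press, 200ms for the window move, and 400ms for the key press when input is not stuck behind drawing. the point is the same as in the text: nothing overlaps on one core, so every stage lands directly on keypress latency, while a window move only ever pays for weston's own redraw.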
you likely have a 16bpp display, but what's actually going on is that clients are rendering in 32bpp, so they take longer to render than they would natively (like the text console does), and then weston is likely rendering in 32bpp too... THEN it's down-converting to 16bpp for display. none of that is free. :) you will likely not find much support these days that doesn't involve down-conversion, as everyone is handling alpha and thus 32bpp (yes, you can do 16bpp + an alpha mask, for example, or pack argb 4444 into 16bpp and other imaginative ways of getting it).

dropping the whole pipeline down to something like 16bpp + masks, with a very carefully tuned pipeline, would help. the reason i say 16bpp + masks is that you can memcpy the 16bpp data directly into framebuffer memory, and since this doesn't convert anything it will likely be 2-3 times faster on the compositor fb side. on the client side, the mask can be pre-computed once for the window and then you just render 16bpp content. and with opaque regions, since all the drawing happens inside those, the compositor can skip blending entirely for regions inside the opaque rect and just memcpy. this would involve defining an rgb565 + mask format for a buffer and having both sides understand it, and generate and read from it correctly. my guess is that the whole update process would speed up dramatically if carefully hand-optimized and kept minimal like above, and your latency will drop significantly as a result, down to 1/3 or 1/4 of what it is.

we used to have a dedicated 16bpp rendering backend that did just the above: rgb565 + 8bpp masks for alpha (maybe we should have used 4bpp, but hey). we did this because in those days devices like the n770, n800, openmoko etc. used SoCs very much like yours or exactly the same (the openmoko freerunner also used a samsung arm9 24xx at about your clock rate), and having such a back end really got good speedups... BUT it was a pain to maintain. a major pain.
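to make the two costs above concrete, here is a rough pure-Python sketch (not code from weston or EFL) of the per-pixel 32bpp to 16bpp down-conversion, and of compositing an rgb565 + 8bpp mask row with the opaque-region copy fast path. the pixel layouts (ARGB8888 as 0xAARRGGBB, rgb565 as 5/6/5 bits, one mask byte per pixel), the function names and the `opaque` flag (standing in for "this row lies inside the client's opaque region") are all assumptions for illustration; a real compositor would do this in C or SIMD over whole buffers.

```python
from typing import List, Tuple


def argb8888_to_rgb565(pixel: int) -> int:
    """down-convert one 32bpp pixel to 16bpp by dropping low bits (alpha is discarded)."""
    r = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def _unpack565(p: int) -> Tuple[int, int, int]:
    """split an rgb565 pixel into its 5-bit red, 6-bit green and 5-bit blue fields."""
    return (p >> 11) & 0x1F, (p >> 5) & 0x3F, p & 0x1F


def _blend_channel(s: int, d: int, a: int) -> int:
    """rounded integer version of s*a/255 + d*(255-a)/255, with a in 0..255."""
    return (s * a + d * (255 - a) + 127) // 255


def composite_row_565(dst: List[int], src: List[int], mask: List[int], opaque: bool) -> None:
    """composite one rgb565 + 8bpp-mask source row over an rgb565 row, in place.

    assumes dst, src and mask all have the same length.
    """
    if opaque:
        # inside the opaque region: no per-pixel math at all, just a straight
        # copy (the memcpy fast path)
        dst[:] = src
        return
    # outside it: every pixel has to be unpacked, blended and repacked
    for i, (s, d, a) in enumerate(zip(src, dst, mask)):
        sr, sg, sb = _unpack565(s)
        dr, dg, db = _unpack565(d)
        r = _blend_channel(sr, dr, a)
        g = _blend_channel(sg, dg, a)
        b = _blend_channel(sb, db, a)
        dst[i] = (r << 11) | (g << 5) | b


if __name__ == "__main__":
    row = [0x0000, 0x0000]
    composite_row_565(row, [argb8888_to_rgb565(0xFFFFFFFF)] * 2, [255, 0], opaque=False)
    print([hex(p) for p in row])
```

the opaque branch is the whole point of the 16bpp + mask idea: no conversion and no blending, just a copy, while the masked branch still pays for unpack, blend and repack on every pixel, and the 32bpp path pays for the down-conversion on top of that.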
the old backend was a whole parallel software rendering pipeline just for this, and we dropped it in the end to stop maintaining it, as it's just not worth it anymore.

do you want to confirm? don't just type. click and drag in the terminal to select things; it should be laggy too. install more apps from full toolkits (efl, gtk+, qt) and test them. scroll content around, etc. you should see a similar kind of lag.

> I tried to do a strace of weston-terminal, but it was a bit painful, it
> reads every file it can find in /usr/share/icons/default/cursors/ when
> it starts so strace took forever before the terminal would even show up.
>
> And for trying to do more advanced tracing, I don't quite know where to
> start. Are there any knobs in the source to do things such as dump
> timestamps for messages between the server and client?
>
> /Christer
> _______________________________________________
> wayland-devel mailing list
> wayland-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/wayland-devel

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    ras...@rasterman.com