On Tue, 2008-09-16 at 20:03 +0300, Daniel Stone wrote:
> On Tue, Sep 16, 2008 at 10:10:20AM -0400, Adam Jackson wrote:
> > But from a strict performance standpoint, threading
> > just isn't a win.  Anything the X server's doing that takes material CPU
> > time is simply a bug.
>
> Hmm.  Even enforcing fairness between clients?  If you have a hostile
> client, you've already lost, but we have a lot of crap clients already
> (hello Gecko), so.  It would also presumably drop the mean/mode
> latencies while having pretty much no impact on the others: if you have
> one thread waiting on a GetImage and thus migration back to system
> memory, your other clients can still push their trivial rendering to the
> GPU and go back to sleeping.
>
> I will admit that this paragraph has had no prior thought, and could
> probably be swiftly proven wrong.  YMMV.
I could believe a fairness argument here, but I'd like to see better
numbers first on how often clients block on the server, and what they're
waiting for when they do.  Project for anyone reading this thread:
instrument the scheduler such that when it punishes a client, it records
both the last thing that client was doing and the number of clients now
in the wait queue.  Dump that to the log, run a desktop for a few days,
then go do statistics.  (There's a rough sketch of such a hook in the
postscript below.)

> > [*] Except embedded stuff, but how often is that both multicore _and_
> > gpu-less.
>
> Not really.  We're getting to the point of seeing multicore in consumer
> products, but the GPUs there are still too power-hungry to want to base
> a Render implementation on.  Of course, we're still pretty much in the
> first iteration of the current generation of those GPUs, so hopefully
> they can push the power envelope quite aggressively lower, but for a
> couple of years at least, we'll have multicore + effectively GPU-less,
> in platforms where latency is absolutely unacceptable.

ARM, you're so weird.

Well, okay, there are at least two tactics you could use here.

We could go to aggressive threading as in MTX, but that's not a small
project, and I think the ping-pong latency from bouncing the locks
around will offset any speed win from parallelising rendering.  You can
mitigate some of that by trying to keep clients pinned to threads and
hoping the kernel pins threads to cores, but atoms and root window
properties and cliplist manipulation will still knock all your locks
around... so you might improve fairness, but at the cost of best-case
latency.

Or, we keep some long-lived rendering threads in pixman and chunk
rendering up at the last instant.  I still contend that software
rendering is the only part of the server's life that should legitimately
take significant time.  If we're going to thread to solve that problem,
then keep the complexity there, not up in dispatch.

Still, I'm kind of dismayed that the GPU needs that much power.  All we
need is one texture unit.  I have to imagine the penalty for doing it in
software outweighs the additional idle current from a braindead alpha
blender...

- ajax
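P.S. A rough sketch of the scheduler instrumentation above.  The names
here (LogClientPenalty and its arguments) are made up for illustration,
not real dix symbols; the real hook would live wherever the smart
scheduler decides to punish a client:

    /* Hypothetical hook, called each time the scheduler penalises a
     * client.  "last_major_opcode" and "clients_waiting" stand in for
     * whatever state the real scheduler has handy at that point. */
    static void
    LogClientPenalty(int client_index, int last_major_opcode,
                     int clients_waiting)
    {
        /* One line per punishment event; post-process the log later. */
        ErrorF("sched: punished client %d, last request opcode %d, "
               "%d clients in wait queue\n",
               client_index, last_major_opcode, clients_waiting);
    }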
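P.P.S. And a sketch of the "long-lived rendering threads, chunk at the
last instant" idea.  This one is written naively against the public
pixman API with pthreads, and spawns its workers per operation just to
keep the example short; in the real thing the threads would persist and
pull bands off a queue:

    #include <pthread.h>
    #include <pixman.h>

    #define NTHREADS 4

    struct band {
        pixman_op_t op;
        pixman_image_t *src, *dst;
        int src_x, src_y, dst_x, dst_y, width, height;
    };

    static void *
    composite_band(void *arg)
    {
        struct band *b = arg;
        /* Each worker composites its own horizontal band; the bands
         * don't overlap, so the destination needs no locking. */
        pixman_image_composite(b->op, b->src, NULL, b->dst,
                               b->src_x, b->src_y, 0, 0,
                               b->dst_x, b->dst_y, b->width, b->height);
        return NULL;
    }

    /* Split one big composite into NTHREADS horizontal bands. */
    static void
    composite_threaded(pixman_op_t op,
                       pixman_image_t *src, pixman_image_t *dst,
                       int src_x, int src_y, int dst_x, int dst_y,
                       int width, int height)
    {
        pthread_t tid[NTHREADS];
        struct band bands[NTHREADS];
        int i, y = 0;

        for (i = 0; i < NTHREADS; i++) {
            /* Hand the remaining rows out as evenly as possible. */
            int h = (height - y) / (NTHREADS - i);
            bands[i] = (struct band) { op, src, dst,
                                       src_x, src_y + y,
                                       dst_x, dst_y + y, width, h };
            pthread_create(&tid[i], NULL, composite_band, &bands[i]);
            y += h;
        }
        for (i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
    }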
