RE: X11DRV_LineTo optimization ?
> > Hmm. World transforms is a NT only thing so few applications > > are likely to use it. Making all application pay the price > > of that is probably not a good idea. > > Ture, few applications use it. However the cost is really quite low > (which makes this whole thread rather academic). XDrawLine is > probably one of the simplest of the X11 drawing primitives and even > with this I doubt whether optimizing out a few mults is going to make > any real difference in the total time. More complicated primitives > (Arcs etc.) will spend even longer in the XServer so the gain is even > less. I haven't even mentioned BitBlt and friends. [My reply to Marcus doesn't seem to have made it to the list yet for some reason. Oh well, I repeat it again] You have missed the fact that the X protocol is asyncronous. Sure on a single processor it doesn't matter that much that mainly saves you the cost of quite a few context switch. However on a multiprocessor or with a remote display the faster you can issue the requests to the X server the faster you can go back to do doing whatever the application is doing while the request are processed in parallell. I don't know about how clever the X server is but in theory the faster the X server get the reqests the more time the X server has to preprocess the request queue while waiting for the harware to asyncronously process the translated reqeuests. The longer the queue for not yet processed requests the better opportunity for the X server to do clever things like merging similar objects and clipping object that will be overdrawn later. PS. While I was writing this the message to Marcus made it to the list.
RE: X11DRV_LineTo optimization ?
> > Indeed. Balancing optimization vs maintainabillity > > is very difficult. > > > > As I also said earlier "Premature optimization > > is the root to all evil." > > > > However optimizing function like X11DRV_LineTo isn't > > really that premature it isn't likely to change very much. > > Exactly. X11DRV_LineTo is an example where we can do nearly nothing > to accelerate it, since 99.9% of its time is spent in Xlib, during > communication with the X server and the X server drawing the line. Communication with X Window is something we can't (or shouldn't) do anything about it is outside the scope of Wine. Anyway the X Protocol is asyncronous so if X11DRV_LineTo is called multiple times the requests are queued on the socket without any context switch taking place. Sure it you have to context switch and do the work eventually at least for a single processor machine but it lessens the overhead considerably. Furthermore, if you have a multiprocessor machine or are displaying on a remote display, the other CPU:s or the remote CPU(:s) will read from the socket and process the request is parallell. IIRC somebody posted some sort of benchmarks quite a while ago, where some sort of GDI calls (like line drawing) where 20% slow than Windows 98 on a one CPU but 50% faster with two CPU since Windows 98 couldn't use the other CPU. In short I think you claim about 99.9% is Xlib is simply wrong even in the single processor case and very wrong in the multiprocessor and remote display cases. So I think we are back to optimization vs maintainabillity since I really do think it does a measurable difference speedwise if you do a tight loop of LineTo which is not entirely unlikely for a reasonable application to do. > Look for real hotspots in the overall implementation and real hotspots > in the code first. That is complete unrelated effort. Of course it might pay off more to spend to time to do that but then it is not your time is it. > Example for a real code hotspot is the bitmap depth conversion, > for the implementation might be wine <- wineserver -> wine > communication > overhead. Sure but that is difficult to do something about and is also completly unrelated.
Re: X11DRV_LineTo optimization ?
On Sat, Oct 14, 2000 at 09:47:52PM +0200, Patrik Stridvall wrote: > > Please before we start optimizing can we at least make it work > > correctly :) All of these XLPTODP all completely broken if we have a > > world transform set up that involves anything less trivial than a > > translate/scale. We're going to end up using something like LPtoDP > > here instead. By all means optimize, all I'm saying is that this > > stuff is going to change soon anyway. > > Hmm. World transforms is a NT only thing so few applications > are likely to use it. Making all application pay the price > of that is probably not a good idea. Ture, few applications use it. However the cost is really quite low (which makes this whole thread rather academic). XDrawLine is probably one of the simplest of the X11 drawing primitives and even with this I doubt whether optimizing out a few mults is going to make any real difference in the total time. More complicated primitives (Arcs etc.) will spend even longer in the XServer so the gain is even less. I haven't even mentioned BitBlt and friends. > OK enough of sales talk. How do we do this? > My suggestion is that we simple chop up DC_FUNCS > in DC_FUNCS and DC_TRANSFORM_FUNCS where DC_TRANSFORM_FUNCS > contains functions that cares about the current state of > offset, scaling or more advanced world transforms. The > probably means most functions, but that doesn't really matter. > > Now we have make 4 DC_TRANSFORM_FUNCS tables one for each > of the cases above. In the begining the tables for all > the cases will be identical (OK (4) is not currently support, > but that doesn't change the overall idea). I knew you were going to suggest this ;) Huw.
Re: X11DRV_LineTo optimization ?
> Indeed. Balancing optimization vs maintainabillity > is very difficult. > > As I also said earlier "Premature optimization > is the root to all evil." > > However optimizing function like X11DRV_LineTo isn't > really that premature it isn't likely to change very much. Exactly. X11DRV_LineTo is an example where we can do nearly nothing to accelerate it, since 99.9% of its time is spent in Xlib, during communication with the X server and the X server drawing the line. Look for real hotspots in the overall implementation and real hotspots in the code first. Example for a real code hotspot is the bitmap depth conversion, for the implementation might be wine <- wineserver -> wine communication overhead. Ciao, Marcus
RE: X11DRV_LineTo optimization ?
> > Compiler capabillities aside it feel like its the > > wrong level of abstraction. Since MapMode is > > always constant (with the exception of SetMapMode > > of course) throughout a function call the map mode > > test need only be done once. But the compiler > > can't know whether a call to a inbetween function > > changes the map mode or not (which a human knows) > > so it must evaluate the expression again even given > > teoretically optimal use of common subexpression > > elimination (CSE). > > Yes. But I was rather trying to search for a solution > that affects maintainability of the code as little > as possible, while still offering as much performance > improvement as possible. OK. As I said in earlier I think it will work quite with the current code since the compiler will be able to do CSE in most cases. I mainly dislike it because it is IMHO on the wrong abstraction level. > Generally, the problem with performance optimizations > is that they tend to make the code less maintainable, > by introducing code duplication, special cases, even > things like inline assembly etc. At least in the > current state of Wine, I tend to think maintainability > more important than performance ... Indeed. Balancing optimization vs maintainabillity is very difficult. As I also said earlier "Premature optimization is the root to all evil." However optimizing function like X11DRV_LineTo isn't really that premature it isn't likely to change very much. > The solution that you're suggesting (checking the map > mode only once in a function), would lead to code like > > if ( !MM_TEXT ) > { > complex algorithm > } > else > { > copy of the same algorithm with minor changes > } > > which I feel is less maintainable, as every change > to the algorithm must now be done at least twice. True. My lastest proposal involes having different functions for the different cases (RAW, OFFSET, SCALE, LINEAR) and that is in some meaning even worse. However I only suggest doing this for functions that are fully implemented and are is not supposed to be changed very much like X11DRV_LineTo. But perhaps you are right that we should wait with these kind of optimizations until after Wine 1.0. > Of course, in the present situtation the 'algorithm' is > rather trivial, making this discussion academic, but it > would still IMHO be a step into the wrong direction. Well. How we handle these kind of issues are rather fundamental and whichever way we choose to do it will likely be used as optimization in almost every GDI fucntion that depends on map mode and friends.
Re: X11DRV_LineTo optimization ?
Patrik Stridvall wrote: > I guess you mean > > #define XDPTOLP(dc,x) \ > (((dc)->MapMode == MM_TEXT ? ((x) - (dc)->vportOrgX) : >MulDiv(((x)-(dc)->vportOrgX), (dc)->wndExtX, > (dc)->vportExtX)) + (dc)->wndOrgX) Right. I forgot that the viewport origin can be nonzero even in MM_TEXT mapping mode ... [snip] > Compiler capabillities aside it feel like its the > wrong level of abstraction. Since MapMode is > always constant (with the exception of SetMapMode > of course) throughout a function call the map mode > test need only be done once. But the compiler > can't know whether a call to a inbetween function > changes the map mode or not (which a human knows) > so it must evaluate the expression again even given > teoretically optimal use of common subexpression > elimination (CSE). Yes. But I was rather trying to search for a solution that affects maintainability of the code as little as possible, while still offering as much performance improvement as possible. Generally, the problem with performance optimizations is that they tend to make the code less maintainable, by introducing code duplication, special cases, even things like inline assembly etc. At least in the current state of Wine, I tend to think maintainability more important than performance ... The solution that you're suggesting (checking the map mode only once in a function), would lead to code like if ( !MM_TEXT ) { complex algorithm } else { copy of the same algorithm with minor changes } which I feel is less maintainable, as every change to the algorithm must now be done at least twice. Of course, in the present situtation the 'algorithm' is rather trivial, making this discussion academic, but it would still IMHO be a step into the wrong direction. Bye, Ulrich -- Dr. Ulrich Weigand [EMAIL PROTECTED]
RE: X11DRV_LineTo optimization ?
> We could implement separate DC_FUNCTIONS > (eg X11DRV_LineToText) for the MM_TEXT case and > switch the functions in the SetMapMode call. > This would eliminate _all_ the unnecessary > checks and multiplications. Yes. If you read my latest mail I proposed just that or rather something that does that in a more generalized way.
RE: X11DRV_LineTo optimization ?
On Sat, 14 Oct 2000, Patrik Stridvall wrote: [snip] > I guess you mean > > #define XDPTOLP(dc,x) \ > (((dc)->MapMode == MM_TEXT ? ((x) - (dc)->vportOrgX) : >MulDiv(((x)-(dc)->vportOrgX), (dc)->wndExtX, > (dc)->vportExtX)) + (dc)->wndOrgX) > > > This should generate the same code (the multiple tests > > should be caught be common subexpression elimination), > > is IMO a little cleaner, and will automatically help at > > every place this coordinate transform is used ;-) > > That was my initial idea. > > Eventhough you are correct that a clever compiler > should do that in X11DRV_LineTo there are cases > that are not easy (or even possible) to optimize > for the compiler. > > Compiler capabillities aside it feel like its the > wrong level of abstraction. Since MapMode is > always constant (with the exception of SetMapMode > of course) throughout a function call the map mode > test need only be done once. But the compiler > can't know whether a call to a inbetween function > changes the map mode or not (which a human knows) > so it must evaluate the expression again even given > teoretically optimal use of common subexpression > elimination (CSE). > > In short wrong level of abstraction IMHO. > > Note that doesn't mean that it will not be > a win for the current Wine use of XDPTOLP > which AFAICS seem very well suitable for CSE. > I don't know try and find out. We could implement separate DC_FUNCTIONS (eg X11DRV_LineToText) for the MM_TEXT case and switch the functions in the SetMapMode call. This would eliminate _all_ the unnecessary checks and multiplications. Norman
RE: X11DRV_LineTo optimization ?
> Please before we start optimizing can we at least make it work > correctly :) All of these XLPTODP all completely broken if we have a > world transform set up that involves anything less trivial than a > translate/scale. We're going to end up using something like LPtoDP > here instead. By all means optimize, all I'm saying is that this > stuff is going to change soon anyway. Hmm. World transforms is a NT only thing so few applications are likely to use it. Making all application pay the price of that is probably not a good idea. Perhaps we should make abstraction of how things work a little higher in order to optimize better without having to ha a lot of tests in the code. There are basicly 4 different kinds of applications needs. (1) No use either offset, scaling or more advanced world transforms (2) Use of offsets (3) Use of offsets and scaling (4) Use of offsets, scalings and more advanced world transforms I'm not sure whether it is meaningful to separate the (1) and (2) cases doing since calculating offsets is very cheap. Eventhough it costs a few extra memory read, having two separate implementation might be more costly for the cache because even speed critical applications are not unlikely to do (1) for some DC and (2) for others. Any way that is probaly not of much consequence speed wise either way. However separting (1,2) and (3) and (4) probably is. Applications doing (3) usually do that because of they want to shared code for printing and screen displaying. The are not usually speed critical eventhough fast drawing of printer previews doesn't hurt. Application doing (4) probably does it because the need it and are prepared to pay the performance price for it. OK enough of sales talk. How do we do this? My suggestion is that we simple chop up DC_FUNCS in DC_FUNCS and DC_TRANSFORM_FUNCS where DC_TRANSFORM_FUNCS contains functions that cares about the current state of offset, scaling or more advanced world transforms. The probably means most functions, but that doesn't really matter. Now we have make 4 DC_TRANSFORM_FUNCS tables one for each of the cases above. In the begining the tables for all the cases will be identical (OK (4) is not currently support, but that doesn't change the overall idea). When offset, scaling and friends because of calls to SetMapMode SetWindowEx, SetWorldTransform and friends we analyze the whether offset is 0 or scaling is 1 or whatever and assign the correct jumptable to dc->transform_funcs and the correct function will be called automatically until a transform changing function is called again. As I said in the begining the tables will the same. However when anybody wishes to optimize a function we simply changes which function the corresponding DC_TRANSFORM_FUNCS points to. So in the X11DRV_LineTo case we get: BOOL X11DRV_RAW_LineTo( DC *dc, INT x, INT y ) { X11DRV_PDEVICE *physDev = (X11DRV_PDEVICE *)dc->physDev; if (!X11DRV_SetupGCForPen( dc )) return TRUE; /* FIXME: ??? */ /* Update the pixmap from the DIB section */ X11DRV_DIB_UpdateDIBSection(dc, FALSE); TSXDrawLine(display, physDev->drawable, physDev->gc, dc->w.DCOrgX + dc->w.CursPosX, dc->w.DCOrgY + dc->w.CursPosY, dc->w.DCOrgX + x, dc->w.DCOrgY + y ); /* Update the DIBSection from the pixmap */ X11DRV_DIB_UpdateDIBSection(dc, TRUE); return TRUE; } BOOL X11DRV_OFFSET_LineTo( DC *dc, INT x, INT y ) { X11DRV_PDEVICE *physDev = (X11DRV_PDEVICE *)dc->physDev; if (!X11DRV_SetupGCForPen( dc )) return TRUE; /* FIXME: ??? */ /* Update the pixmap from the DIB section */ X11DRV_DIB_UpdateDIBSection(dc, FALSE); TSXDrawLine(display, physDev->drawable, physDev->gc, dc->w.DCOrgX + dc->w.CursPosX - dc->wndOrgX, dc->w.DCOrgY + dc->w.CursPosY - dc->wndOrgY, dc->w.DCOrgX + x - dc->wndOrgX, dc->w.DCOrgY + y - dc->wndOrgY ); /* Update the DIBSection from the pixmap */ X11DRV_DIB_UpdateDIBSection(dc, TRUE); return TRUE; } BOOL X11DRV_SCALING_LineTo( DC *dc, INT x, INT y ) { X11DRV_PDEVICE *physDev = (X11DRV_PDEVICE *)dc->physDev; if (!X11DRV_SetupGCForPen( dc )) return TRUE; /* FIXME: ??? */ /* Update the pixmap from the DIB section */ X11DRV_DIB_UpdateDIBSection(dc, FALSE); TSXDrawLine(display, physDev->drawable, physDev->gc, dc->w.DCOrgX + XLPTODP( dc, dc->w.CursPosX ), dc->w.DCOrgY + YLPTODP( dc, dc->w.CursPosY ), dc->w.DCOrgX + XLPTODP( dc, x ), dc->w.DCOrgY + YLPTODP( dc, y ) ); /* Update the DIBSection from the pixmap */ X11DRV_DIB_UpdateDIBSection(dc, TRUE); return TRUE; } BOOL X11DRV_LINEAR_LineTo( DC *dc, INT x, INT y ) { /* FIXME: not supported */ } Alternatively we could do BOOL TRANSFORM_NAME(X11DRV,LineTo)( DC *dc, INT x, INT y ) { X11DRV_PDEVICE *physDev = (X11DRV_P
Re: X11DRV_LineTo optimization ?
On Fri, Oct 13, 2000 at 04:56:50PM +0200, mark dufour wrote: > > the following code is now used here to draw a line: > > TSXDrawLine(display, physDev->drawable, physDev->gc, > dc->w.DCOrgX + XLPTODP( dc, dc->w.CursPosX ), > dc->w.DCOrgY + YLPTODP( dc, dc->w.CursPosY ), > dc->w.DCOrgX + XLPTODP( dc, x ), > dc->w.DCOrgY + YLPTODP( dc, y ) ); > > where > > #define XLPTODP(dc,x) \ > (MulDiv(((x)-(dc)->wndOrgX), (dc)->vportExtX, (dc)->wndExtX) + > (dc)->vportOrgX) > #define YLPTODP(dc,y) \ > (MulDiv(((y)-(dc)->wndOrgY), (dc)->vportExtY, (dc)->wndExtY) + > (dc)->vportOrgY) Please before we start optimizing can we at least make it work correctly :) All of these XLPTODP all completely broken if we have a world transform set up that involves anything less trivial than a translate/scale. We're going to end up using something like LPtoDP here instead. By all means optimize, all I'm saying is that this stuff is going to change soon anyway. Huw.
RE: X11DRV_LineTo optimization ?
> Patrik Stridvall wrote: > > > Yes. It like many applications probably > > uses MM_TEXT where it is always so that > > dc->vportExtX == 1 && dc->vportExtY == 1 && > > dc->wndExtX == 1 && dc->wndExtY == 1. > > > > That following patch would be much better. Only one test. > > Note that testing vportOrgX and vportOrgY buys nothings > > since subtraction in the common case with 0 is likely to > > be much faster than the cost of a branch caused by > > an extra if statement. > > > > I'm not sure if it should be applied though. > > "Premature optimization is the root to all evil." > > [snip patch] > > Well, if this is *really* an optimization that helps noticably > (which in the end can only be decided by doing actual > measurements), then I'd still prefer to implement it by > > #define XDPTOLP(dc,x) \ > (((dc)->MapMode == MM_TEXT ? (x) : > MulDiv(((x)-(dc)->vportOrgX), (dc)->wndExtX, > (dc)->vportExtX)) + (dc)->wndOrgX) > > (and so on for the other macros in gdi.h) I guess you mean #define XDPTOLP(dc,x) \ (((dc)->MapMode == MM_TEXT ? ((x) - (dc)->vportOrgX) : MulDiv(((x)-(dc)->vportOrgX), (dc)->wndExtX, (dc)->vportExtX)) + (dc)->wndOrgX) > This should generate the same code (the multiple tests > should be caught be common subexpression elimination), > is IMO a little cleaner, and will automatically help at > every place this coordinate transform is used ;-) That was my initial idea. Eventhough you are correct that a clever compiler should do that in X11DRV_LineTo there are cases that are not easy (or even possible) to optimize for the compiler. Compiler capabillities aside it feel like its the wrong level of abstraction. Since MapMode is always constant (with the exception of SetMapMode of course) throughout a function call the map mode test need only be done once. But the compiler can't know whether a call to a inbetween function changes the map mode or not (which a human knows) so it must evaluate the expression again even given teoretically optimal use of common subexpression elimination (CSE). In short wrong level of abstraction IMHO. Note that doesn't mean that it will not be a win for the current Wine use of XDPTOLP which AFAICS seem very well suitable for CSE. I don't know try and find out.
Re: X11DRV_LineTo optimization ?
Patrik Stridvall wrote: > Yes. It like many applications probably > uses MM_TEXT where it is always so that > dc->vportExtX == 1 && dc->vportExtY == 1 && > dc->wndExtX == 1 && dc->wndExtY == 1. > > That following patch would be much better. Only one test. > Note that testing vportOrgX and vportOrgY buys nothings > since subtraction in the common case with 0 is likely to > be much faster than the cost of a branch caused by > an extra if statement. > > I'm not sure if it should be applied though. > "Premature optimization is the root to all evil." [snip patch] Well, if this is *really* an optimization that helps noticably (which in the end can only be decided by doing actual measurements), then I'd still prefer to implement it by #define XDPTOLP(dc,x) \ (((dc)->MapMode == MM_TEXT ? (x) : MulDiv(((x)-(dc)->vportOrgX), (dc)->wndExtX, (dc)->vportExtX)) + (dc)->wndOrgX) (and so on for the other macros in gdi.h) This should generate the same code (the multiple tests should be caught be common subexpression elimination), is IMO a little cleaner, and will automatically help at every place this coordinate transform is used ;-) Bye, Ulrich -- Dr. Ulrich Weigand [EMAIL PROTECTED]
RE: X11DRV_LineTo optimization ?
> I'm not sure if > they're still as > slow as they used to be, but wouldn't the following code be more > efficient? ( especially with many short lines ) > > if( dc->vportExtX == 1 && dc->vportExtY == 1 && dc->wndExtX == 1 && > dc->wndExtY == 1 && dc->vportOrgX == 0 && dc->vportOrgY == 0 ) > TSXDrawLine(display, physDev->drawable, physDev->gc, > dc->w.DCOrgX + dc->w.CursPosX - dc->wndOrgX, > dc->w.DCOrgY + dc->w.CursPosY - dc->wndOrgY, > dc->w.DCOrgX + x - dc->wndOrgX, > dc->w.DCOrgY + y - dc->wndOrgY ); > else > TSXDrawLine(display, physDev->drawable, physDev->gc, > dc->w.DCOrgX + XLPTODP( dc, dc->w.CursPosX ), > dc->w.DCOrgY + YLPTODP( dc, dc->w.CursPosY ), > dc->w.DCOrgX + XLPTODP( dc, x ), > dc->w.DCOrgY + YLPTODP( dc, y ) ); > > mspaint, for example, almost never executes the else statement. Yes. It like many applications probably uses MM_TEXT where it is always so that dc->vportExtX == 1 && dc->vportExtY == 1 && dc->wndExtX == 1 && dc->wndExtY == 1. That following patch would be much better. Only one test. Note that testing vportOrgX and vportOrgY buys nothings since subtraction in the common case with 0 is likely to be much faster than the cost of a branch caused by an extra if statement. I'm not sure if it should be applied though. "Premature optimization is the root to all evil." Index: wine/graphics/x11drv/graphics.c === RCS file: /home/wine/wine/graphics/x11drv/graphics.c,v retrieving revision 1.31 diff -u -u -r1.31 graphics.c --- wine/graphics/x11drv/graphics.c 2000/08/19 21:38:55 1.31 +++ wine/graphics/x11drv/graphics.c 2000/10/13 20:07:50 @@ -294,17 +294,28 @@ { X11DRV_PDEVICE *physDev = (X11DRV_PDEVICE *)dc->physDev; -if (X11DRV_SetupGCForPen( dc )) { - /* Update the pixmap from the DIB section */ - X11DRV_DIB_UpdateDIBSection(dc, FALSE); +if (!X11DRV_SetupGCForPen( dc )) return TRUE; /* FIXME: ??? */ + +/* Update the pixmap from the DIB section */ +X11DRV_DIB_UpdateDIBSection(dc, FALSE); + +if (dc->w.MapMode != MM_TEXT) { TSXDrawLine(display, physDev->drawable, physDev->gc, dc->w.DCOrgX + XLPTODP( dc, dc->w.CursPosX ), dc->w.DCOrgY + YLPTODP( dc, dc->w.CursPosY ), dc->w.DCOrgX + XLPTODP( dc, x ), dc->w.DCOrgY + YLPTODP( dc, y ) ); - /* Update the DIBSection from the pixmap */ - X11DRV_DIB_UpdateDIBSection(dc, TRUE); +} else { + TSXDrawLine(display, physDev->drawable, physDev->gc, + dc->w.DCOrgX + dc->w.CursPosX - dc->wndOrgX, + dc->w.DCOrgY + dc->w.CursPosY - dc->wndOrgY, + dc->w.DCOrgX + x - dc->wndOrgX, + dc->w.DCOrgY + y - dc->wndOrgY ); } + +/* Update the DIBSection from the pixmap */ +X11DRV_DIB_UpdateDIBSection(dc, TRUE); + return TRUE; }