[Freerdp-devel] Update on SSE2 for RemoteFX

2011-06-15 Thread S. Erisman
I finished adding SSE2 optimizations for the Inverse DWT decoding routines this evening. Here are the current performance numbers from my Atom D510 test system: Without SSE: |---| PROFILER |

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-14 Thread S. Erisman
On 6/14/2011 10:02 PM, Marc-André Moreau wrote: > Ah, so we either add Kernel.framework as a dependency on Mac OS X, or > we wrap a call to the cpuid instruction > > Any preference? > I have no preference. Are you going to make the change, or do you want me (or someone else) to work on it? > By
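For reference, the "wrap a call to the cpuid instruction" option could look roughly like the sketch below. It assumes GCC or Clang inline assembly on x86; the names cpuid_query and cpu_has_sse2 are hypothetical and are not taken from the FreeRDP tree. SSE2 support is reported in bit 26 of EDX for CPUID leaf 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not FreeRDP code): execute CPUID for one leaf.
 * regs receives EAX, EBX, ECX, EDX in that order. */
static void cpuid_query(uint32_t leaf, uint32_t regs[4])
{
    assert(regs != NULL);
#if defined(__x86_64__) || defined(__i386__)
    /* Note: 32-bit PIC builds with older GCC reserve EBX and would need
     * to save and restore it around the instruction. */
    __asm__ volatile("cpuid"
                     : "=a"(regs[0]), "=b"(regs[1]), "=c"(regs[2]), "=d"(regs[3])
                     : "a"(leaf), "c"(0));
#else
    /* Not an x86 CPU: report no features. */
    (void)leaf;
    regs[0] = regs[1] = regs[2] = regs[3] = 0;
#endif
}

/* Returns 1 if CPUID leaf 1 reports SSE2 (EDX bit 26), otherwise 0. */
static int cpu_has_sse2(void)
{
    uint32_t regs[4];

    cpuid_query(0, regs);          /* leaf 0: EAX holds the highest basic leaf */
    if (regs[0] < 1)
        return 0;

    cpuid_query(1, regs);          /* leaf 1: feature flags in ECX/EDX */
    return (int)((regs[3] >> 26) & 1u);
}
```

A decoder could call cpu_has_sse2() once at start-up and pick the SSE2 or plain C routine accordingly, which avoids depending on either cpuid.h or Kernel.framework.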

[Freerdp-devel] Update on SSE2 for RemoteFX

2011-06-14 Thread S. Erisman
I finished adding SSE2 optimizations for the Inverse DWT decoding routines this evening. Here are the current performance numbers from my Atom D510 test system: Without SSE: |---| PROFILER |

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-14 Thread S. Erisman
Marc, On 6/14/2011 7:01 PM, Marc-André Moreau wrote: > Hi Steve, > > I noticed the addition of cpuid.h, which is not found on Mac OS X. Is > there a more portable alternative for detecting SSE support level? > Can't the cpuinfo instruction be used for this? That's weird. I was under the assump

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-10 Thread S. Erisman
1:09 AM, S. Erisman wrote: > Hey Vic, > > On 6/10/2011 12:32 AM, Vic Lee wrote: >> Hi Steve, >> >> Yes both is faster, but the SSE version is still quite slower than the >> original one. Here is my testing. >> >> Before pulling: >> | rfx_dec

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-10 Thread S. Erisman
On 6/10/2011 10:59 AM, S. Erisman wrote: The _mm_* functions _do_ indeed get compiled down to SSE assembly instructions. For reference... Here is what the non-SSE code compiles down to: rfx_decode_YCbCr_to_RGB(): 0:55 push %ebp 1:31 d2

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-10 Thread S. Erisman
Vic, On 6/10/2011 9:36 AM, Vic Lee wrote: > That's quite strange because it processes 8 coefficients in parallel > and shouldn't be slower. > I agree. At this point I have no idea how it can still be slower, but it is. Granted this is my first time writing SSE code, and for all I know, I am d

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-10 Thread S. Erisman
Vic, On 6/10/2011 4:16 AM, Martin Fleisz wrote: I am not quite sure how internally those _mm_* functions work, but if those are really functions, it will definitely hurt the performance. I think use assembly SSE2 instruction set directly (like paddw) should be much better. Vic The _mm_* funct
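To illustrate the point about intrinsics, here is a small sketch (not the RemoteFX decoder itself) that adds two arrays of 16-bit values eight lanes at a time. The function name add_int16_sse2 is hypothetical. With optimization enabled, compilers normally emit a single paddw for _mm_add_epi16 rather than a function call.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical example: out[i] = a[i] + b[i] for 16-bit values,
 * processing 8 lanes per iteration. count must be a multiple of 8. */
static void add_int16_sse2(const int16_t *a, const int16_t *b,
                           int16_t *out, int count)
{
    int i;

    assert(count % 8 == 0);
    for (i = 0; i < count; i += 8) {
        /* Unaligned loads so the sketch works with any buffer. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* _mm_add_epi16 maps to the paddw instruction. */
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi16(va, vb));
    }
}
```

Disassembling the object file (for example with objdump -d) is an easy way to confirm which instructions were generated for a given compiler and flag set.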

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-09 Thread S. Erisman
Hey Vic, On 6/10/2011 12:32 AM, Vic Lee wrote: > Hi Steve, > > Yes both is faster, but the SSE version is still quite slower than the > original one. Here is my testing. > > Before pulling: > | rfx_decode_YCbCr_to_RGB_SSE2 | 2123 | 1.75 | 0.000824 | > | rfx_decode_YCbCr_to_RGB |

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-09 Thread S. Erisman
On 6/9/2011 10:04 PM, S. Erisman wrote: > Vic, > > On 6/9/2011 10:05 PM, Vic Lee wrote: >> Hi Steve, >> >> The RemoteFX algorithm does not specify the minimum required bits, but >> according to a forum post in MSDN, MS's implementation use 16bit signed >>

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-09 Thread S. Erisman
Vic, On 6/9/2011 10:05 PM, Vic Lee wrote: > Hi Steve, > > The RemoteFX algorithm does not specify the minimum required bits, but > according to a forum post in MSDN, MS's implementation use 16bit signed > integer, so I believe it should be enough. > Thanks for the response. I actually found my

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-09 Thread S. Erisman
Martin, On 6/9/2011 7:09 AM, Martin Fleisz wrote: One thing that will definitely hurt performance is if our memory is not 16-byte aligned. We should also have a possibility to overload the memory allocation in rfx_pool to use _mm_malloc/_mm_free to have correctly aligned buffers. We should a
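The alignment suggestion above can be sketched as follows. This is only an illustration of _mm_malloc/_mm_free with a 16-byte boundary; the helper names coeff_buffer_new and coeff_buffer_free are hypothetical and do not reflect how rfx_pool is actually structured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* provides _mm_malloc/_mm_free on GCC and Clang */

/* Hypothetical helper: allocate `count` 16-bit coefficients on a
 * 16-byte boundary so aligned SSE loads and stores can be used. */
static int16_t *coeff_buffer_new(size_t count)
{
    int16_t *buf = (int16_t *)_mm_malloc(count * sizeof(int16_t), 16);

    /* A successful allocation must honor the requested alignment. */
    assert(buf == NULL || ((uintptr_t)buf % 16) == 0);
    return buf;
}

/* Memory from _mm_malloc must be released with _mm_free, not free(). */
static void coeff_buffer_free(int16_t *buf)
{
    _mm_free(buf);
}
```

Buffers obtained this way can be passed to aligned intrinsics such as _mm_load_si128, which fault on addresses that are not 16-byte aligned.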

Re: [Freerdp-devel] RemoteFX SSE/SSE2 decoding (was RemoteFX software decoding)

2011-06-07 Thread S. Erisman
Marc, I took your suggestions into account, revised my earlier patch, and committed my changes to a new fork: https://github.com/serisman/FreeRDP ... more comments below ... On 6/7/2011 9:29 PM, Marc-André Moreau wrote: > Hi Steve, > > Well, that was fast :) I had started thinking of the d

Re: [Freerdp-devel] RemoteFX SSE/SSE2 decoding (was RemoteFX software decoding)

2011-06-07 Thread S. Erisman
Marc, On 6/7/2011 11:35 PM, Marc-André Moreau wrote: > Hi Steve, > > I just tried your patch - awesome! > Thanks. That was the first SSE code I have ever written and it ended up being pretty easy. Once we have high level agreement on the structure needed around these optimizations there is def

Re: [Freerdp-devel] RemoteFX software decoding

2011-06-07 Thread S. Erisman
Vic, On 6/7/2011 7:18 PM, Vic Lee wrote: > Hi Steve, > > I think it looks like it might be not just affecting fullscreen > toggling only (depending on the window manager I guess it might happen > other cases). This patch should fix it more properly. > > diff --git a/X11/xf_decode.c b/X11/xf_deco

[Freerdp-devel] RemoteFX SSE/SSE2 decoding (was RemoteFX software decoding)

2011-06-07 Thread S. Erisman
Marc, On 6/6/2011 9:20 AM, Marc-André Moreau wrote: I read more about SSE, and then about NEON which is the equivalent for ARM My first impression is damn, how could I not see this before? This thing looks very well suited not only for acceleration of RemoteFX decoding, but there's a chance

Re: [Freerdp-devel] RemoteFX software decoding

2011-06-07 Thread S. Erisman
Marc, I vote to merge your github fork. I tried it out last night, and it seems pretty stable. Could you also include a fix for the fullscreen toggle (while using RemoteFX) issue that I sent to the list the other day? It should be as simple as clearing or resetting the clip region at the b

Re: [Freerdp-devel] RemoteFX software decoding

2011-06-05 Thread S. Erisman
On 6/5/2011 9:50 PM, Otavio Salvador wrote: > On Mon, Jun 6, 2011 at 02:25, S. Erisman wrote: >> I tried out your RemoteFX code over the weekend, and it works very nicely. > It didn't work for me. How did you configure the Windows Server to use > it? What worries me is that t

Re: [Freerdp-devel] RemoteFX software decoding

2011-06-05 Thread S. Erisman
On 5/25/2011 8:42 PM, Vic Lee wrote: > Hi, > > I have finally completed RemoteFX software decoding feature. It's written > as a separate and relatively independent library librfx. I only added it > to xfreerdp, but the library is portable, so there shouldn't be problem > to use it in other UI. > > I

Re: [Freerdp-devel] Google Summer of Code 2011

2011-03-02 Thread S. Erisman
On 3/2/2011 8:13 AM, Marc-André Moreau wrote: > > By regular hardware, do you mean hardware that does not include the > special RemoteFX chip? Adding RemoteFX support without the chip means > implementing the codec in software, and that is way too much for a > student to do in a summer. I have l