Re: [Mono-dev] Mono generates inefficient vectorized code

2010-04-15 Thread Sergei Dyshel
Hello Rodrigo, I'm glad to hear about your improvements. Can you share the code? Of course I will share my code, but I need to do it through my IP department. I believe this is not the best approach. Mono.Simd was never intended to be a variable width simd API. Making such proposition coding

Re: [Mono-dev] Mono generates inefficient vectorized code

2010-04-13 Thread Sergei Dyshel
Hello Rodrigo, Regarding your question, unfortunately I cannot apply for GSoC due to time and other constraints. With your tips I managed to extend linear scan to vector registers and now SIMD code runs much faster. Thank you! My next (:]) question is about scalarization, i.e. running programs

Re: [Mono-dev] Mono generates inefficient vectorized code

2010-04-13 Thread Rodrigo Kumpera
Hi Sergei, I'm glad to hear about your improvements. Can you share the code? I believe this is not the best approach. Mono.Simd was never intended to be a variable width simd API. Making such proposition makes coding over it significantly harder. My suggestion is to implement both scalar

Re: [Mono-dev] Mono generates inefficient vectorized code

2010-04-13 Thread Jerry Maine - KF5ADY
Please do, Sergei I am also very much interested in the code. Rodrigo Kumpera wrote: Hi Sergei, I'm glad to hear about your improvements. Can you share the code? I believe this is not the best approach. Mono.Simd was never intended to be a variable width simd API. Making such proposition

Re: [Mono-dev] Mono generates inefficient vectorized code

2010-04-08 Thread Sergei Dyshel
Hello Rodrigo, Just picking up this conversation we had some time ago. I was asking why JIT does unneeded loads and stores and you answered that this behavior is because of lack of global reg allocator. I understand it so that any vreg which is used in different basic blocks is promoted to memory

Re: [Mono-dev] Mono generates inefficient vectorized code

2010-04-08 Thread Rodrigo Kumpera
Hi Sergei, On Thu, Apr 8, 2010 at 11:59 AM, Sergei Dyshel qyron.priv...@gmail.com wrote: Hello Rodrigo, Just picking up this conversation we had some time ago. I was asking why JIT does unneeded loads and stores and you answered that this behavior is because of lack of global reg allocator. I

Re: [Mono-dev] Mono generates inefficient vectorized code

2010-03-12 Thread Sergei Dyshel
Hi, Thanks for your answers, this is very good news! Last time I was checking Mono-LLVM, somewhere in December 2009, it didn't handle SIMD. The problem with Mono-LLVM is that it's implemented only for x86 and ARM targets and I am basing my research mostly on PowerPC (with Altivec). By trying to

Re: [Mono-dev] Mono generates inefficient vectorized code

2010-03-11 Thread Rodrigo Kumpera
Hi Sergei, On Thu, Mar 11, 2010 at 8:30 PM, Sergei Dyshel qyron.priv...@gmail.com wrote: Hello, I'm doing some research on vectorization using Mono. I've noticed that code generated by Mono's JIT contains many unnecessary memory loads and stores. Here is a simple example, the full code is

Re: [Mono-dev] Mono generates inefficient vectorized code

2010-03-11 Thread Sergei Dyshel
Hello Rodrigo, Thanks for the quick answer! But do you mean by it that the only problem is in lack of global register allocator? What if 'temp' was not vector but some bare 'int' temporary, would it be loaded and stored in each iteration? Another question. I know that there is also LLVM engine in

Re: [Mono-dev] Mono generates inefficient vectorized code

2010-03-11 Thread Jerry Maine - KF5ADY
And what needs to be done to get the global allocator to work? Rodrigo Kumpera wrote: Hi Sergei, On Thu, Mar 11, 2010 at 8:30 PM, Sergei Dyshel qyron.priv...@gmail.com wrote: Hello, I'm doing some research on vectorization using Mono. I've noticed that

Re: [Mono-dev] Mono generates inefficient vectorized code

2010-03-11 Thread Rodrigo Kumpera
On Thu, Mar 11, 2010 at 9:15 PM, Sergei Dyshel qyron.priv...@gmail.com wrote: Hello Rodrigo, Thanks for the quick answer! But do you mean by it that the only problem is in lack of global register allocator? What if 'temp' was not vector but some bare 'int' temporary, would it be loaded and

Re: [Mono-dev] Mono generates inefficient vectorized code

2010-03-11 Thread Rodrigo Kumpera
Finish, test and optimize it. And port it to 32-bit systems. It is a huge amount of work. On Thu, Mar 11, 2010 at 9:29 PM, Jerry Maine - KF5ADY crashfou...@gmail.com wrote: And what needs to be done to get the global allocator to work? Rodrigo Kumpera wrote: Hi Sergei, On Thu, Mar 11,

Re: [Mono-dev] Mono generates inefficient vectorized code

2010-03-11 Thread Rodrigo Kumpera
On Thu, Mar 11, 2010 at 9:15 PM, Sergei Dyshel qyron.priv...@gmail.com wrote: Hello Rodrigo, Thanks for the quick answer! But do you mean by it that the only problem is in lack of global register allocator? What if 'temp' was not vector but some bare 'int' temporary, would it be loaded and

Re: [Mono-dev] Mono generates inefficient vectorized code

2010-03-11 Thread Zoltan Varga
Hi, After some fixes to the llvm code in mono SVN, it now generates the following:
   d: 0f 10 0f       movups (%rdi),%xmm1
  10: 66 0f fe c1    paddd  %xmm1,%xmm0
  14: 48 83 c7 10    add    $0x10,%rdi
  18: 89 f1          mov    %esi,%ecx
  1a:
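
For readers less familiar with SSE mnemonics, the three vector-related instructions in the disassembly above (an unaligned 16-byte load, a packed 32-bit add, and a 16-byte pointer bump) can be read roughly as the loop body below. This is only an illustrative sketch in portable C, not the code from the thread: the names `vec4u` and `sum_packed` are hypothetical, the lanes are modelled as four scalars rather than a real XMM register, and the loop bound handling is an assumption because the rest of the listing is cut off.

```c
#include <stddef.h>
#include <stdint.h>

/* Four 32-bit lanes standing in for one 128-bit XMM register (hypothetical type). */
typedef struct { uint32_t lane[4]; } vec4u;

/* Hypothetical helper: accumulate ints four lanes at a time.
   Any tail of fewer than four elements is ignored (an assumption of this sketch). */
static vec4u sum_packed(const uint32_t *src, size_t count)
{
    vec4u acc = {{0, 0, 0, 0}};
    for (size_t i = 0; i + 4 <= count; i += 4) {   /* add $0x10,%rdi: advance 16 bytes */
        for (int k = 0; k < 4; k++) {
            /* movups (%rdi),%xmm1 loads src[i..i+3];
               paddd %xmm1,%xmm0 adds them lane by lane with wraparound. */
            acc.lane[k] += src[i + k];
        }
    }
    return acc;
}
```

The point of the comparison is that the accumulator stays in a register for the whole loop, with a single load per iteration and no store.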