Re: [webkit-dev] SIMD support in JavaScript

2014-09-29 Thread Dan Gohman
Hi Nadav,

- Original Message -
 Hi Dan!
 
  On Sep 28, 2014, at 6:44 AM, Dan Gohman sunf...@mozilla.com wrote:
  
  Hi Nadav,
  
  I agree with much of your assessment of the proposed SIMD.js API.
  However, I don't believe its unsuitability for some problems
  invalidates it for solving other very important problems, which it is
  well suited for. Performance portability is actually one of SIMD.js'
  biggest strengths: it's not the kind of performance portability that
  aims for a consistent percentage of peak on every machine (which, as you
  note, of course an explicit 128-bit SIMD API won't achieve), it's the
  kind of performance portability that achieves predictable performance
  and minimizes surprises across machines (though yes, there are some
  unavoidable ones, but overall the picture is quite good).
 
 There is a tradeoff between the performance portability of the SIMD.js ISA
 and its usefulness. A small number of instructions (that only targets 32bit
 data types, no masks, etc) is not useful for developing non-trivial vector
 programs. You need 16bit vector elements to support WebGL vertex indices,
 and lane-masking for implementing predicated control flow for programs like
 ray tracers. Introducing a large number of vector instructions will expose
 the performance portability problems. I don’t believe that there is a sweet
 spot in this tradeoff. I don’t think that we can find a small set of
 instructions that will be useful for writing non-trivial vector code that is
 performance portable.

My belief in the existence of a sweet spot is based on looking at other 
systems, hardware and software, that have already gone there.

For an interesting example, take a look at this page:

https://software.intel.com/en-us/articles/interactive-ray-tracing

Every SIMD operation used in that article is directly supported by a 
corresponding function in SIMD.js today. We do have an open question on whether 
we should do something different for the rsqrt instruction, since the hardware 
only provides an approximation. In this case the code requires some 
Newton-Raphson, which may give us some flexibility, but several things are 
possible there. And of course, sweet spot doesn't mean cure-all.

Also, I am preparing to propose that SIMD.js handle 16-bit vector elements too 
(int16x8). It fits pretty naturally into the overall model. There are some 
challenges on some architectures, but there are challenges with alternative 
approaches too, and overall the story looks good.

Other changes are being discussed too. In general, the SIMD.js spec is 
still evolving; participation is welcome :-).

  This is an example of a weakness of depending on automatic vectorization
  alone. High-level language features create complications which can lead
  to surprising performance problems. Compiler transformations to target
  specialized hardware features often have widely varying applicability.
  Expensive analyses can sometimes enable more and better vectorization,
  but when a compiler has to do an expensive complex analysis in order to
  optimize, it's unlikely that a programmer can count on other compilers
  doing the exact same analysis and optimizing in all the same cases. This
  is a problem we already face in many areas of compilers, but it's more
  pronounced with vectorization than many other optimizations.
 
 I agree with this argument. Compiler optimizations are unpredictable. You
 never know when the register allocator will decide to spill a variable
 inside a hot loop, or a memory operation will confuse the alias analysis. I also
 agree that loop vectorization is especially sensitive.
 However, it looks like the kind of vectorization that is needed to replace
 SIMD.js is a very simple SLP vectorization
 http://llvm.org/docs/Vectorizers.html#the-slp-vectorizer (BB
 vectorization). It is really easy for a compiler to combine a few scalar
 arithmetic operations into a vector. LLVM’s SLP-vectorizer supports
 vectorization of computations across basic blocks and succeeds in surprising
 places, like vectorization of STDLIB code where the ‘begin' and ‘end'
 iterators fit into a 128-bit register!

That's a surprising trick!

I agree that SLP vectorization doesn't have the same level of performance 
cliff as loop vectorization. And, it may be a desirable thing for JS JITs to 
start doing.

Even so, there is still value in an explicit SIMD API in the present. For the 
core features, instead of giving developers sets of expression patterns to 
follow to ensure SLP recognition, we are giving names to those patterns and 
letting developers identify which patterns they wish to use by their names. We 
can coordinate, compare, and standardize them by name across browsers, and in 
the future we may make a variety of interesting extensions to the API which 
developers will be able to feature-test for.
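
As a rough sketch of what naming a pattern and feature-testing for it could look like, the code below uses a named four-lane add when the host provides one and falls back to the equivalent hand-written scalar pattern otherwise. The global SIMD object, SIMD.float32x4, its add function, and the .x/.y/.z/.w lane accessors are assumptions based on the draft API as it stood in 2014; the spec is still evolving, so they may differ or be absent in any given engine. The helper names hasFloat32x4Add and add4 are hypothetical.

```javascript
// Add two 4-element float vectors, using a named SIMD operation when the
// host provides one and an equivalent scalar pattern otherwise.
// ASSUMPTION: the global `SIMD`, `SIMD.float32x4`, its `add` function and the
// `.x/.y/.z/.w` lane accessors follow the 2014 draft; they may differ or be
// absent in a given engine.
function hasFloat32x4Add() {
  return typeof SIMD !== "undefined" &&
         typeof SIMD.float32x4 === "function" &&
         typeof SIMD.float32x4.add === "function";
}

function add4(a, b) {
  if (hasFloat32x4Add()) {
    const va = SIMD.float32x4(a[0], a[1], a[2], a[3]);
    const vb = SIMD.float32x4(b[0], b[1], b[2], b[3]);
    const vr = SIMD.float32x4.add(va, vb);
    return new Float32Array([vr.x, vr.y, vr.z, vr.w]);
  }
  // Scalar fallback: the same four independent adds, written out by hand.
  return new Float32Array([
    a[0] + b[0],
    a[1] + b[1],
    a[2] + b[2],
    a[3] + b[3],
  ]);
}

const sum = add4([1, 2, 3, 4], [10, 20, 30, 40]);
```

The feature test checks for the specific operation rather than just the SIMD global, which is the style of check that would also work for later extensions.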

And if, in the future, SLP vectorization proves itself reliable enough in JS, 
then we can drop our custom JIT 

Re: [webkit-dev] SIMD support in JavaScript

2014-09-29 Thread Dan Gohman
Hi Maciej,

- Original Message -
 
 Dan, you say that SIMD.js delivers performance portability, and Nadav says it
 doesn’t.
 
 Nadav’s argument seems to come down to (as I understand it):
 - The set of vector operations supported on different CPU architectures
 varies widely.

This is true, but it's also true that there is a core set of features which is 
pretty consistent across popular SIMD architectures. This commonality exists 
because it's a very popular set. The proposed SIMD.js doesn't solve all 
problems, but it does solve a large number of important problems well, and it 
is following numerous precedents.

We are also exploring the possibility of exposing additional instructions 
outside this core set. Several creative ideas are being discussed which could 
expand the API's reach while preserving a portability story. However, 
regardless of what we do there, I expect the core set will remain a prominent 
part of the API, due to its applicability.

 - Executing vector intrinsics on processors that don’t support them is
 slower than executing multiple scalar instructions because the compiler
 can’t always generate efficient code with the same semantics.

This is also true; however, the intent of SIMD.js *is* to be implementable on 
all popular architectures. The SIMD.js spec is originally derived from the Dart 
SIMD spec, which is already implemented and in use on at least x86 and ARM. We 
are also taking some ideas from OpenCL, which offers a very similar set of core 
functionality, and which is implemented on even more architectures. We have 
several reasons to expect that SIMD.js can cover enough functionality to be 
useful while still being sufficiently portable.

 - Even when vector intrinsics are supported by the CPU, whether it is
 profitable to use them may depend in non-obvious ways on exact
 characteristics of the target CPU and the surrounding code (the Port5
 example).

SIMD.js provides plain integer types, which let developers bypass plain JS 
number semantics directly, so there are fewer corner cases for which the 
compiler must insert extra checking code. This means fewer branches and, among 
other things, should mean less port 5 contention overall on Sandy Bridge.

Furthermore, automatic vectorization often requires the compiler to make 
conservative assumptions about key information like pointer aliasing, trip 
counts, integer overflow, array indexing, load safety, scatter ordering, 
alignment, and more. In order to preserve observable semantics, these 
assumptions cause compilers to insert extra instructions, which are typically 
things like selects, shuffles, branches or other things, to handle all the 
possible corner cases. This is extra overhead that human programmers can often 
avoid, because they can more easily determine what corner cases are relevant in 
a given piece of code. And on Sandy Bridge in particular, these extra selects, 
shuffles, and branches hit port 5.
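
To make the aliasing case concrete, here is a small sketch in plain JavaScript (no SIMD API involved; all names are hypothetical). It compares an element-at-a-time loop with a version that loads four elements before storing four, which is how a naive 4-wide vectorization would behave. When the source and destination views overlap, the two produce different results, which is why a compiler that cannot prove the views are disjoint has to add a runtime check or give up.

```javascript
// Element-at-a-time loop: each store is visible to the next load.
function incrementScalar(dst, src, n) {
  for (let i = 0; i < n; i++) {
    dst[i] = src[i] + 1;
  }
}

// Models a naive 4-wide vectorization: load four lanes, then store four lanes.
// Assumes n is a multiple of 4 to keep the sketch short.
function incrementBy4(dst, src, n) {
  for (let i = 0; i + 4 <= n; i += 4) {
    const t0 = src[i], t1 = src[i + 1], t2 = src[i + 2], t3 = src[i + 3];
    dst[i] = t0 + 1;
    dst[i + 1] = t1 + 1;
    dst[i + 2] = t2 + 1;
    dst[i + 3] = t3 + 1;
  }
}

// Run `fn` on a zeroed buffer; `overlap` selects whether dst aliases src.
function run(fn, overlap) {
  const buf = new Float32Array(17);
  const src = buf.subarray(0, 8);
  // Overlapping: dst starts one element after src. Disjoint: dst starts at 9.
  const dst = overlap ? buf.subarray(1, 9) : buf.subarray(9, 17);
  fn(dst, src, 8);
  return Array.from(dst);
}

const scalarOverlap = run(incrementScalar, true);
const chunkedOverlap = run(incrementBy4, true);
const scalarDisjoint = run(incrementScalar, false);
const chunkedDisjoint = run(incrementBy4, false);
```

With disjoint views both versions agree. With overlapping views they do not, and a programmer who knows the buffers never overlap can skip the check that a compiler would otherwise need to emit.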

 For these reasons, Nadav says that it’s better to autovectorize, and that
 this is the norm even for languages with explicit vector data. In other
 words, he’s saying that SIMD.js will result in code that is not
 performance-portable between different CPUs.

I question whether it is actually the norm. In C++, where auto-vectorization is 
available in every major compiler today, explicit SIMD APIs like xmmintrin.h 
are hugely popular. That particular header has become supported by Microsoft's 
C++ compiler, Intel's C++ compiler, GCC, and clang. I see many uses of 
xmmintrin.h in many contexts, including HPC, graphics, codecs, cryptography, 
and games. It seems many C++ developers are still willing to go through the 
pain of #ifdefs, preprocessor macros, and funny-looking syntax rather than rely 
on auto-vectorization, even with restrict and other aids.

Both auto-vectorization and SIMD.js have their strengths, and both have their 
weaknesses. I don't believe the fact that both solve some problems that the 
other doesn't rules out either of them.

 I don’t see a rebuttal to any of these points. Instead, you argue that,
 because SIMD.js does not require advanced compiler analysis, it is more
 likely to give similar results between different JITs (presumably when
 targeting the same CPU, or ones with the same supported vector operations
 and similar perf characteristics). That seems like a totally different sense
 of performance portability.

 Given these arguments, it’s possible that you and Nadav are both right[*].
 That would mean that both these statements hold:
 (a) SIMD.js is not performance-portable between different CPU architectures
 and models.
 (b) SIMD.js is performance-portable between different JITs targeting the same
 CPU model.
 
 On net, I think that combination would be a strong argument *against*
 SIMD.js. The Web aims for portability between different hardware and not
 just different software. At Apple alone we support four major CPU
 instruction sets and a considerably greater number of 

[webkit-dev] Adding support for gradient midpoint

2014-09-29 Thread Rik Cabanier
All,

I'm planning on adding support for gradient midpoints.[1]
Since this is such a small addition, the feature will not be behind a
feature flag and will be enabled by default.

Let me know if you have questions or concerns with this approach.

1: http://dev.w3.org/csswg/css-images-4/#color-interpolation-hint
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] Adding support for gradient midpoint

2014-09-29 Thread Benjamin Poulain

On 9/29/14, 1:41 PM, Rik Cabanier wrote:

I'm planning on adding support for gradient midpoints.[1]
Since this is such a small addition, the feature will not be behind a
feature flag and will be enabled by default.

Let me know if you have questions or concerns with this approach

1: http://dev.w3.org/csswg/css-images-4/#color-interpolation-hint


Sounds okay to skip the feature flag for this.

As usual:
-Please make sure to add a functional implementation in a single patch.
-Please create outstanding test coverage.

What is the bug number for tracking this?

Benjamin


[webkit-dev] New EWS status bubbles in Bugzilla

2014-09-29 Thread Alexey Proskuryakov
Hi,

WebKit Bugzilla has new EWS status bubbles now, which will hopefully make it 
more clear what's going on with a patch. Mysterious yellow bubbles that could 
mean anything were eliminated, and most importantly, there is now detailed 
information presented on hover:



Please try it out, and let me know if something breaks, or is not as good as it 
could be!

- Alexey
