On Apr 12, 2013, at 8:36 AM, "Hudson, Rick" <rick.hud...@intel.com> wrote:

> I'm assuming that we agree that JavaScript must evolve to leverage the 
> hardware's power-stingy parallelism.

True, but there is also the question of how high a priority we should give 
to this relative to other project goals, and of what opportunity costs are 
introduced by any particular implementation of parallelism that we choose.  
What you are proposing will only work for code that performs non-effectful 
operations.  This is a limited form of parallelism, and I'm not sure that it's 
a general enough programming model to warrant adding new things to the platform.

> For completeness there seems to be the following approaches. 
> 
> 1)    Add nothing to the language and rely on increasingly sophisticated 
> compilers to detect opportunities to safely parallelize computations within 
> the current semantics. 
> 2)    Add methods to Array that provide parallel semantics. 
> 3)    Add syntax and keywords such as parallelFor to introduce parallelism.
> 4)    Add a parallel type (ParallelArray) and associated methods that provide 
> parallel semantics.

You forgot the obvious ones:

5) Message passing
6) Shared mutable memory with either threads and locks, or something along 
those lines.

We sort of already have (5), but it is half-baked; I think there is 
opportunity to improve it.
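
To make the half-baked point concrete: today's message passing copies data on 
every transfer, so Workers can never share state, and large payloads pay a 
serialization tax on each message.  A minimal sketch, using a JSON round-trip 
as a crude stand-in for the structured clone that postMessage performs:

```javascript
// postMessage hands the receiver a deep copy of its argument, never a
// reference.  A JSON round-trip models that copying step (structured
// clone also handles cycles and typed arrays, which JSON does not).
var payload = { pixels: [1, 2, 3, 4] };
var received = JSON.parse(JSON.stringify(payload)); // what a Worker sees

received.pixels[0] = 99; // mutate the receiver's copy

// The sender's object is untouched: no shared state is possible, and
// every message costs time and memory proportional to the payload.
console.log(payload.pixels[0]);  // 1
console.log(received.pixels[0]); // 99
```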

We don't have (6).  (6) subsumes (1) through (5), since it would allow the 
programmer to roll all of those primitives themselves in JS.

We should not reject shared mutable memory out of hand.  It is the prevailing 
technique for implementing concurrency and parallelism in the other languages 
that people use.
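
To be concrete about what (6) means, here is a sketch of its core primitive: 
an integer array whose backing memory several threads could view at once, 
mutated with atomic read-modify-write operations.  Nothing like 
SharedArrayBuffer or Atomics exists in JS today, so treat this as one 
plausible shape rather than a proposal; the snippet runs single-threaded for 
illustration.

```javascript
// One possible shape for (6): a flat buffer that multiple threads
// could map simultaneously, with atomic operations supplying the
// sensible memory model.  (Single-threaded here; the point is that
// these writes would be safe even if another thread raced us.)
var shared = new Int32Array(new SharedArrayBuffer(4));

Atomics.add(shared, 0, 1);  // atomic read-modify-write; no torn writes
Atomics.add(shared, 0, 41);

console.log(Atomics.load(shared, 0)); // 42
```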

> 
> The _nothing_ approach would have to deal with array methods, such as map and 
> reduce, that have sequential semantics. For example, reduce is really a 
> sequential left fold, and those semantics defeat parallelization: the EIC 
> (element, index, collection) signature for the function/closure passed to 
> reduce suggests E === C[I], but the value E could be the result of a 
> previous invocation of the function. This approach seems to be strictly 
> worse than the methods approach. A similar nothing approach is to analyze 
> loops to mechanically discover opportunities for parallelization. While 
> decades of literature have been full of optimism, the reality is that this 
> approach has not achieved widespread success. 

Part of that lack of success stems from the fact that most code that does 
things to arrays has subtle loop-carried side-effects.  The closures passed to 
ParallelArray methods could have those too, and if they did, you wouldn't be 
able to soundly parallelize.
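
To make the ordering hazard concrete: reduce is specified as a sequential left 
fold, so any kernel that is not associative, or that carries side effects, 
observably depends on evaluation order.  A small sketch:

```javascript
var xs = [1, 2, 3, 4];

// Sequential left fold, as Array.prototype.reduce specifies:
// ((1 - 2) - 3) - 4
var sequential = xs.reduce(function (acc, x) { return acc - x; }); // -8

// A tree-shaped schedule, the natural way to parallelize a reduction:
// (1 - 2) - (3 - 4)
var treeShaped = (xs[0] - xs[1]) - (xs[2] - xs[3]); // 0

// Subtraction is not associative, so the two schedules disagree; an
// engine cannot soundly reorder this reduce behind the programmer's back.
console.log(sequential, treeShaped); // -8 0
```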

You haven't convinced me that ParallelArray will be any more successful than 
the full autoparallelization approach.  You also haven't convinced me that it 
will be any more successful than other eclectic ideas, like OpenMP.  I don't 
think that such annotation-based approaches have been particularly successful 
in other languages.  The languages that have seen the most success in parallel 
development are precisely those that give *complete* control to the 
programmer, such as Java, which gives you threads and locks combined with a 
sensible memory model, and allows you to craft your algorithm in the way that 
best suits your problem domain.  Why is this approach being rejected?

> 
> The _methods_ approach sidesteps the semantics issue raised in the nothing 
> approach but does not compose well with the original methods. For example, 
> Array has several inherently sequential methods, such as push and pop, that 
> do not have a reasonable parallel semantics since they force mutation. 
> That said, this approach has its proponents. Nico 
> (http://smallcultfollowing.com/babysteps/blog/2013/02/26/splitting-the-pjs-api/)
> suggested that we add unorderedMap and unorderedReduce to indicate that the 
> scheduling of the computation be unordered. While the idea is compelling on 
> the surface, it is likely to lead to the development of algorithms that are 
> at first sequential and then hacked to make them parallel. This approach 
> would impede the goal of having programmers develop parallel algorithms and 
> the appropriate parallel data structures. The methods approach would miss a 
> real opportunity here to move programmers away from sequential models and on 
> to parallel ones.

ParallelArray also misses this opportunity.  The only thing that ParallelArray 
does is add methods that could have been added to Array, and require you to 
say, when you write your code, whether you want parallelization.  The only 
problem this solves is the "is it profitable to parallelize" inference 
question.  I'm not sure that inferring the answer to this question is hard 
enough to warrant a new type.
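
For comparison, a hypothetical sketch of the methods route: an unorderedMap on 
Array.prototype (the name is from Nico's proposal; the shim is mine and purely 
illustrative).  Its contract is map's, minus any ordering guarantee for kernel 
invocations, which is exactly the latitude ParallelArray claims.  This 
fallback just runs sequentially; a parallelizing engine could instead 
partition the index space across cores.

```javascript
// Hypothetical shim: same signature as Array.prototype.map, but the
// contract would permit the engine to invoke the kernel in any order.
// This naive fallback runs sequentially.
if (!Array.prototype.unorderedMap) {
  Array.prototype.unorderedMap = function (kernel, thisArg) {
    var out = new Array(this.length);
    for (var i = 0; i < this.length; i++) {
      if (i in this) {
        out[i] = kernel.call(thisArg, this[i], i, this);
      }
    }
    return out;
  };
}

// For a side-effect-free kernel the result is indistinguishable from
// map's, which is the point: the API fits on Array without a new type.
console.log([1, 2, 3].unorderedMap(function (x) { return x * x; })); // [1, 4, 9]
```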

> 
> The _syntax_ approach suggests that we follow the lead of Amp and add 
> parallelFor and siblings to JavaScript. This approach seems to suffer from 
> not being backward compatible and, like the methods approach, misses an 
> opportunity to move programmers away from their sequential mindset.
> 
> The _parallel type_ approach provides a clear separation from the semantics 
> of Array and in doing so does a much better job of encouraging programmers to 
> build parallel algorithms and data structures instead of attempting to 
> parallelize sequential algorithms.

The methods approach also provides such a clear separation.

Just adding the word "Parallel" to the name of a new type and saying that this 
is the only type that benefits from parallelism does not constitute positive 
encouragement of any kind.  Quite the opposite, it seems the only thing it 
encourages is for VM engineers to give up on the inference approach entirely.

> The contract between the programmer and the system can be better defined if 
> ParallelArray exists independently of Array, and the two prototypes could 
> develop independently. That is not to say that we shouldn't follow the 
> principle of least surprise and leverage the programmer's familiarity with 
> the Array method signatures. For example, the kernel/callback function 
> should follow the same conventions found in the Array.map signature, such 
> as the callback (kernel) function's formals being element, index, array.

But this is a double-edged sword: having a separate array type with separate 
semantics incurs opportunity costs for optimization.  As the two prototypes 
diverge, there will be optimizations that get implemented less well in one than 
in the other.

Supporting more types in a compiler is never free.

> 
> Furthermore, the parallel type approach is a clear indication to the SDE and 
> JIT implementers that the programmer wishes to execute this code in parallel. 
> The SDE will be able to assist the programmer when they don't follow the 
> contract, for example by throwing when it encounters a kernel with side 
> effects.

But how will developers debug this exception?  Likely by using something like 
the Inspector's exception view which highlights parts of the code that threw.

We can do the same thing for autoparallelization in the "methods" approach: we 
can highlight those array method calls that were parallelized, as well as those 
that weren't.

I don't buy your argument that ParallelArray handles this developer scenario 
any better than the other approaches.  Whether the failure case throws an 
exception or not, we can highlight the fact that one kernel or another wasn't 
parallelized.

> While the parallel type approach could be implemented on top of the methods 
> or even the syntax approach, information about the intent of the programmer 
> would be lost to the JIT and the SDE. In summary, the real value in adding 
> ParallelArray as its own type is that it helps the developer structure his 
> programs and choose his algorithms with parallelism in mind, and convey 
> that intent clearly. 

This argument can be used in favor of any parallel or concurrent programming 
model.  If I run s/ParallelArray as its own type/threads and locks/ on your 
statement, I get a convincing argument for the traditional approach to shared 
memory parallelism.

I do believe that parallelism, and concurrency, are features that should be 
added to JavaScript.  But I think we should aim higher than adding a 
limited-use restricted programming model for this purpose.

-Filip

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev
