On Apr 12, 2013, at 1:39 PM, Jarred Nicholls <jarred.nicho...@gmail.com> wrote:

> On Fri, Apr 12, 2013 at 2:54 PM, Filip Pizlo <fpi...@apple.com> wrote:
> 
> On Apr 12, 2013, at 8:36 AM, "Hudson, Rick" <rick.hud...@intel.com> wrote:
> 
> > I'm assuming that we agree that JavaScript must evolve to leverage the 
> > hardware's power-stingy parallelism.
> 
> True, but there is also the question of how high of a priority we should give 
> to this relative to other project goals, and also what opportunity costs are 
> introduced from any particular implementation of parallelism that we choose.  
> What you are proposing will only work for code that performs non-effectful 
> operations.  This is a limited form of parallelism, and I'm not sure that 
> it's a general enough programming model to warrant adding new things to the 
> platform.
> 
> > For completeness there seems to be the following approaches.
> >
> > 1)    Add nothing to the language and rely on increasingly sophisticated 
> > compilers to detect opportunities to safely parallelize computations within 
> > the current semantics.
> > 2)    Add methods to Array that provide parallel semantics.
> > 3)    Add syntax and keywords such as parallelFor to introduce parallelism.
> > 4)    Add a parallel type (ParallelArray) and associated methods that 
> > provide parallel semantics.
> 
> You forgot the obvious ones:
> 
> 5) Message passing
> 6) Shared mutable memory with either threads and locks, or something along 
> those lines.
> 
> We sort of already have (5), but it is half-baked; I think there is 
> opportunity to improve it.
> 
> We don't have (6).  (6) subsumes all of (1), (2), (3), (4), and (5) as it 
> would allow the programmer to roll all of those primitives in JS.
> 
> We should not reject shared mutable memory out of hand.  It is the prevailing 
> technique for implementing concurrency and parallelism in the other languages 
> that people use.
> 
> >
> > The _nothing_ approach would have to deal with array methods, such as map 
> > and reduce, that have sequential semantics. For example, reduce is really a 
> > sequential fold, and these semantics defeat parallelization: the 
> > element/index/collection (EIC) signature of the function/closure passed to 
> > reduce suggests that E === C[I], but the value E can in fact be the result 
> > of the previous invocation of the function, a loop-carried dependence. This 
> > approach seems to be strictly worse than the methods approach. A similar 
> > nothing approach is to analyze loops to mechanically discover opportunities 
> > for parallelization. While decades of literature have been full of 
> > optimism, the reality is that this approach has not achieved widespread 
> > success.
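To make the sequential-semantics point above concrete, here is a small sketch using today's Array methods (the `unorderedReduce` contract mentioned later in this thread is assumed, not an existing API):

```javascript
// Array.prototype.reduce is specified as a strict left-to-right fold, so a
// non-associative kernel makes the evaluation order observable.
const xs = [1, 2, 3, 4];

// Sequential fold: ((1 - 2) - 3) - 4 === -8
const sequential = xs.reduce((acc, x) => acc - x);

// A parallel runtime would want to regroup, e.g. (1 - 2) - (3 - 4) === 0,
// which is only sound when the kernel is associative.
const regrouped = (xs[0] - xs[1]) - (xs[2] - xs[3]);
console.log(sequential, regrouped); // -8 0

// With an associative, side-effect-free kernel, regrouping is invisible --
// exactly the contract an unorderedReduce would have to demand.
const sum = xs.reduce((acc, x) => acc + x);
const regroupedSum = (xs[0] + xs[1]) + (xs[2] + xs[3]);
console.log(sum === regroupedSum); // true
```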
> 
> Part of that lack of success stems from the fact that most code that does 
> things to arrays has subtle loop-carried side-effects.  The closures passed 
> to ParallelArray methods could have those, also; and if they did then you 
> wouldn't be able to soundly parallelize.
> 
> You haven't convinced me that ParallelArray will be any more successful than 
> the full autoparallelization approach.  You also haven't convinced me that it 
> will be any more successful than other eclectic ideas, like OpenMP, for 
> example.  I don't think that such annotation-based approaches have been 
> particularly successful in other languages.  The languages that have seen the 
> most success in parallel development are precisely those that give *complete* 
> control to the programmer, such as Java, which gives you threads and locks 
> combined with a sensible memory model - and allows you to craft your 
> algorithm in the way that best suits your problem domain.  Why is this 
> approach being rejected?
> 
> >
> > The _methods_ approach side steps the semantics issue raised in the nothing 
> > approach but does not compose well with the original methods. For example 
> > Array has several inherently sequential methods, such as push and pop, 
> > that do not have reasonable parallel semantics since they force mutation.
> > That said, this approach has its proponents. Nico 
> > http://smallcultfollowing.com/babysteps/blog/2013/02/26/splitting-the-pjs-api/
> >  suggested that we add unorderedMap and unorderedReduce to indicate that 
> > the scheduling of the computation be unordered. While the idea is 
> > compelling on the surface, it is likely to lead to the development of 
> > algorithms that are at first sequential and then hacked to make them 
> > parallel. This approach will impede the goal of having programmers develop 
> > parallel algorithms and the appropriate parallel data structures. The 
> > methods approach would miss a real opportunity here to move programmers 
> > away from the sequential models and on to parallel models.
> 
> ParallelArray also misses this opportunity.  The only thing that 
> ParallelArray does is add methods that could have been added to Array, and 
> requires you to say, when you write your code, if you want parallelization or 
> not.  The only problem this solves is inferring the "is it profitable to 
> parallelize" question.  I'm not sure that inferring the answer to this 
> question is hard enough to warrant a new type.
> 
> >
> > The _syntax_ approach suggests that we follow the lead of Amp and add 
> > parallelFor and siblings to JavaScript. This approach seems to suffer from 
> > not being backward compatible and, like the methods approach, misses an 
> > opportunity to move programmers away from their sequential mindset.
> >
> > The _parallel type_ approach provides a clear separation from the semantics 
> > of Array and in doing so does a much better job of encouraging programmers 
> > to build parallel algorithms and data structures instead of attempting to 
> > parallelize sequential algorithms.
> 
> The methods approach also provides such a clear separation.
> 
> Just adding the word "Parallel" to the name of a new type and saying that 
> this is the only type that benefits from parallelism does not constitute 
> positive encouragement of any kind.  Quite the opposite, it seems the only 
> thing it encourages is for VM engineers to give up on the inference approach 
> entirely.
> 
> > The contract between the programmer and the system can be better defined if 
> > ParallelArray exists independently of Array and the two prototypes could 
> > develop independently. That is not to say that we shouldn't follow the 
> > principle of least surprise and leverage the programmer's familiarity with 
> > the Array methods signatures. For example the kernel/callback function 
> > should follow the same conventions found in the Array.map signature such as 
> > the call back (kernel) function formals being element, index, array.
> 
> But this is a double-edged sword: having a separate array type with separate 
> semantics incurs opportunity costs for optimization.  As the two prototypes 
> diverge, there will be optimizations that get implemented less well in one 
> than in the other.
> 
> Supporting more types in a compiler is never free.
> 
> >
> > Furthermore the parallel type approach is a clear indication to the SDE and 
> > JIT implementers that the programmer wishes to execute this code in 
> > parallel. The SDE will be able to assist the programmer when they don't 
> > follow the contract, for example by throwing when it encounters a kernel 
> > with side effects.
> 
> But how will developers debug this exception?  Likely by using something like 
> the Inspector's exception view which highlights parts of the code that threw.
> 
> We can do the same thing for autoparallelization in the "methods" approach: 
> we can highlight those array method calls that were parallelized, as well as 
> those that weren't.
> 
> I don't buy your argument that ParallelArray handles this developer scenario 
> any better than the other approaches.  Whether the failure case throws an 
> exception, or not, either way we can highlight the fact that one kernel or 
> another wasn't parallelized.
> 
> > While the parallel type approach could be implemented on top of the methods 
> > or even the syntax approach information about the intent of the programmer 
> > would be lost to the JIT and the SDE. In summary, the real value in adding 
> > ParallelArray as its own type is that it helps the developer structure his 
> > programs and choose his algorithms with parallelism in mind, and convey 
> > that intent clearly.
> 
> This argument can be used in favor of any parallel or concurrent programming 
> model.  If I run s/ParallelArray as its own type/threads and locks/ on your 
> statement, I get a convincing argument for the traditional approach to shared 
> memory parallelism.
> 
> I do believe that parallelism, and concurrency, are features that should be 
> added to JavaScript.  But I think we should aim higher than adding a 
> limited-use restricted programming model for this purpose.
> 
> -Filip
> 
> TL;DR: I'd rather see ParallelArray be a library that is built on top of 
> lower-level primitives.

I'm curious: would you want to use ParallelArray, if you had the flexibility of 
building a different abstraction?

I find ParallelArray to be an awkward abstraction to begin with.  I like work 
queues and such.  The whole idea that the only parallelism you get is from 
arrays feels narrow-minded.

> 
> For what little it's worth, I agree with you, Filip, that providing 
> low-level primitives would be best in terms of a foundation for many parallel 
> programming models.  In terms of actually supporting shared mutable memory 
> and locks, it would be a challenge for the engines to become thread-safe 
> (internally) across contexts that share memory cells.  I'd envision the 
> sharing of mutable memory being an opt-in semantic, marking a piece of memory 
> as being shared/mutable from multiple contexts.

Fixing the engines is a matter of typing. ;-)

I don't think we need to add language features for opt-in, though this is an 
intriguing idea.

Without opt-in, the one biggish challenge would be DOM accesses from threads 
other than the main thread; I suspect for those the initial implementation 
would have to throw an exception from non-main-threads if you try to access a 
DOM node.  This is akin to what some UI toolkits do: they let you have threads 
but prohibit accessing UI things from anything but the thread on which the 
runloop sits.  Of course, they don't do the thread-check; we would have to do 
it to preserve integrity and security.

But those are details.  My basic position: I don't think we should fall into 
the trap of thinking that we should dumb down the language to the level of 
dumbness of the engine.  This is short-sighted.  The engine should get smarter, 
and should strive to benefit users in the broadest way possible.  The terrible 
downside of doing stopgap solutions is that you then have to maintain them; 
even once they are superseded and deprecated, they still rot in the 
code base.  I don't think that the need for numerical-only parallelism on the 
web is so pressing that we should fall into this trap.  We can do this the 
right way!

> 
> The fact is, parallelism comes in many different forms and use cases.  
> Performing side-effect-free, homogeneous tasks over an Array of data is one 
> type of problem (and one which can be GPU accelerated easily).  Performing 
> separate heterogeneous operations on shared (or unshared) memory can be a 
> different, more general problem altogether.  I'd tend to prefer a solution 
> that provides the 
> primitives to solve all such cases, and can be further abstracted & 
> simplified by user-land code to implement different parallel programming 
> models.  There are others out there that would vehemently disagree with that 
> approach, and their objections are understandable.

I agree! :-)

-Filip


> _______________________________________________
> webkit-dev mailing list
> webkit-dev@lists.webkit.org
> https://lists.webkit.org/mailman/listinfo/webkit-dev
