Thanks for your message. The core functionality that X10 provides -- places, asynchronous execution, and a framework for controlling asynchrony -- can be used to support programmer-controlled caches, as on the Cell.
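To make that concrete, here is a minimal sketch of the pattern: ship a block of data to other places under a governing finish, compute on the place-local copy there, and let finish detect termination. This is X10-style code only -- the exact library calls have shifted between releases, and processBlock is a hypothetical helper -- so please read it as an illustration of the idiom rather than code for any particular version.

    // Illustration only: stage data at other places and compute on the local copies.
    // processBlock is a hypothetical helper; syntax is in the style of X10 2.x.
    val block = new Rail[Double](1024);          // data owned by the home place
    finish for (p in Place.places()) {
        if (p != here) at (p) async {            // non-blocking "send work to p"
            // values captured by the closure (block) arrive as a place-local copy,
            // playing the role of a programmer-controlled cache line
            processBlock(block);
        }
    }
    // finish guarantees all remote activities have terminated before we continue here

The same finish/async/at structure is what the Cell experiment below used to orchestrate data movement explicitly.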
Last year, in a project internal to IBM, we ran an X10 Monte Carlo-style program on a Cell blade and showed efficiency comparable to C. The programmer used finish, async, etc. to move data back and forth between memory and the SPUs. The biggest issue we faced in getting performance was the need for the programmer to use SIMD operations to extract performance out of the SPUs. This year we are aiming to have X10 programs (such as Black-Scholes and some machine-learning kernels) running on NVIDIA GPUs with performance comparable to native CUDA.

We share with you the vision of using essentially the same source program to deliver good performance on Cell blades, GPUs, multicores, and clusters. This is the big potential of X10 and the Asynchronous Partitioned Global Address Space (APGAS) programming model.

Remote asyncs are essentially messages -- messages carrying a code pointer that specifies what code must be executed on the remote side when the message arrives (i.e. active messages). If one were to implement the join calculus on top of X10, it would be best to restrict attention to reductions that reference only elements in a single place. This is the underlying philosophy of X10 -- synchronous actions should be confined to a single place as far as possible. Thus atomic operations are permitted to access only mutable locations in the current place. (We have designed multi-place atomics, with a statically determined, bounded set of places, but have not yet implemented them.) A small sketch of this remote-async/atomic pattern appears after the quoted message below.

Olivier Pernet wrote:
> Hi all,
>
> I've been doing some thinking about where processor architectures are
> headed, and the programming models required.
> It seems clear to me that neither shared mutable memory nor cache
> coherency can scale up to hundreds of cores. Hence future processors
> will have to look very similar to today's GPUs, or some hybrid thing
> like the Cell. Intel is going in the same direction with Larrabee,
> although I believe that its use of cache coherency won't survive for
> very long, probably not above the 100-core mark.
>
> It seems like future architectures will need to include both:
> - memory partitioned across cores
> - programmer-controlled cache
>
> X10 is ideally equipped for the former thanks to PGAS. How about the latter?
>
> Is there any ongoing work on a Cell or GPU runtime for X10? I think a
> truly future-proof language, at this point, should enable good
> performance for the same program on all of Cell, GPUs, single-chip
> multicore CPUs, and clusters.
>
> As an aside, has explicit message passing been considered in X10 for
> communication between places? I suppose futures are an equivalent
> primitive, but I'm partial to message passing as in the actor model,
> and to the join calculus (although I don't know if it can be
> implemented efficiently across a cluster of machines).
>
> Feel free to point out my mistakes or redirect the conversation to
> another list if appropriate.
>
> Cheers,
> --
> Olivier Pernet
>
> We are the knights who say
> echo '16i[q]sa[ln0=aln100%Pln100/snlbx]sbA0D4D465452snlbxq'|dc
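As promised above, here is a minimal sketch of a remote async acting as an active message, with an atomic block that touches only state owned by the destination place. The Counter class is made up for the example, and the syntax is in the style of X10 2.x (GlobalRef, at, async, atomic); exact details vary between releases, so treat it as illustrative rather than definitive.

    // Illustration only: a remote async as an active message, updating
    // place-local state under an atomic block at the owning place.
    class Counter {
        var value:Int = 0;
    }
    val c = GlobalRef[Counter](new Counter());   // target object lives at c.home
    finish at (c.home) async {                   // "send" an active message to the owner
        // this closure is the code pointer carried by the message;
        // it runs at c.home when the message arrives
        atomic { c().value = c().value + 1; }    // accesses only state at the current place
    }
    // the same pattern works whether or not c.home happens to be the sending place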
