------- Forwarded message -------
From: nigelsande...@btconnect.com
To: "Dave Whipp - dave_wh...@yahoo.com" <+nntp+browseruk+2dcf7cf254.dave_whipp#yahoo....@spamgourmet.com>, "Dave Whipp - d...@whipp.name" <+nntp+browseruk+e66dbbe0cf.dave#whipp.n...@spamgourmet.com>
Cc:
Subject: Re: Parallelism and Concurrency was Re: Ideas for a "Object-Belongs-to-Thread" threading model
Date: Tue, 18 May 2010 00:38:43 +0100

On Mon, 17 May 2010 23:25:07 +0100, Dave Whipp - dave_wh...@yahoo.com
<+nntp+browseruk+2dcf7cf254.dave_whipp#yahoo....@spamgourmet.com> wrote:

> Thanks for the clarification. However, I would point out that the whole purpose of CUDA is to define an abstraction (using terms like threads and warps) that exposes the GPU "internal operations" to the applications programmer as somewhere between SIMD and MIMD (hence the term SIMT -- single instruction, multiple thread: CUDA programmers write single-threaded, non-vectorized code that the compiler then scales up).

Yeah! But that's just another example of stupid overloading of terminology
confusing things. And that's not just my opinion.

For example, see: http://www.anandtech.com/show/2556/5

"NVIDIA wanted us to push some ridiculous acronym for their SM's
architecture: SIMT (single instruction multiple thread). First off, this
is a confusing descriptor based on the normal understanding of
instructions and threads. But more to the point, there already exists a
programming model that nicely fits what NVIDIA and AMD are both actually
doing in hardware: SPMD, or single program multiple data. This description
is most often attached to distributed memory systems and large scale
clusters, but it really is actually what is going on here."



> Also, if we assume that users will be writing their own operators, which will be auto-vectorized via hyper operators, then the idea that hypers will seamlessly map to a SIMD core is probably somewhat optimistic.

Well, I was talking about the obviously vectorizable operations: array
addition, multiplication, scaling and the like.
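
For instance, the kind of thing I mean (a minimal sketch; the hyper
operators themselves are standard Perl 6 syntax, but whether an
implementation actually maps them onto SIMD hardware is exactly the
open question):

    # Element-wise operations with no data dependencies between
    # elements: the "obviously vectorizable" cases.
    my @a = 1, 2, 3, 4;
    my @b = 10, 20, 30, 40;

    my @sum    = @a »+« @b;   # element-wise addition:       11 22 33 44
    my @prod   = @a »*« @b;   # element-wise multiplication: 10 40 90 160
    my @scaled = @a »*» 2;    # scaling by a scalar:          2  4  6  8

    # The ASCII forms (>>+<< etc.) are equivalent.
    say @sum; say @prod; say @scaled;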

But I agree with you: for anything more complex than simple SIMD
operations, it is probably optimistic to expect the hyper-operators to
map such code to GPGPU functionality transparently. But then we fall
back to my earlier point that (certainly the more complex uses of) CUDA
and OpenCL would require a (non-core) module: essentially a thin
wrapper around their APIs.
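
To be concrete about what I mean by a "thin wrapper": something along
the lines of the sketch below, which binds a single real OpenCL entry
point (clGetPlatformIDs) through a NativeCall-style foreign-function
interface. Treat it as an illustration only; a real module would cover
the whole API (contexts, buffers, kernels), and the binding details
here are my assumption, not a worked design.

    use NativeCall;

    # cl_int clGetPlatformIDs(cl_uint num_entries,
    #                         cl_platform_id *platforms,
    #                         cl_uint *num_platforms);
    sub clGetPlatformIDs(uint32 $num-entries,
                         CArray[Pointer] $platforms,
                         uint32 $num-platforms is rw)
        returns int32
        is native('OpenCL')
        { * }

    # Ask only how many OpenCL platforms are present; a type object is
    # passed where the C API expects a NULL array pointer.
    my uint32 $count = 0;
    my $rc = clGetPlatformIDs(0, CArray[Pointer], $count);
    say $rc == 0
        ?? "OpenCL platforms available: $count"
        !! "clGetPlatformIDs failed with code $rc";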

> I see three domains of threading: the passive parallelism of GPUs; the slight parallelism (ignoring ILP) of multicore; and the massive parallelism of cloud computing. Each of these domains has its own jargon that includes some concept of "thread". I know that it's hopelessly naive, but it would be nice if the basic Perl6 threading abstractions were applicable to all three (not because there's a common implementation, but because I wouldn't need to learn three different language dialects).

I too find it unlikely that all three of those could be accommodated in
a single cohesive semantic view.

Erlang's processes & messages view is perhaps the closest yet. But Erlang
started life targeting relatively simple, single-core, single-tasking,
embedded systems in telephone exchanges, where kernel-threading simply was
not available. Its raison d'être was IO-bound communications applications
running on networks of simple processors. But the only way forward to
sustain Moore's Law is CPU parallelism, and that required them to add
kernel threading to achieve scale. They're just now beginning the process
of working out how to manage the interactions of multiple concurrent
user-space schedulers running under a kernel scheduler.

> I personally find CPU parallelism to be the least interesting of the three.

Least interesting (to you) maybe, but it is now the most prevalent domain.
And that will become more and more true in the immediate and foreseeable
future. It is also the most relevant domain for the majority of the
applications that are the natural target of dynamic languages.

Developers of photo-realistic games, audio, video and image processing
applications will continue to use compiled languages for the foreseeable
future. It's possible to envisage that such things could be written in
dynamic languages (at the top level), by utilising compiled modules to do
the heavy lifting.

But no matter how much interpreter performance improves,
compiled-to-native code will always win in the pure performance stakes. And
with gaming technology, the more performance you have, the more realistic
the game and the more you can do. And I see no sign of that topping out in
favour of programmer efficiency any time soon. It's quite amazing what can
be done in Java these days with HotSpot JIT, kernel threading, the vast
improvements in GC algorithms and wait-free data structures, but that has
taken them a looong time to achieve and it's still not enough for the
hardcore games industry.

buk.
