On Fri, Nov 02, 2007 at 11:09:33PM -0700, Ian Lance Taylor wrote:
skaller [EMAIL PROTECTED] writes:
As I said before, the register is only stolen for code which actually
uses TLS.
So scanning that document, for x86_64, fs is used in startup
code, presumably if, and only if, there
skaller wrote :
I can tell you I definitely considered using FS for the
Felix thread frame pointer to save passing that pointer
between every function..
But then, won't you end up with an implementation very similar
to __thread??
--
Sylvain Pion
INRIA Sophia-Antipolis
Geometrica Project-Team
On Sat, 2007-11-03 at 10:35 +0100, Sylvain Pion wrote:
skaller wrote :
I can tell you I definitely considered using FS for the
Felix thread frame pointer to save passing that pointer
between every function..
But then, won't you end up with an implementation very similar
to __thread??
On Fri, Nov 02, 2007 at 07:39:33AM -0700, Ian Lance Taylor wrote:
The only way I can interpret your comments is that you are assuming
that all TLS is Global Dynamic (e.g., accessed from a dlopen'ed shared
library). But stack based thread local storage won't work for
dlopen'ed shared libraries
skaller [EMAIL PROTECTED] writes:
A really cool (non-Posix) implementation would put TLS globals
on the stack base .. but this does require at least one extra
machine register in languages like C which don't provide
a static display (pointer to parent function). For languages
that do, such
On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote:
skaller [EMAIL PROTECTED] writes:
In a C executable, TLS requires one extra machine register.
You mean gcc?
TLS
variables are accessed via offsets from that register. So what's the
significant difference between that and your
On Fri, 2007-11-02 at 10:46 -0400, Daniel Jacobowitz wrote:
On Fri, Nov 02, 2007 at 07:39:33AM -0700, Ian Lance Taylor wrote:
The only way I can interpret your comments is that you are assuming
that all TLS is Global Dynamic (e.g., accessed from a dlopen'ed shared
library). But stack
On Thu, 2007-11-01 at 21:02 -0700, Gary Funck wrote:
On Thu, Oct 18, 2007 at 11:42:52AM +1000, skaller wrote:
Do you know how thread local variables are handled?
[Not using Posix TLS I hope .. that would be a disaster]
Would you please elaborate?
Sure ..
What's wrong with the
skaller [EMAIL PROTECTED] writes:
On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote:
skaller [EMAIL PROTECTED] writes:
In a C executable, TLS requires one extra machine register.
You mean gcc?
I don't understand the question. I mean in a C/C++ executable which
uses TLS. By
On Sat, Nov 03, 2007 at 03:31:14AM +1100, skaller wrote:
On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote:
I think you need to look at the TLS access code before deciding that
it has bad performance.
You already said it costs a register? That's a REALLY high cost
to pay to
On Sat, Nov 03, 2007 at 03:38:51AM +1100, skaller wrote:
My argument is basically: there is no need for any such
feature in a well written program. Each thread already has
its own local stack. Global variables should not be used
in the first place (except for signals etc where
there is no
Olivier Galibert wrote:
On Sat, Nov 03, 2007 at 03:38:51AM +1100, skaller wrote:
My argument is basically: there is no need for any such
feature in a well written program. Each thread already has
its own local stack. Global variables should not be used
in the first place (except for signals etc
On Sat, 3 Nov 2007, skaller wrote:
On Fri, 2007-11-02 at 10:46 -0400, Daniel Jacobowitz wrote:
On Fri, Nov 02, 2007 at 07:39:33AM -0700, Ian Lance Taylor wrote:
The only way I can interpret your comments is that you are assuming
that all TLS is Global Dynamic (e.g., accessed from a dlopen'ed
On Fri, 2007-11-02 at 10:29 -0700, Ian Lance Taylor wrote:
skaller [EMAIL PROTECTED] writes:
On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote:
skaller [EMAIL PROTECTED] writes:
In a C executable, TLS requires one extra machine register.
You mean gcc?
I don't
On Fri, 2007-11-02 at 19:56 +0100, Olivier Galibert wrote:
On Sat, Nov 03, 2007 at 03:31:14AM +1100, skaller wrote:
On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote:
I think you need to look at the TLS access code before deciding that
it has bad performance.
You already
On Fri, 2007-11-02 at 20:00 +0100, Olivier Galibert wrote:
On Sat, Nov 03, 2007 at 03:38:51AM +1100, skaller wrote:
My argument is basically: there is no need for any such
feature in a well written program. Each thread already has
its own local stack. Global variables should not be used
This is not true. If you use a register for any purpose like this,
it can't be used for anything else and that has a cost.
This is a segment register. Please go and read about what segment
registers are. They are not real registers and cannot be used for
anything except memory accesses. They
On Fri, 2007-11-02 at 15:31 -0400, Robert Dewar wrote:
Olivier Galibert wrote:
There are lots of cases where global thread specific variables
are useful in practice, ask anyone who has programmed real world
large scale real time embedded programs.
No. And I have done just that myself. There
skaller [EMAIL PROTECTED] writes:
On Fri, 2007-11-02 at 10:29 -0700, Ian Lance Taylor wrote:
skaller [EMAIL PROTECTED] writes:
On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote:
skaller [EMAIL PROTECTED] writes:
In a C executable, TLS requires one extra machine
On Sat, 2007-11-03 at 12:27 +1100, skaller wrote:
On Fri, 2007-11-02 at 10:29 -0700, Ian Lance Taylor wrote:
Of course there is. It's called design by contract.
I do it all the time. I am appalled at code bases like
GTK and interfaces like OpenMP which get such really
basic things wrong.
On Fri, 2007-11-02 at 18:45 -0700, Andrew Pinski wrote:
This is not true. If you use a register for any purpose like this,
it can't be used for anything else and that has a cost.
This is a segment register. Please go and read about what segment
registers are.
I know how the x86 works quite
skaller wrote:
This is not true. If you use a register for any purpose like this,
it can't be used for anything else and that has a cost.
On x86_64 which I use, every register is valuable. Don't you dare
take one away, it would have a serious performance impact AND
it would stop ME using that
skaller [EMAIL PROTECTED] writes:
Neko, for example, uses a register. AFAIK MLton does the
same kind of thing. If gcc team thinks ANY register is free
to steal they'd be wrong -- that doesn't mean it shouldn't
be used, just that it definitely is NOT free.
To be clear, it is not the gcc team
On Fri, 2007-11-02 at 23:56 -0400, Robert Dewar wrote:
skaller wrote:
You really can't be serious in your comment about fs, if you
understand the architecture ...
You're just not thinking the same way I am. A CPU has state,
the compiler and application program manage that state.
If the
On Fri, 2007-11-02 at 22:35 -0700, Ian Lance Taylor wrote:
skaller [EMAIL PROTECTED] writes:
Neko, for example, uses a register. AFAIK MLton does the
same kind of thing. If gcc team thinks ANY register is free
to steal they'd be wrong -- that doesn't mean it shouldn't
be used, just
skaller [EMAIL PROTECTED] writes:
As I said before, the register is only stolen for code which actually
uses TLS.
So scanning that document, for x86_64, fs is used in startup
code, presumably if, and only if, there is a linker section
containing __thread variables?
Yes.
Ian
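For readers following the __thread discussion above, here is a minimal sketch of what the GCC extension provides: each thread gets its own copy of a variable declared with __thread. The names counter, worker and tls_copies_are_independent are hypothetical and chosen only for illustration; this is not code from the thread, and it assumes a POSIX threads platform (build with something like gcc -pthread).

```c
#include <assert.h>
#include <pthread.h>

/* GCC extension: every thread sees its own instance of this variable. */
static __thread int counter = 0;

/* Increment this thread's private copy n times and report the result
   back through the argument. */
static void *worker(void *arg)
{
    int *n = arg;
    for (int i = 0; i < *n; i++)
        counter++;
    *n = counter;
    return NULL;
}

/* Returns 1 if each thread counted only its own increments and the
   calling thread's copy was never touched. */
int tls_copies_are_independent(void)
{
    pthread_t t1, t2;
    int a = 1000, b = 2000;

    int rc1 = pthread_create(&t1, NULL, worker, &a);
    int rc2 = pthread_create(&t2, NULL, worker, &b);
    assert(rc1 == 0 && rc2 == 0);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    return a == 1000 && b == 2000 && counter == 0;
}
```

No locking is needed around counter because the two workers never share it. How the compiler reaches each copy (for example through fs on x86_64, as discussed above) depends on the target and the TLS access model.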
On Thu, Oct 18, 2007 at 11:42:52AM +1000, skaller wrote:
Do you know how thread local variables are handled?
[Not using Posix TLS I hope .. that would be a disaster]
Would you please elaborate? What's wrong with the
POSIX TLS implementation? Do you know of any studies?
I ask, because we
I'm not sure what the OpenMP spec says about default data scope (too lazy
to read through), but it seems that examples from
http://kallipolis.com/openmp/2.html assume default(private), while GCC
GOMP defaults to shared. In your case,
#pragma omp parallel for shared(A, row, col)
for (i = k+1;
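The data-scope point can be sketched as follows. In C, variables declared outside a parallel region are shared by default, while the iteration variable of the work-shared loop is private. An inner loop index declared outside the region therefore has to be listed as private explicitly. The function eliminate_below and its row-major layout are assumptions made for illustration, not the code posted in this thread.

```c
#include <assert.h>

/* Hypothetical helper: one elimination step on an n x n matrix stored
   row-major in A, using row k as the pivot row. */
void eliminate_below(double *A, int n, int k)
{
    int i, j;

    assert(A[k * n + k] != 0.0);  /* a nonzero pivot is assumed */

    /* i is private automatically as the parallel loop index; j is
       declared outside the region and must be made private, otherwise
       all threads would race on a single shared j. */
    #pragma omp parallel for shared(A, n, k) private(j)
    for (i = k + 1; i < n; i++) {
        double factor = A[i * n + k] / A[k * n + k];
        for (j = k; j < n; j++)
            A[i * n + j] -= factor * A[k * n + j];
    }
}
```

Compiled without -fopenmp the pragma is ignored and the loop runs serially with the same result, which makes it easy to compare timings between the two builds.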
On Sat, 2007-10-20 at 22:32 +0400, Tomash Brechko wrote:
I'm not sure what the OpenMP spec says about default data scope (too lazy
to read through),
but it seems that examples from
http://kallipolis.com/openmp/2.html assume default(private), while GCC
GOMP defaults to shared. In your case,
skaller wrote:
OK, attached.
Hi skaller,
I think I've wasted my money. They do not ship OpenMP headers and libs
with Standard Edition. :(
Best Regards,
Biplab
Hi All,
I did some tests with GCC-4.2.2 (MinGW build) and the source code
provided by skaller.
The compilation log is as follows.
-- Build: Release in Test ---
[ 50.0%] mingw32-gcc.exe -Wall -fexceptions -fopenmp -O2
-IC:\MinGW\include -c
On Thu, Oct 18, 2007 at 02:47:44PM +1000, skaller wrote:
On Thu, 2007-10-18 at 12:02 +0800, Biplab Kumar Modak wrote:
skaller wrote:
On Wed, 2007-10-17 at 18:14 +0100, Biagio Lucini wrote:
skaller wrote:
It would be interesting to try with another compiler. Do you have access
skaller wrote:
On Thu, 2007-10-18 at 12:02 +0800, Biplab Kumar Modak wrote:
skaller wrote:
On Wed, 2007-10-17 at 18:14 +0100, Biagio Lucini wrote:
skaller wrote:
It would be interesting to try with another compiler. Do you have access
to another OpenMP-enabled
On Thu, 2007-10-18 at 06:00 -0700, Tim Prince wrote:
skaller wrote:
I don't know of any OpenMP compiler which would correct the nesting of
parallel loops in your LU. I have assumed that OpenMP doesn't allow
such optimization; you have to get it right yourself.
Can you explain? This code
On Thu, 2007-10-18 at 13:04 +0200, Jakub Jelinek wrote:
On Thu, Oct 18, 2007 at 02:47:44PM +1000, skaller wrote:
On LU_mp.c according to oprofile more than 95% of time is spent in the inner
loop, rather than any kind of waiting. On quad core with OMP_NUM_THREADS=4
all 4 threads eat 99.9% of
skaller wrote:
On Thu, 2007-10-18 at 06:00 -0700, Tim Prince wrote:
skaller wrote:
I don't know of any OpenMP compiler which would correct the nesting of
parallel loops in your LU. I have assumed that OpenMP doesn't allow
such optimization; you have to get it right yourself.
On 19 October 2007 02:45, tim prince wrote:
skaller wrote:
On Thu, 2007-10-18 at 06:00 -0700, Tim Prince wrote:
skaller wrote:
I don't know of any OpenMP compiler which would correct the nesting of
parallel loops in your LU. I have assumed that OpenMP doesn't allow
such
Hi, I have just run and timed a couple of tutorial examples for
openMP using gcc (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4) on a dual core
Athlon amd64, with OMP_NUM_THREADS set to 1 and 2, and occasionally
8. I found that 1 thread outperforms 2 by almost 2:1 on all the examples,
and 8 is only
On Thu, Oct 18, 2007 at 03:00:02AM +1000, skaller wrote:
Hi, I have just run and timed a couple of tutorial examples for
openMP using gcc (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4) on a dual core
Athlon amd64, with OMP_NUM_THREADS set to 1 and 2, and occasionally
8. I found that 1 thread outperforms
skaller wrote:
Hi, I have just run and timed a couple of tutorial examples for
openMP using gcc (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4) on a dual core
Athlon amd64, with OMP_NUM_THREADS set to 1 and 2, and occasionally
8. I found that 1 thread outperforms 2 by almost 2:1 on all the examples,
and 8
On Wed, 2007-10-17 at 18:14 +0100, Biagio Lucini wrote:
skaller wrote:
It would be interesting to try with another compiler. Do you have access
to another OpenMP-enabled compiler?
Unfortunately no, unless MSVC++ in VS2005 has openMP.
I have an Intel licence but they're too tied up with
On Wed, 2007-10-17 at 10:09 -0700, Joe Buck wrote:
On Thu, Oct 18, 2007 at 03:00:02AM +1000, skaller wrote:
Hi, I have just run and timed a couple of tutorial examples for
openMP using gcc (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4) on a dual core
Athlon amd64, with OMP_NUM_THREADS set to 1 and 2,
skaller writes:
Unfortunately no, unless MSVC++ in VS2005 has openMP.
I don't know if Visual C++ 2005 Express supports OpenMP, but the
Professional edition should. Alternatively, the free, as in beer,
Microsoft compiler included in the Windows SDK supports OpenMP.
On Wed, 2007-10-17 at 10:09 -0700, Joe Buck wrote:
On Thu, Oct 18, 2007 at 03:00:02AM +1000, skaller wrote:
Hi, I have just run and timed a couple of tutorial examples for
openMP using gcc (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4) on a dual core
Athlon amd64, with OMP_NUM_THREADS set to 1 and 2,
Ross Ridge wrote:
skaller writes:
Unfortunately no, unless MSVC++ in VS2005 has openMP.
I don't know if Visual C++ 2005 Express supports OpenMP, but the
Professional edition should. Alternatively, the free, as in beer,
Microsoft compiler included in the Windows SDK supports OpenMP.
Visual
skaller wrote:
On Wed, 2007-10-17 at 18:14 +0100, Biagio Lucini wrote:
skaller wrote:
It would be interesting to try with another compiler. Do you have access
to another OpenMP-enabled compiler?
Unfortunately no, unless MSVC++ in VS2005 has openMP.
I have an Intel licence but they're too
On Thu, 2007-10-18 at 12:02 +0800, Biplab Kumar Modak wrote:
skaller wrote:
On Wed, 2007-10-17 at 18:14 +0100, Biagio Lucini wrote:
skaller wrote:
It would be interesting to try with another compiler. Do you have access
to another OpenMP-enabled compiler?
Unfortunately no,