On Sun, Aug 26, 2012 at 9:45 AM, Bruce Momjian wrote:
> On Thu, Dec 29, 2011 at 11:37:22AM +0900, Manabu Ori wrote:
>> > > a configure test only proves whether the build machine can deal
>> > > with the flag, not whether the machine the executables will
>> > > ultimately run on knows what the flag means. We cannot assume that
>> > > the build and execution
Heikki Linnakangas writes:
> The Linux kernel does this (arch/powerpc/include/asm/ppc-opcode.h):
Yeah, I was looking at that too.
> We can't copy-paste code from Linux directly, and I'm not sure I like
> that particular phrasing of the macro, but perhaps we should steal the
> idea and only use
On 29.12.2011 04:36, Manabu Ori wrote:
I believe the lwarx hint would do no harm on recent PowerPC processors.
What I tested:
(1) Built postgres on POWER6 + RHEL5, with the lwarx hint
included. Then copied the source tree to POWER5 + RHEL4 and
ran "make test", which finished successfully.
2011/12/29 Tatsuo Ishii
> > Impressive results.
> >
> > config/c-compiler.m4 doesn't seem like the right place for the
> > configure test. Would there be any harm in setting the lwarx hint
> > always; what would happen on older ppc processors that don't support
> > it?
>
> I think the load module
OT:
Please use mail address "manabu@gmail.com", not
"manabu@gmailc.com" when following this thread. I accidentally made
a mistake when I posted the first mail in this thread.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Tom Lane wrote:
> a configure test only proves whether the build machine can deal
> with the flag, not whether the machine the executables will
> ultimately run on knows what the flag means. We cannot assume that
> the build and execution boxes are the same. (In general,
> AC_TRY_RUN tests are
Heikki Linnakangas writes:
> config/c-compiler.m4 doesn't seem like the right place for the configure
> test. Would there be any harm in setting the lwarx hint always; what
> would happen on older ppc processors that don't support it?
More to the point, a configure test only proves whether the
On 28.12.2011 14:03, Tatsuo Ishii wrote:
With help from IBM Japan Ltd. we did some tests on a larger IBM
machine than Tom Lane has used for his
test(http://archives.postgresql.org/message-id/8292.1314641...@sss.pgh.pa.us).
In his case it was IBM 8406-71Y, which has 8 physical cores and
4-way SMT (32 threads). Ours is IBM Power
On Tue, Oct 18, 2011 at 2:20 AM, Pavan Deolasee
wrote:
> On Tue, Oct 18, 2011 at 10:04 AM, Robert Haas wrote:
>> Hmm, so you added the non-locked test in TAS()? Did you try adding it
>> just to TAS_SPIN()? On Itanium, I found that it was slightly better
>> to do it only in TAS_SPIN() - i.e. in
On Tue, Oct 18, 2011 at 10:04 AM, Robert Haas wrote:
> Hmm, so you added the non-locked test in TAS()? Did you try adding it
> just to TAS_SPIN()? On Itanium, I found that it was slightly better
> to do it only in TAS_SPIN() - i.e. in the contended case.
>
Would it be a good change for S_LOCK(
On Tue, Oct 18, 2011 at 12:11 AM, Tatsuo Ishii wrote:
>>> That would be great. What I've been using as a test case is pgbench
>>> -S -c $NUM_CPU_CORES -j $NUM_CPU_CORES with scale factor 100 and
>>> shared_buffers=8GB.
>>>
>>> I think what you'd want to compare is the performance of unpatched
>>> master, vs. the performance with this line added to s_lock.h for
On Tue, Sep 6, 2011 at 4:33 AM, Tatsuo Ishii wrote:
> I am interested in this thread because I may be able to borrow a big
> IBM machine and might be able to do some tests on it if it somehow
> contributes to enhancing PostgreSQL. Is there anything I can do for this?
That would be great. What I've
Hi,
I am interested in this thread because I may be able to borrow a big
IBM machine and might be able to do some tests on it if it somehow
contributes to enhancing PostgreSQL. Is there anything I can do for this?
I wrote:
> No I/O anywhere. I'm thinking the reported idle time must correspond to
> spinlock delays that are long enough to reach the select() calls in
> s_lock. If so, 38% is depressingly high, but it's not out of line with
> what we've seen in the past in tests designed to provoke spinlock
> c
On Tue, Aug 30, 2011 at 7:21 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Aug 30, 2011 at 6:33 PM, Tom Lane wrote:
>>> I ran it up to "pgbench -c 200 -j 200 -S -T 300 bench" and still see
>>> vmstat numbers around 50% user time, 12% system time, 38% idle.
>>> So no lseek problem here, bo
Robert Haas writes:
> On Tue, Aug 30, 2011 at 6:33 PM, Tom Lane wrote:
>> I ran it up to "pgbench -c 200 -j 200 -S -T 300 bench" and still see
>> vmstat numbers around 50% user time, 12% system time, 38% idle.
>> So no lseek problem here, boss. Kernel calls itself 2.6.32-192.el6.x86_64.
> Eh, wa
On Tue, Aug 30, 2011 at 6:33 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Aug 30, 2011 at 4:37 PM, Tom Lane wrote:
>>>> If this is on Linux, I am surprised
>>>> that you didn't get killed by the lseek() contention problem on a
>>>> machine with that many cores.
>
>>> Hm ... now that you
Robert Haas writes:
> On Tue, Aug 30, 2011 at 4:37 PM, Tom Lane wrote:
>>> If this is on Linux, I am surprised
>>> that you didn't get killed by the lseek() contention problem on a
>>> machine with that many cores.
>> Hm ... now that you mention it, all of these tests have been using
>> the late
On Tue, Aug 30, 2011 at 4:37 PM, Tom Lane wrote:
>> If this is on Linux, I am surprised
>> that you didn't get killed by the lseek() contention problem on a
>> machine with that many cores.
>
> Hm ... now that you mention it, all of these tests have been using
> the latest-and-greatest unreleased
Robert Haas writes:
> I am a bit surprised by your test results, because I also tried x86_64
> with an unlocked test, also on pgbench -S, and I am pretty sure I got
> a regression. Maybe I'll try rerunning that. It seems possible that
> the x86_64 results depend on the particular sub-architectur
On Tue, Aug 30, 2011 at 4:05 PM, Tom Lane wrote:
> This suggests that (1) an unlocked test in TAS_SPIN might be a good idea
> on x86_64 after all, and (2) this test scenario may not be pushing the
> system hard enough to expose limitations of the spinlock implementation.
>
> I am now thinking that
I wrote:
> I am hoping to do a similar test on another machine with $bignum Xeon
> processors, to see if Intel hardware reacts any differently. But that
> machine is in the Westford office which is currently without power,
> so it will have to wait a few days.
OK, the lights are on again in Westf
Ants Aasma writes:
> On Mon, Aug 29, 2011 at 10:12 PM, Tom Lane wrote:
>> Also, if the PPC machine really is hyperthreaded (the internal webpage
>> for it says "Hyper? True" but /proc/cpuinfo doesn't provide any clear
>> indications), that might mean it's not going to scale too well past 16x
>> t
Sorry, forgot to cc the list.
On Mon, Aug 29, 2011 at 10:12 PM, Tom Lane wrote:
> Also, if the PPC machine really is hyperthreaded (the internal webpage
> for it says "Hyper? True" but /proc/cpuinfo doesn't provide any clear
> indications), that might mean it's not going to scale too well past 16
"Kevin Grittner" writes:
> Robert Haas wrote:
>> Stepping beyond the immediate issue of whether we want an unlocked
>> test in there or not (and I agree that based on these numbers we
>> don't), there's a clear and puzzling difference between those sets
>> of numbers. The Opteron test is showing
Robert Haas wrote:
> Stepping beyond the immediate issue of whether we want an unlocked
> test in there or not (and I agree that based on these numbers we
> don't), there's a clear and puzzling difference between those sets
> of numbers. The Opteron test is showing 32 clients getting about
> 23
Greg Stark writes:
> I was going to say the same thing as Tom that sequence points and
> volatile pointers have nothing at all to do with each other. However
> my brief searching online actually seemed to indicate that in fact the
> compiler isn't supposed to reorder volatile memory accesses acros
On Mon, Aug 29, 2011 at 2:15 PM, Tom Lane wrote:
> These tests were run on a 32-CPU Opteron machine (Sun Fire X4600 M2,
> 8 quad-core sockets). Test conditions the same as my IA64 set, except
> for the OS and the -j switches:
>
> Stock git head:
>
> pgbench -c 1 -j 1 -S -T 300 bench tps = 9
Robert Haas writes:
> I'm actually not convinced that we're entirely consistent here about
> what we require the semantics of acquiring and releasing a spinlock to
> be. For example, on x86 and x86_64, we acquire the lock using xchgb,
> which acts a full memory barrier. But when we release the l
I wrote:
> I am also currently running tests on x86_64 and PPC using Red Hat test
> machines --- expect results later today.
OK, I ran some more tests. These are not directly comparable to my
previous results with IA64, because (a) I used RHEL6.2 and gcc 4.4.6;
(b) I used half as many pgbench thr
On Mon, Aug 29, 2011 at 1:24 PM, Tom Lane wrote:
> Robert Haas writes:
>> This discussion seems to miss the fact that there are two levels of
>> reordering that can happen. First, the compiler can move things
>> around. Second, the CPU can move things around.
>
> Right, I think that's exactly t
On Mon, Aug 29, 2011 at 5:53 PM, Robert Haas wrote:
> Even though the compiler may emit those instructions in exactly that
> order, an x86 CPU can, IIUC, decide to load B before it finishes
> storing A, so that the actual apparent execution order as seen by
> other CPUs will be either the above,
Robert Haas writes:
> This discussion seems to miss the fact that there are two levels of
> reordering that can happen. First, the compiler can move things
> around. Second, the CPU can move things around.
Right, I think that's exactly the problem with the previous wording of
that comment; it d
On Mon, Aug 29, 2011 at 12:00 PM, Tom Lane wrote:
> Greg Stark writes:
>> The confusion for me is that it's talking about sequence points and
>> volatile pointers in the same breath as if one implies the other.
>> Making something a volatile pointer does not create a sequence point.
>> It require
Greg Stark writes:
> The confusion for me is that it's talking about sequence points and
> volatile pointers in the same breath as if one implies the other.
> Making something a volatile pointer does not create a sequence point.
> It requires that the compiler not move the access or store across a
On Mon, Aug 29, 2011 at 4:07 PM, Tom Lane wrote:
>> * ANOTHER CAUTION: be sure that TAS(), TAS_SPIN(), and
>> S_UNLOCK() represent
>> * sequence points, ie, loads and stores of other values must not be
>> moved
>> * across a lock or unlock. In most cases it suffices to make
>>
On Mon, Aug 29, 2011 at 11:42 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Mon, Aug 29, 2011 at 11:07 AM, Tom Lane wrote:
>>> Robert Haas writes:
>>>> IIUC, this is basically total nonsense.
>
>>> It could maybe be rewritten for more clarity, but it's far from being
>>> nonsense. The respon
Robert Haas writes:
> On Mon, Aug 29, 2011 at 11:07 AM, Tom Lane wrote:
>> Robert Haas writes:
>>> IIUC, this is basically total nonsense.
>> It could maybe be rewritten for more clarity, but it's far from being
>> nonsense. The responsibility for having an actual hardware memory fence
>> inst
On Mon, Aug 29, 2011 at 11:07 AM, Tom Lane wrote:
> Robert Haas writes:
>> OK, done. I think while we're tidying up here we ought to do
>> something about this comment:
>
>> * ANOTHER CAUTION: be sure that TAS(), TAS_SPIN(), and
>> S_UNLOCK() represent
>> * sequence points, ie, loads
Robert Haas writes:
> OK, done. I think while we're tidying up here we ought to do
> something about this comment:
> * ANOTHER CAUTION: be sure that TAS(), TAS_SPIN(), and
> S_UNLOCK() represent
> * sequence points, ie, loads and stores of other values must not be
> moved
> *
On Sun, Aug 28, 2011 at 8:00 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Sun, Aug 28, 2011 at 7:19 PM, Tom Lane wrote:
>>> (IOW, +1 for inventing a second macro to use in the delay loop only.)
>
>> Beautiful. Got a naming preference for that second macro? I
>> suggested TAS_SPIN() because
2011/8/28 pasman pasmański :
> Pity that this patch works only on hpux :(.
Well, not really. x86 is already well-behaved. On a 32-core x86 box
running Linux, performance seems to plateau and then fall off
gradually. But on ia64, performance just collapses after about 24
cores. The
Robert Haas writes:
> On Sun, Aug 28, 2011 at 7:19 PM, Tom Lane wrote:
>> (IOW, +1 for inventing a second macro to use in the delay loop only.)
> Beautiful. Got a naming preference for that second macro? I
> suggested TAS_SPIN() because it's what you use when you spin, as
> opposed to what you
On Sun, Aug 28, 2011 at 7:19 PM, Tom Lane wrote:
> So this pretty well confirms Robert's results, in particular that all of
> the win from an unlocked test comes from using it in the delay loop.
> Given the lack of evidence that a general change in TAS() is beneficial,
> I'm inclined to vote again
I wrote:
> Yeah, I figured out that was probably what you meant a little while
> later. I found a 64-CPU IA64 machine in Red Hat's test labs and am
> currently trying to replicate your results; report to follow.
OK, these results are on a 64-processor SGI IA64 machine (AFAICT, 64
independent sock
Robert Haas writes:
> On Sun, Aug 28, 2011 at 11:35 AM, Tom Lane wrote:
>> Robert Haas writes:
>>> Then, I did this:
>>>
>>> - while (TAS(lock))
>>> + while (*lock ? 1 : TAS(lock))
>> Er, what? That sure looks like a manual application of what you'd
>> already done in the TAS macr
On Sun, Aug 28, 2011 at 11:35 AM, Tom Lane wrote:
> Robert Haas writes:
>> First, I did this:
>
>> -#define TAS(lock) _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE)
>> +#define TAS(lock) (*(lock) ? 1 : _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE))
>
> Seems reasonable, and similar to x86 logic.
>
>> Then, I
Robert Haas writes:
> First, I did this:
> -#define TAS(lock) _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE)
> +#define TAS(lock) (*(lock) ? 1 : _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE))
Seems reasonable, and similar to x86 logic.
> Then, I did this:
> - while (TAS(lock))
> + while (*lock
Pity that this patch works only on hpux :(.
But I have an idea: maybe when the executor stops at a locked row, it
should process the next row instead of waiting.
Of course, only if the query does not contain "order by" or windowing
functions.
--
pasman
I was able to obtain access to a 32-core HP-UX server. I repeated the
pgbench -S testing that I have previously done on Linux, and found
that the results were not too good. Here are the results at scale
factor 100, on 9.2devel, with various numbers of clients. Five minute
runs, shared_buffers=8G