On Wed, Nov 21, 2012 at 7:10 AM, Ingo Molnar wrote:
>
> Because scalability slowdowns are often non-linear.
Only if you hold locks or have other non-cpu-private activity.
Which the vsyscall code really shouldn't have.
That said, it might be worth removing the "prefetchw(&mm->mmap_sem)"
from the
* Linus Torvalds wrote:
> On Wed, Nov 21, 2012 at 7:10 AM, Ingo Molnar wrote:
> >
> > Because scalability slowdowns are often non-linear.
>
> Only if you hold locks or have other non-cpu-private activity.
>
> Which the vsyscall code really shouldn't have.
Yeah, the faults accessing any sort
On Wed, 21 Nov 2012, Ingo Molnar wrote:
> Btw., what I did was to simply look at David's profile on the
> regressing system and I compared it to the profile I got on a
> pretty similar (but unfortunately not identical and not
> regressing) system. I saw 3 differences:
>
> - the numa emulation
On Wed, Nov 21, 2012 at 08:37:12PM +0100, Andrea Arcangeli wrote:
> Hi,
>
> On Wed, Nov 21, 2012 at 10:38:59AM +, Mel Gorman wrote:
> > HACKBENCH PIPES
> >              3.7.0              3.7.0              3.7.0              3.7.0              3.7.0
> >
Hi,
On Wed, Nov 21, 2012 at 10:38:59AM +, Mel Gorman wrote:
> HACKBENCH PIPES
>              3.7.0                3.7.0                   3.7.0       3.7.0       3.7.0
>    rc6-stats-v4r12  rc6-schednuma-v16r2  rc6-autonuma-v28fastr3  rc6-mo
* Linus Torvalds wrote:
> [...] And not look at vsyscalls or anything, but look at what
> schednuma does wrong!
I have started 4 independent lines of inquiry to figure out
what's wrong on David's system, and all four are in the category
of 'what does our tree do to cause a regression':
-
On 11/21/2012 12:02 PM, Linus Torvalds wrote:
> The same is true of all your arguments about Mel's numbers wrt THP
> etc. Your arguments are misleading - either intentionally, or because
> you yourself didn't think things through. For schednuma, it's not
> enough to be on par with mainline with THP off - t
* Ingo Molnar wrote:
> So because I did not have an old-glibc system like David's, I
> did not know the actual page fault rate. If it is high enough
> then nonlinear effects might cause such effects.
>
> This is an entirely valid line of inquiry IMO.
Btw., when comparing against 'mainline' I
* Ingo Molnar wrote:
> This is an entirely valid line of inquiry IMO.
Btw., what I did was to simply look at David's profile on the
regressing system and I compared it to the profile I got on a
pretty similar (but unfortunately not identical and not
regressing) system. I saw 3 differences:
* Linus Torvalds wrote:
> On Mon, Nov 19, 2012 at 11:06 PM, Ingo Molnar wrote:
> >
> > Oh, finally a clue: you seem to have vsyscall emulation
> > overhead!
>
> Ingo, stop it already!
>
> This is *exactly* the kind of "blame everybody else than
> yourself" behavior that I was talking about e
On Mon, Nov 19, 2012 at 11:06 PM, Ingo Molnar wrote:
>
> Oh, finally a clue: you seem to have vsyscall emulation
> overhead!
Ingo, stop it already!
This is *exactly* the kind of "blame everybody else than yourself"
behavior that I was talking about earlier.
There have been an absolute *shitload
On Mon, Nov 19, 2012 at 11:37:01PM -0800, David Rientjes wrote:
> On Tue, 20 Nov 2012, Ingo Molnar wrote:
>
> > No doubt numa/core should not regress with THP off or on and
> > I'll fix that.
> >
> > As a background, here's how SPECjbb gets slower on mainline
> > (v3.7-rc6) if you boot Mel's ke
On Mon, Nov 19, 2012 at 07:41:16PM -0500, Rik van Riel wrote:
> On 11/19/2012 06:00 PM, Mel Gorman wrote:
> >On Mon, Nov 19, 2012 at 11:36:04PM +0100, Ingo Molnar wrote:
> >>
> >>* Mel Gorman wrote:
> >>
> >>>Ok.
> >>>
> >>>In response to one of your later questions, I found that I had
> >>>in fac
On Tue, Nov 20, 2012 at 11:40:53AM +0100, Ingo Molnar wrote:
>
> btw., mind sending me a fuller/longer profile than the one
> you've sent before? In particular does your system have any
> vsyscall emulation page fault overhead?
>
I can't, the results for specjbb got trashed after I moved to 3.
On Tue, Nov 20, 2012 at 10:20:10AM +, Mel Gorman wrote:
> I've added two extra configuration files to run specjbb single and multi
> JVMs with THP enabled. It takes about 1.5 to 2 hours to complete a single
1.5 to 2 hours if running to the full set of warehouses required for a
compliant run. C
btw., mind sending me a fuller/longer profile than the one
you've sent before? In particular does your system have any
vsyscall emulation page fault overhead?
If yes, does the patch below change anything for you?
Thanks,
Ingo
>
Subject: x86/vsyscall: Add Kconfig optio
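The overhead Ingo is asking about comes from the legacy x86-64 vsyscall page: in emulate mode, calls into it take a page fault that the kernel handles in software. A quick way to see whether a process even maps that page is to look for the `[vsyscall]` entry in `/proc/<pid>/maps`. This is a minimal sketch, not from the thread; the `sample` maps excerpt and helper name are hypothetical:

```python
# Sketch: detect the legacy x86-64 vsyscall mapping in a maps dump.
# The page lives at a fixed address; when the kernel emulates it,
# every call into it costs a page fault ("vsyscall emulation overhead").

VSYSCALL_ADDR = 0xffffffffff600000  # fixed legacy address on x86-64

def has_vsyscall_page(maps_text):
    """Scan /proc/<pid>/maps content for the [vsyscall] mapping."""
    return any(line.rstrip().endswith("[vsyscall]")
               for line in maps_text.splitlines())

# Hypothetical excerpt of /proc/self/maps, for illustration only:
sample = (
    "7f1c000-7f1d000 r-xp 00000000 08:01 1234 /lib/libc.so\n"
    "ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]\n"
)
print(has_vsyscall_page(sample))  # -> True
```

On a live system one would read `/proc/self/maps` directly; old glibc builds (like the one suspected on David's machine) are the ones still calling into this page.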
> > Ingo, stop doing this kind of crap.
> >
> > Let's make it clear: if the NUMA patches continue to regress
> > performance for reasonable loads (and that very much includes
> > "no THP") then they won't be merged.
> >
> > You seem to be in total denial. Every time Mel sends out
> > results t
* David Rientjes wrote:
> On Tue, 20 Nov 2012, Ingo Molnar wrote:
>
> > > This happened to be an Opteron (but not 83xx series), 2.4Ghz.
> >
> > Ok - roughly which family/model from /proc/cpuinfo?
>
> It's close enough, it's 23xx.
Ok - which family/model number in /proc/cpuinfo?
I'm asking
On Tue, 20 Nov 2012, Ingo Molnar wrote:
> > This happened to be an Opteron (but not 83xx series), 2.4Ghz.
>
> Ok - roughly which family/model from /proc/cpuinfo?
>
It's close enough, it's 23xx.
> > It's perf top -U, the benchmark itself was unchanged so I
> > didn't think it was interesting
On Tue, 20 Nov 2012, Ingo Molnar wrote:
> > I confirm that numa/core regresses significantly more without
> > thp than the 6.3% regression I reported with thp in terms of
> > throughput on the same system. numa/core at 01aa90068b12
> > ("sched: Use the best-buddy 'ideal cpu' in balancing
> >
* David Rientjes wrote:
> I confirm that numa/core regresses significantly more without
> thp than the 6.3% regression I reported with thp in terms of
> throughput on the same system. numa/core at 01aa90068b12
> ("sched: Use the best-buddy 'ideal cpu' in balancing
> decisions") had 99389.49
On Mon, Nov 19, 2012 at 11:44 PM, Ingo Molnar wrote:
>
> * David Rientjes wrote:
>
>> On Tue, 20 Nov 2012, Ingo Molnar wrote:
>>
>> > > > numa/core at ec05a2311c35 ("Merge branch 'sched/urgent' into
>> > > > sched/core") had an average throughput of 136918.34
>> > > > SPECjbb2005 bops, which is a
* David Rientjes wrote:
> This is in comparison to my earlier perftop results which were with thp
> enabled. Keep in mind that this system has a NUMA configuration of
>
> $ cat /sys/devices/system/node/node*/distance
> 10 20 20 30
> 20 10 20 20
> 20 20 10 20
> 30 20 20 10
You could check wh
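The distance matrix David pastes is the kernel's SLIT-style view of the topology: 10 means local, larger values mean proportionally more remote. A small sketch (helper names are my own, not from the thread) shows how to parse that sysfs dump and pick out the most remote node pairs — here nodes 0 and 3, which is why placement decisions matter more on this box than on a flat 4-node system:

```python
# Sketch: parse the output of
#   cat /sys/devices/system/node/node*/distance
# Values are relative latencies: 10 = local node, larger = more remote.
# The dump below mirrors the 4-node topology quoted in the mail.

def parse_distances(text):
    """Turn the sysfs dump into a list of integer rows."""
    return [[int(tok) for tok in line.split()]
            for line in text.strip().splitlines()]

def farthest_pairs(matrix):
    """Return (distance, [(i, j), ...]) for the most remote node pairs."""
    worst = max(max(row) for row in matrix)
    pairs = [(i, j) for i, row in enumerate(matrix)
             for j, d in enumerate(row) if d == worst and i < j]
    return worst, pairs

dump = """\
10 20 20 30
20 10 20 20
20 20 10 20
30 20 20 10
"""

matrix = parse_distances(dump)
print(farthest_pairs(matrix))  # -> (30, [(0, 3)])
```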
* David Rientjes wrote:
> On Tue, 20 Nov 2012, Ingo Molnar wrote:
>
> > > > numa/core at ec05a2311c35 ("Merge branch 'sched/urgent' into
> > > > sched/core") had an average throughput of 136918.34
> > > > SPECjbb2005 bops, which is a 6.3% regression.
> > >
> > > perftop during the run on num
On Tue, 20 Nov 2012, Ingo Molnar wrote:
> No doubt numa/core should not regress with THP off or on and
> I'll fix that.
>
> As a background, here's how SPECjbb gets slower on mainline
> (v3.7-rc6) if you boot Mel's kernel config and turn THP forcibly
> off:
>
> (avg: 502395 ops/sec)
> (avg
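"Turn THP forcibly off" refers to the standard sysfs knob, `/sys/kernel/mm/transparent_hugepage/enabled`, where the kernel reports the active mode in brackets (e.g. `always madvise [never]`). A small sketch of reading that setting — the helper names are illustrative, and the fallback assumes a kernel built without THP simply lacks the file:

```python
# Sketch: read the active THP mode from sysfs. The kernel prints all
# modes and brackets the active one, e.g. "always madvise [never]".
# Writing "never" to the same file is what turning THP off means.
import re

THP_ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"

def active_thp_mode(line):
    """Extract the bracketed (active) mode from the sysfs line."""
    m = re.search(r"\[(\w+)\]", line)
    return m.group(1) if m else None

def read_thp_mode(path=THP_ENABLED):
    """Return the live setting, or None on kernels without THP."""
    try:
        with open(path) as f:
            return active_thp_mode(f.read())
    except FileNotFoundError:
        return None

print(active_thp_mode("always madvise [never]"))  # -> never
```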
* Linus Torvalds wrote:
> On Mon, Nov 19, 2012 at 12:36 PM, Ingo Molnar wrote:
> >
> > Hugepages is a must for most forms of NUMA/HPC. This alone
> > questions the relevance of most of your prior numa/core testing
> > results. I now have to strongly dispute your other conclusions
> > as well.
>
On Tue, 20 Nov 2012, Ingo Molnar wrote:
> > > numa/core at ec05a2311c35 ("Merge branch 'sched/urgent' into
> > > sched/core") had an average throughput of 136918.34
> > > SPECjbb2005 bops, which is a 6.3% regression.
> >
> > perftop during the run on numa/core at 01aa90068b12 ("sched:
> > Use
* David Rientjes wrote:
> > numa/core at ec05a2311c35 ("Merge branch 'sched/urgent' into
> > sched/core") had an average throughput of 136918.34
> > SPECjbb2005 bops, which is a 6.3% regression.
>
> perftop during the run on numa/core at 01aa90068b12 ("sched:
> Use the best-buddy 'ideal cpu'
On Mon, 19 Nov 2012, David Rientjes wrote:
> I confirm that SPECjbb2005 1.07 -Xmx4g regresses in terms of throughput on
> my 16-way, 4 node system with 32GB of memory using 16 warehouses and 240
> measurement seconds. I averaged the throughput for five runs on each
> kernel.
>
> Java(TM) SE R
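The numbers being argued over are averages of several runs compared against a baseline. As a sketch of the arithmetic only — the helper names are mine, and the implied baseline below is derived from the two figures quoted in the thread (136918.34 bops being a 6.3% regression), not measured:

```python
# Sketch: average per-run SPECjbb throughput and express the change
# against a baseline as a regression percentage.

def mean(values):
    return sum(values) / len(values)

def regression_pct(baseline, candidate):
    """Percent lost relative to baseline (positive = regression)."""
    return (baseline - candidate) / baseline * 100.0

# If numa/core averaged 136918.34 bops and that is a 6.3% regression,
# the implied mainline baseline is:
baseline = 136918.34 / (1 - 0.063)
print(round(baseline))                                # -> 146124
print(round(regression_pct(baseline, 136918.34), 1))  # -> 6.3
```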
On Mon, Nov 19, 2012 at 12:36 PM, Ingo Molnar wrote:
>
> Hugepages is a must for most forms of NUMA/HPC. This alone
> questions the relevance of most of your prior numa/core testing
> results. I now have to strongly dispute your other conclusions
> as well.
Ingo, stop doing this kind of crap.
Le
On Mon, 19 Nov 2012, Mel Gorman wrote:
> I was not able to run a full sets of tests today as I was distracted so
> all I have is a multi JVM comparison. I'll keep it shorter than average
>
> 3.7.0 3.7.0
> rc5-stats-v4r2 rc5-schednuma-v1
On 11/19/2012 06:00 PM, Mel Gorman wrote:
> On Mon, Nov 19, 2012 at 11:36:04PM +0100, Ingo Molnar wrote:
> > * Mel Gorman wrote:
> > > Ok.
> > >
> > > In response to one of your later questions, I found that I had
> > > in fact disabled THP without properly reporting it. [...]
> > Hugepages is a must for most forms of NUM
On Mon, Nov 19, 2012 at 11:36:04PM +0100, Ingo Molnar wrote:
>
> * Mel Gorman wrote:
>
> > Ok.
> >
> > In response to one of your later questions, I found that I had
> > in fact disabled THP without properly reporting it. [...]
>
> Hugepages is a must for most forms of NUMA/HPC.
Requiring hu
* Mel Gorman wrote:
> Ok.
>
> In response to one of your later questions, I found that I had
> in fact disabled THP without properly reporting it. [...]
Hugepages is a must for most forms of NUMA/HPC. This alone
questions the relevance of most of your prior numa/core testing
results. I now
On Mon, Nov 19, 2012 at 09:07:07PM +0100, Ingo Molnar wrote:
>
> * Mel Gorman wrote:
>
> > > [ SPECjbb transactions/sec ]     |
> > > [ higher is better          ]    |
> > >                                  |
> > > SPECjbb single-1x32   524k  507k |  638
On Mon, Nov 19, 2012 at 08:13:39PM +0100, Ingo Molnar wrote:
>
> * Mel Gorman wrote:
>
> > On Mon, Nov 19, 2012 at 03:14:17AM +0100, Ingo Molnar wrote:
> > > I'm pleased to announce the latest version of the numa/core tree.
> > >
> > > Here are some quick, preliminary performance numbers on a 4
* Mel Gorman wrote:
> > [ SPECjbb transactions/sec ]     |
> > [ higher is better          ]    |
> >                                  |
> > SPECjbb single-1x32   524k  507k |  638k  +21.7%
> > --
* Mel Gorman wrote:
> On Mon, Nov 19, 2012 at 03:14:17AM +0100, Ingo Molnar wrote:
> > I'm pleased to announce the latest version of the numa/core tree.
> >
> > Here are some quick, preliminary performance numbers on a 4-node,
> > 32-way, 64 GB RAM system:
> >
> > CONFIG_NUMA_BALANCING=y
> >
On Mon, Nov 19, 2012 at 03:14:17AM +0100, Ingo Molnar wrote:
> I'm pleased to announce the latest version of the numa/core tree.
>
> Here are some quick, preliminary performance numbers on a 4-node,
> 32-way, 64 GB RAM system:
>
> CONFIG_NUMA_BALANCING=y
>