On Wed 17-04-19 13:43:44, Yang Shi wrote:
[...]
> And, I'm wondering whether this optimization is also suitable to general
> NUMA balancing or not.

If there are convincing numbers then this should be a preferable way to
deal with it. Please note that the number of promotions is not the only
metric to consider. I would also not touch the NUMA balancing logic at
this stage and rather see how the current implementation behaves.

I agree we would prefer to start from something simpler and see how it
works. The "twice access" optimization is aimed at reducing the PMEM
bandwidth burden [...]
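The "twice access" filter itself can be sketched as a single test-and-set
on a per-page bit: promote only when an earlier hint fault already marked
the page, so a one-off touch never pays the migration cost. The flag
helper below is a hypothetical stand-in, not existing kernel API or the
patchset's actual mechanism.

/*
 * Hypothetical sketch of "promote if accessed twice".
 * test_and_set_page_hinted() stands in for however a real
 * implementation would record "this page saw one hint fault".
 */
static bool should_promote_on_fault(struct page *page)
{
	/* second hint fault within the tracking window: promote */
	if (test_and_set_page_hinted(page))
		return true;
	/* first hint fault: only mark the page and leave it on PMEM */
	return false;
}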
On 4/17/19 2:23 AM, Michal Hocko wrote:
>> 3. The demotion path can not have cycles
> yes. This could be achieved by GFP_NOWAIT opportunistic allocation for
> the migration target. That should prevent loops or artificial node
> exhaustion quite naturally AFAICS. Maybe we will need some tricks [...]
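A kernel-style sketch of that idea (illustrative, not code from the
patchset; shaped like the new_page_t migration callbacks of that era):
the target page is allocated opportunistically, so a full target node
fails fast instead of recursing into reclaim on the lower tier.

static struct page *alloc_demote_target(struct page *page, unsigned long nid)
{
	/*
	 * GFP_NOWAIT: never enter direct reclaim for the target page;
	 * __GFP_THISNODE: no silent fallback to some other node.
	 * If the target node is exhausted, the allocation simply fails
	 * and the page takes the normal reclaim path instead.
	 */
	gfp_t gfp = GFP_NOWAIT | __GFP_THISNODE | __GFP_NOWARN;

	return alloc_pages_node((int)nid, gfp, 0);
}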
On Wed, Apr 17, 2019 at 11:23:18AM +0200, Michal Hocko wrote:
> On Tue 16-04-19 14:22:33, Dave Hansen wrote:
> > Keith Busch had a set of patches to let you specify the demotion order
> > via sysfs for fun. The rules we came up with were:
>
> I am not a fan of any sysfs "fun"

I'm hung up on the [...]
On Tue 16-04-19 12:19:21, Yang Shi wrote:
> On 4/16/19 12:47 AM, Michal Hocko wrote:
> [...]
> > Why cannot we simply demote in the proximity order? Why do you make
> > cpuless nodes so special? If other close nodes are vacant then just use
> > them.
>
> We could. But, this raises another [...]
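"Proximity order" could be as simple as choosing the nearest other memory
node by SLIT distance. A kernel-style sketch (illustrative only; cycles
and policy constraints are ignored here):

static int closest_demotion_node(int src)
{
	int nid, best = NUMA_NO_NODE, best_dist = INT_MAX;

	/* scan all nodes that have memory, keep the shortest distance */
	for_each_node_state(nid, N_MEMORY) {
		if (nid == src)
			continue;
		if (node_distance(src, nid) < best_dist) {
			best_dist = node_distance(src, nid);
			best = nid;
		}
	}
	return best;
}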
Why cannot we start simple and build from there? In other words I do not
think we really need anything like N_CPU_MEM at all.

In this patchset N_CPU_MEM is used to tell us which nodes are cpuless
nodes. They would be the preferred demotion target. Of course, we could
rely on the firmware to [...]
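For illustration, a memory-only node is already identifiable from the
existing node state masks, which is one way to get the same information
without introducing N_CPU_MEM (a sketch, not a claim about how the
patchset does it):

static bool is_cpuless_memory_node(int nid)
{
	/* has memory, but no CPUs: a PMEM-style memory-only node */
	return node_state(nid, N_MEMORY) && !node_state(nid, N_CPU);
}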
On 4/16/19 4:04 PM, Dave Hansen wrote:
On 4/16/19 2:59 PM, Yang Shi wrote:
On 4/16/19 2:22 PM, Dave Hansen wrote:
Keith Busch had a set of patches to let you specify the demotion order
via sysfs for fun. The rules we came up with were:
1. Pages keep no history of where they have been
2. Each node can only demote to one other node
3. The demotion path can not have cycles
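Those rules map naturally onto a single per-node demotion target. A
sketch of the rule 3 check over such a table (node_demotion[] is
hypothetical here, not Keith's sysfs interface): rule 2 is enforced by
the table shape itself, one slot per node, and the walk below rejects
any chain that loops.

static int node_demotion[MAX_NUMNODES];	/* NUMA_NO_NODE terminates a chain */

static bool demotion_path_is_acyclic(int start)
{
	int nid = start, hops = 0;

	while (nid != NUMA_NO_NODE) {
		nid = node_demotion[nid];
		if (nid == start)
			return false;		/* walked back to the origin */
		if (++hops > MAX_NUMNODES)
			return false;		/* longer than the node count */
	}
	return true;				/* chain terminates */
}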
On 4/16/19 12:19 PM, Yang Shi wrote:
> would we prefer to try all the nodes in the fallback order to find the
> first less contended one (i.e. DRAM0 -> PMEM0 -> DRAM1 -> PMEM1 -> Swap)?

Once a page went to DRAM1, how would we tell that it originated in DRAM0
and is following the DRAM0 path rather than the DRAM1 path?
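The quoted fallback idea, as a kernel-style sketch: walk a fixed
candidate list (e.g. {DRAM0, PMEM0, DRAM1, PMEM1}) and take the first
node that can satisfy an opportunistic allocation, with NUMA_NO_NODE
meaning "give up and swap". Dave's objection above is exactly that,
under rule 1, a page that lands on DRAM1 carries nothing that says which
chain it is on. The helper and its probing approach are illustrative
only.

static int first_vacant_node(const int *order, int len)
{
	int i;

	for (i = 0; i < len; i++) {
		/* opportunistic probe: succeeds only if the node has room */
		struct page *p = alloc_pages_node(order[i],
				GFP_NOWAIT | __GFP_THISNODE | __GFP_NOWARN, 0);

		if (p) {
			__free_page(p);	/* probe only; real code would migrate into it */
			return order[i];
		}
	}
	return NUMA_NO_NODE;		/* everything contended: swap instead */
}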
On Tue 16-04-19 08:46:56, Dave Hansen wrote:
> On 4/16/19 7:39 AM, Michal Hocko wrote:
> >> Strict binding also doesn't keep another app from moving the
> >> memory.
> > I would consider that a bug.
>
> A bug where, though? Certainly not in the kernel.
Kernel should refrain from moving [...]
On 4/16/19 8:33 AM, Zi Yan wrote:
>> We have a reasonable argument that demotion is better than
>> swapping. So, we could say that even if a VMA has a strict NUMA
>> policy, demoting pages mapped there still beats swapping them or
>> tossing the page cache. It's doing them a favor to [...]
On 4/16/19 7:39 AM, Michal Hocko wrote:
>> Strict binding also doesn't keep another app from moving the
>> memory.
> I would consider that a bug.

A bug where, though? Certainly not in the kernel.

I'm just saying that if an app has an assumption that strict binding
means that its memory can [...]
On 4/16/19 12:47 AM, Michal Hocko wrote:
> You definitely have to follow policy. You cannot demote to a node which
> is outside of the cpuset/mempolicy because you are breaking the contract
> expected by userspace. That implies doing a rmap walk.

What *is* the contract with userspace, anyway? :)
On Thu 11-04-19 11:56:50, Yang Shi wrote:
[...]
> Design
> ==
> Basically, the approach is aimed to spread data from DRAM (closest to the
> local CPU) down further to PMEM and disk (typically assuming the lower
> tier storage is slower, larger and cheaper than the upper tier) by their
> hotness. The [...]

This isn't so much another approach as it is some tweaks on top of
what's there, right?
This set seems to present a bunch of ideas, like "promote if accessed
twice". Seems like a good idea, but I'm a lot more interested in seeing
data about it being a good idea. What workloads is it good for?
With Dave Hansen's patches merged into Linus's tree
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c221c0b0308fd01d9fb33a16f64d2fd95f8830a4
PMEM can now be hot plugged as a NUMA node. But how to use PMEM as a
NUMA node effectively and efficiently is still an open question.