> > ...n't we end up setting the buddy order to something > MAX_ORDER - 1 on that path?
>
> Agreed.
We would need to return the supersized block to the huge page pool and not
to the buddy allocator. There is a special callback in the compound page
so that you can call an alternate free function.
On Thu, 1 Jun 2017, Hugh Dickins wrote:
> SLUB versus SLAB, cpu versus memory? Since someone has taken the
> trouble to write it with ctors in the past, I didn't feel on firm
> enough ground to recommend such a change. But it may be obvious
> to someone else that your suggestion would be better
On Thu, 1 Jun 2017, Hugh Dickins wrote:
> Thanks a lot for working that out. Makes sense, fully understood now,
> nothing to worry about (though makes one wonder whether it's efficient
> to use ctors on high-alignment caches; or whether an internal "zero-me"
> ctor would be useful).
Use kzalloc
On Thu, 1 Jun 2017, Hugh Dickins wrote:
> CONFIG_SLUB_DEBUG_ON=y. My SLAB|SLUB config options are
>
> CONFIG_SLUB_DEBUG=y
> # CONFIG_SLUB_MEMCG_SYSFS_ON is not set
> # CONFIG_SLAB is not set
> CONFIG_SLUB=y
> # CONFIG_SLAB_FREELIST_RANDOM is not set
> CONFIG_SLUB_CPU_PARTIAL=y
> CONFIG_SLABINFO=y
> > I am curious as to what is going on there. Do you have the output from
> > these failed allocations?
>
> I thought the relevant output was in my mail. I did skip the Mem-Info
> dump, since that just seemed noise in this case: we know memory can get
> fragmented. What more output are you looking for?
On Wed, 31 May 2017, Michael Ellerman wrote:
> > SLUB: Unable to allocate memory on node -1, gfp=0x14000c0(GFP_KERNEL)
> > cache: pgtable-2^12, object size: 32768, buffer size: 65536, default order: 4, min order: 4
> > pgtable-2^12 debugging increased min order, use slub_debug=O to disable.
On Tue, 30 May 2017, Hugh Dickins wrote:
> I wanted to try removing CONFIG_SLUB_DEBUG, but didn't succeed in that:
> it seemed to be a hard requirement for something, but I didn't find what.
CONFIG_SLUB_DEBUG does not enable debugging. It only includes the code to
be able to enable it at runtime.
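For example, debugging can then be switched on from the kernel command line (or the min-order bump switched off, as the warning quoted earlier suggests); the cache name below is only an illustration:

```
slub_debug=O            # switch off debug options that raise the min order
slub_debug              # enable the default debug options for all caches
slub_debug=P,kmalloc-64 # poisoning only, for a single cache
```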
On Wed, 21 Sep 2016, Tejun Heo wrote:
> Hello, Nick.
>
> How have you been? :)
>
He is baack. Are we getting SL!B? ;-)
On Fri, 8 Jul 2016, Kees Cook wrote:
> Is check_valid_pointer() making sure the pointer is within the usable
> size? It seemed like it was checking that it was within the slub
> object (checks against s->size, wants it above base after moving
> pointer to include redzone, etc).
check_valid_pointe
On Fri, 8 Jul 2016, Michael Ellerman wrote:
> > I wonder if this code should be using size_from_object() instead of s->size?
>
> Hmm, not sure. Who's SLUB maintainer? :)
Me.
s->size is the size of the whole object including debugging info etc.
ksize() gives you the actual usable size of an object.
> > the patch. See the code in slub.c that is similar.
>
> Doh, somehow I convinced myself that there's #else and alloc_pages() is only
> used for !CONFIG_NUMA so it doesn't matter. Here's a fixed version.
Acked-by: Christoph Lameter
On Thu, 30 Jul 2015, Vlastimil Babka wrote:
> --- a/mm/slob.c
> +++ b/mm/slob.c
> void *page;
>
> -#ifdef CONFIG_NUMA
> - if (node != NUMA_NO_NODE)
> - page = alloc_pages_exact_node(node, gfp, order);
> - else
> -#endif
> - page = alloc_pages(gfp, order);
> +
Acked-by: Christoph Lameter
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
On Thu, 30 Jul 2015, Vlastimil Babka wrote:
> numa_mem_id() is able to handle allocation from CPUs on memory-less nodes,
> so it's a more robust fallback than the currently used numa_node_id().
>
> Suggested-by: Christoph Lameter
> Signed-off-by: Vlastimil Babka
> A
On Wed, 22 Jul 2015, David Rientjes wrote:
> Eek, yeah, that does look bad. I'm not even sure the
>
> if (nid < 0)
> nid = numa_node_id();
>
> is correct; I think this should be comparing to NUMA_NO_NODE rather than
> all negative numbers, otherwise we silently ignore overflow
On Tue, 21 Jul 2015, Vlastimil Babka wrote:
> The function alloc_pages_exact_node() was introduced in 6484eb3e2a81 ("page
> allocator: do not check NUMA node ID when the caller knows the node is valid")
> as an optimized variant of alloc_pages_node(), that doesn't allow the node id
> to be -1. Unf
On Wed, 29 Oct 2014, Michael Ellerman wrote:
> > #define __ARCH_IRQ_STAT
> >
> > -#define local_softirq_pending() __get_cpu_var(irq_stat).__softirq_pending
> > +#define local_softirq_pending() __this_cpu_read(irq_stat.__softirq_pending)
> > +#define set_softirq_pending(x) __this_c
On Tue, 28 Oct 2014, Michael Ellerman wrote:
> I'm happy to put it in a topic branch for 3.19, or move the definition or
> whatever, your choice Christoph.
Get the patch merged please.
Ping? We are planning to remove support for __get_cpu_var in the
3.19 merge period. I can move the definition for __get_cpu_var into the
powerpc per cpu definition instead if we cannot get this merged?
On Tue, 21 Oct 2014, Christoph Lameter wrote:
>
> This still has not been merged a
write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
Cc: Benjamin Herrenschmidt
CC: Paul Mackerras
Signed-off-by: Christoph Lameter
---
arch/powerpc/include/asm/hardirq.h | 4 +++
On Wed, 13 Aug 2014, Nishanth Aravamudan wrote:
> +++ b/include/linux/topology.h
> @@ -119,11 +119,20 @@ static inline int numa_node_id(void)
> * Use the accessor functions set_numa_mem(), numa_mem_id() and cpu_to_mem().
> */
> DECLARE_PER_CPU(int, _numa_mem_);
> +extern int _node_numa_mem_[M
Gobbledygook due to missing Mime header.
On Thu, 12 Jun 2014, David Laight wrote:
> RnJvbTogQW50b24gQmxhbmNoYXJkDQouLi4NCj4gZGlmZiAtLWdpdCBhL2FyY2gvcG93ZXJwYy9i
> b290L2luc3RhbGwuc2ggYi9hcmNoL3Bvd2VycGMvYm9vdC9pbnN0YWxsLnNoDQo+IGluZGV4IGI2
> YTI1NmIuLmUwOTZlNWEgMTAwNjQ0DQo+IC0tLSBhL2FyY2gvcG93ZXJ
Looking at arch/powerpc/include/asm/percpu.h I see that the per cpu offset
comes from a local_paca field and local_paca is in r13. That means that
for all percpu operations we first have to determine the address through a
memory access.
Would it be possible to put the paca at the beginning of the
This is under Ubuntu Utopic Unicorn on a Power 8 system while simply
trying to build with the Ubuntu standard kernel config. It could be that
these issues come about because we do not have an rc1 yet but I wanted to
give some early notice. Also this is a new arch to me so I may not be
aware of how
On Mon, 19 May 2014, Nishanth Aravamudan wrote:
> I'm seeing a panic at boot with this change on an LPAR which actually
> has no Node 0. Here's what I think is happening:
>
> start_kernel
> ...
> -> setup_per_cpu_areas
> -> pcpu_embed_first_chunk
> -> pcpu_fc_alloc
>
On Mon, 31 Mar 2014, Nishanth Aravamudan wrote:
> Yep. The node exists, it's just fully exhausted at boot (due to the
> presence of 16GB pages reserved at boot-time).
Well if you want us to support that then I guess you need to propose
patches to address this issue.
> I'd appreciate a bit more g
On Thu, 27 Mar 2014, Nishanth Aravamudan wrote:
> > That looks to be the correct way to handle things. Maybe mark the node as
> > offline or somehow not present so that the kernel ignores it.
>
> This is a SLUB condition:
>
> mm/slub.c::early_kmem_cache_node_alloc():
> ...
> page = new_sla
On Tue, 25 Mar 2014, Nishanth Aravamudan wrote:
> On power, very early, we find the 16G pages (gpages in the powerpc arch
> code) in the device-tree:
>
> early_setup ->
> early_init_mmu ->
> htab_initialize ->
> htab_init_page_sizes ->
>
On Tue, 25 Mar 2014, Nishanth Aravamudan wrote:
> On 25.03.2014 [11:17:57 -0500], Christoph Lameter wrote:
> > On Mon, 24 Mar 2014, Nishanth Aravamudan wrote:
> >
> > > Anyone have any ideas here?
> >
> > Don't do that? Check on boot to not allow exhausting a node with huge pages?
On Mon, 24 Mar 2014, Nishanth Aravamudan wrote:
> Anyone have any ideas here?
Don't do that? Check on boot to not allow exhausting a node with huge
pages?
On Tue, 11 Mar 2014, Nishanth Aravamudan wrote:
> I have a P7 system that has no node0, but a node0 shows up in numactl
> --hardware, which has no cpus and no memory (and no PCI devices):
Well as you see from the code there has been so far the assumption that
node 0 has memory. I have never run a
On Mon, 24 Feb 2014, Joonsoo Kim wrote:
> > It will not common get there because of the tracking. Instead a per cpu
> > object will be used.
> > > get_partial_node() always fails even if there are some partial slab on
> > > memoryless node's neareast node.
> >
> > Correct and that leads to a page
On Fri, 21 Feb 2014, Nishanth Aravamudan wrote:
> I added two calls to local_memory_node(), I *think* both are necessary,
> but am willing to be corrected.
>
> One is in map_cpu_to_node() and one is in start_secondary(). The
> start_secondary() path is fine, AFAICT, as we are up & running at that
On Wed, 19 Feb 2014, Nishanth Aravamudan wrote:
> We can call local_memory_node() before the zonelists are setup. In that
> case, first_zones_zonelist() will not set zone and the reference to
> zone->node will Oops. Catch this case, and, since we presumably running
> very early, just return that a
On Wed, 19 Feb 2014, David Rientjes wrote:
> On Tue, 18 Feb 2014, Christoph Lameter wrote:
>
> > Its an optimization to avoid calling the page allocator to figure out if
> > there is memory available on a particular node.
> Thus this patch breaks with memory hot-add for a
On Tue, 18 Feb 2014, Nishanth Aravamudan wrote:
> the performance impact of the underlying NUMA configuration. I guess we
> could special-case memoryless/cpuless configurations somewhat, but I
> don't think there's any reason to do that if we can make memoryless-node
> support work in-kernel?
Wel
On Tue, 18 Feb 2014, Nishanth Aravamudan wrote:
> We use the topology provided by the hypervisor, it does actually reflect
> where CPUs and memory are, and their corresponding performance/NUMA
> characteristics.
And so there are actually nodes without memory that have processors?
Can the hypervis
On Tue, 18 Feb 2014, Nishanth Aravamudan wrote:
>
> Well, on powerpc, with the hypervisor providing the resources and the
> topology, you can have cpuless and memoryless nodes. I'm not sure how
> "fake" the NUMA is -- as I think since the resources are virtualized to
> be one system, it's logicall
On Mon, 17 Feb 2014, Joonsoo Kim wrote:
> On Wed, Feb 12, 2014 at 10:51:37PM -0800, Nishanth Aravamudan wrote:
> > Hi Joonsoo,
> > Also, given that only ia64 and (hopefuly soon) ppc64 can set
> > CONFIG_HAVE_MEMORYLESS_NODES, does that mean x86_64 can't have
> > memoryless nodes present? Even with
On Mon, 17 Feb 2014, Joonsoo Kim wrote:
> On Wed, Feb 12, 2014 at 04:16:11PM -0600, Christoph Lameter wrote:
> > Here is another patch with some fixes. The additional logic is only
> > compiled in if CONFIG_HAVE_MEMORYLESS_NODES is set.
> >
> > Subject: sl
current available per cpu objects and if that is not available will
create a new slab using the page allocator to fallback from the
memoryless node to some other node.
Signed-off-by: Christoph Lameter
Index: linux/mm/slub.c
On Mon, 10 Feb 2014, Joonsoo Kim wrote:
> On Fri, Feb 07, 2014 at 12:51:07PM -0600, Christoph Lameter wrote:
> > Here is a draft of a patch to make this work with memoryless nodes.
> >
> > The first thing is that we modify node_match to also match if we hit an
> >
Here is a draft of a patch to make this work with memoryless nodes.
The first thing is that we modify node_match to also match if we hit an
empty node. In that case we simply take the current slab if its there.
If there is no current slab then a regular allocation occurs with the
memoryless node.
On Fri, 7 Feb 2014, Joonsoo Kim wrote:
> >
> > It seems like a better approach would be to do this when a node is brought
> > online and determine the fallback node based not on the zonelists as you
> > do here but rather on locality (such as through a SLIT if provided, see
> > node_distance()).
>
On Fri, 7 Feb 2014, Joonsoo Kim wrote:
> > This check wouild need to be something that checks for other contigencies
> > in the page allocator as well. A simple solution would be to actually run
> > a GFP_THIS_NODE alloc to see if you can grab a page from the proper node.
> > If that fails then fa
On Thu, 6 Feb 2014, Joonsoo Kim wrote:
> diff --git a/mm/slub.c b/mm/slub.c
> index cc1f995..c851f82 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1700,6 +1700,14 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> void *object;
> int searchnode = (node ==
On Thu, 6 Feb 2014, David Rientjes wrote:
> I think you'll need to send these to Andrew since he appears to be picking
> up slub patches these days.
I can start managing merges again if Pekka no longer has the time.
On Wed, 5 Feb 2014, Nishanth Aravamudan wrote:
> > Right so if we are ignoring the node then the simplest thing to do is to
> > not deactivate the current cpu slab but to take an object from it.
>
> Ok, that's what Anton's patch does, I believe. Are you ok with that
> patch as it is?
No. Again hi
e no partial slab on that node.
>
> On that node, page allocation always fallback to numa_mem_id() first. So
> searching a partial slab on numa_node_id() in that case is proper solution
> for memoryless node case.
Acked-by: Christoph Lameter
On Tue, 4 Feb 2014, Nishanth Aravamudan wrote:
> > If the target node allocation fails (for whatever reason) then I would
> > recommend for simplicities sake to change the target node to
> > NUMA_NO_NODE and just take whatever is in the current cpu slab. A more
> > complex solution would be to loo
On Mon, 3 Feb 2014, Nishanth Aravamudan wrote:
> Yes, sorry for my lack of clarity. I meant Joonsoo's latest patch for
> the $SUBJECT issue.
Hmmm... I am not sure that this is a general solution. The fallback to
other nodes can not only occur because a node has no memory as his patch
assumes.
If
On Mon, 3 Feb 2014, Nishanth Aravamudan wrote:
> So what's the status of this patch? Christoph, do you think this is fine
> as it is?
Certainly enabling CONFIG_MEMORYLESS_NODES is the right thing to do and I
already acked the patch.
On Wed, 29 Jan 2014, Nishanth Aravamudan wrote:
> exactly what the caller intends.
>
> int searchnode = node;
> if (node == NUMA_NO_NODE)
> searchnode = numa_mem_id();
> if (!node_present_pages(node))
> searchnode = local_memory_node(node);
>
> The difference in semantics from the prev
On Tue, 28 Jan 2014, Nishanth Aravamudan wrote:
> Anton Blanchard found an issue with an LPAR that had no memory in Node
> 0. Christoph Lameter recommended, as one possible solution, to use
> numa_mem_id() for locality of the nearest memory node-wise. However,
> numa_mem_id() [a
On Tue, 28 Jan 2014, Nishanth Aravamudan wrote:
> This helps about the same as David's patch -- but I found the reason
> why! ppc64 doesn't set CONFIG_HAVE_MEMORYLESS_NODES :) Expect a patch
> shortly for that and one other case I found.
Oww...
On Fri, 24 Jan 2014, Nishanth Aravamudan wrote:
> What I find odd is that there are only 2 nodes on this system, node 0
> (empty) and node 1. So won't numa_mem_id() always be 1? And every page
> should be coming from node 1 (thus node_match() should always be true?)
Well yes that occurs if you sp
On Fri, 24 Jan 2014, Nishanth Aravamudan wrote:
> As to cpu_to_node() being passed to kmalloc_node(), I think an
> appropriate fix is to change that to cpu_to_mem()?
Yup.
> > Yeah, the default policy should be to fallback to local memory if the node
> > passed is memoryless.
>
> Thanks!
I would
On Fri, 24 Jan 2014, David Rientjes wrote:
> kmalloc_node(nid) and kmem_cache_alloc_node(nid) should fallback to nodes
> other than nid when memory can't be allocated, these functions only
> indicate a preference.
The nid passed indicated a preference unless __GFP_THIS_NODE is specified.
Then the
On Fri, 24 Jan 2014, Wanpeng Li wrote:
> >
> >diff --git a/mm/slub.c b/mm/slub.c
> >index 545a170..a1c6040 100644
> >--- a/mm/slub.c
> >+++ b/mm/slub.c
> >@@ -1700,6 +1700,9 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> > void *object;
> > int searchnode =
On Mon, 20 Jan 2014, Wanpeng Li wrote:
> >+ enum zone_type high_zoneidx = gfp_zone(flags);
> >
> >+ if (!node_present_pages(searchnode)) {
> >+ zonelist = node_zonelist(searchnode, flags);
> >+ for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
> >+
On Tue, 6 Aug 2013, Wladislav Wiebe wrote:
> ok, just saw in slab/for-linus branch that those stuff is reverted again..
No, that was only for the 3.11 merge by Linus. The 3.12 patches have not
been put into Pekka's tree.
On Wed, 31 Jul 2013, Wladislav Wiebe wrote:
> Thanks for the point, do you plan to make kmalloc_large available for extern
> access in a separate mainline patch?
> Since kmalloc_large is statically defined in slub_def.h and when including it
> to seq_file.c
> we have a lot of conflicting types:
allocation. Use kmalloc_large().
This fixes the warning about large allocs but it will still cause
large contiguous allocs that could fail because of memory fragmentation.
Signed-off-by: Christoph Lameter
Index: linux/fs/seq_file.c
buffers for proc fs?
Signed-off-by: Christoph Lameter
Index: linux/fs/seq_file.c
===
--- linux.orig/fs/seq_file.c	2013-07-10 14:03:15.367134544 -0500
+++ linux/fs/seq_file.c 2013-07-31 10:11:42.671736131 -0500
@@ -96,7 +96,7 @@ static
On Wed, 31 Jul 2013, Wladislav Wiebe wrote:
> on a PPC 32-Bit board with a Linux Kernel v3.10.0 I see trouble with
> kmalloc_slab.
> Basically at system startup, something request a size of 8388608 b,
> but KMALLOC_MAX_SIZE has 4194304 b in our case. It points a WARNING at:
> ..
> NIP [c0099fec]
On Thu, 27 Dec 2012, Tang Chen wrote:
> On 12/26/2012 11:30 AM, Kamezawa Hiroyuki wrote:
> >> @@ -41,6 +42,7 @@ struct firmware_map_entry {
> >>const char *type; /* type of the memory range */
> >>struct list_headlist; /* entry for the linked list */
On Mon, 9 Jul 2012, Yasuaki Ishimatsu wrote:
> Even if you apply these patches, you cannot remove the physical memory
> completely since these patches are still under development. I want you to
> cooperate to improve the physical memory hot-remove. So please review these
> patches and give your c
> I was pointed by Glauber to the slab common code patches. I need some
> more time to read the patches. Now I think the slab/slot changes in this
> v3 are not needed, and can be ignored.
That may take some kernel cycles. You have a current issue here that needs
to be fixed.
> > down_write(&
itself.
Signed-off-by: Christoph Lameter
---
mm/slub.c | 29 ++---
1 file changed, 14 insertions(+), 15 deletions(-)
Index: linux-2.6/mm/slub.c
===
--- linux-2.6.orig/mm/slub.c	2012-06-11 08:49
alias
processing is done using the copy of the string and not
the string itself.
Signed-off-by: Christoph Lameter
---
mm/slub.c | 29 ++---
1 file changed, 14 insertions(+), 15 deletions(-)
Index: linux-2.6/mm/slub.c
On Mon, 25 Jun 2012, Li Zhong wrote:
> This patch tries to kfree the cache name of pgtables cache if SLUB is
> used, as SLUB duplicates the cache name, and the original one is leaked.
SLAB also does not free the name. Why would you have an #ifdef in there?
On Mon, 13 Jun 2011, Pekka Enberg wrote:
> > Hmmm.. The allocpercpu in alloc_kmem_cache_cpus should take care of the
> > alignment. Uhh.. I see that a patch that removes the #ifdef CMPXCHG_LOCAL
> > was not applied? Pekka?
>
> This patch?
>
> http://git.kernel.org/?p=linux/kernel/git/penberg/slab-
On Sun, 12 Jun 2011, Hugh Dickins wrote:
> 3.0-rc won't boot with SLUB on my PowerPC G5: kernel BUG at mm/slub.c:1950!
> Bisected to 1759415e630e "slub: Remove CONFIG_CMPXCHG_LOCAL ifdeffery".
>
> After giving myself a medal for finding the BUG on line 1950 of mm/slub.c
> (it's actually the
>
On Fri, 24 Sep 2010, Alan Cox wrote:
> Whether you add new syscalls or do the fd passing using flags and hide
> the ugly bits in glibc is another question.
Use device specific ioctls instead of syscalls?
On Thu, 23 Sep 2010, john stultz wrote:
> > > 3) Further, the PTP hardware counter can be simply set to a new offset
> > > to put it in line with the network time. This could cause trouble with
> > > timekeeping much like unsynced TSCs do.
> >
> > You can do the same for system time.
>
> Settimeof
On Thu, 23 Sep 2010, Christian Riesch wrote:
> > > It implies clock tuning in userspace for a potential sub microsecond
> > > accurate clock. The clock accuracy will be limited by user space
> > > latencies and noise. You wont be able to discipline the system clock
> > > accurately.
> >
> > Noise
On Thu, 23 Sep 2010, john stultz wrote:
> > The HPET or pit timesource are also quite slow these days. You only need
> > access periodically to essentially tune the TSC ratio.
>
> If we're using the TSC, then we're not using the PTP clock as you
> suggest. Further the HPET and PIT aren't used to s
On Thu, 23 Sep 2010, Alan Cox wrote:
> > Please do not introduce useless additional layers for clock sync. Load
> > these ptp clocks like the other regular clock modules and make them sync
> > system time like any other clock.
>
> I don't think you understand PTP. PTP has masters, a system can nee
On Thu, 23 Sep 2010, Richard Cochran wrote:
> +* Gianfar PTP clock nodes
> +
> +General Properties:
> +
> + - compatible Should be "fsl,etsec-ptp"
> + - reg Offset and length of the register set for the device
> + - interrupts There should be at least two interrupts. Some devices
>
On Thu, 23 Sep 2010, john stultz wrote:
> This was my initial gut reaction as well, but in the end, I agree with
> Richard that in the case of one or multiple PTP hardware clocks, we
> really can't abstract over the different time domains.
My (arguably still superficial) review of the source does
On Thu, 23 Sep 2010, Jacob Keller wrote:
> > There is a reason for not being able to shift posix clocks: The system has
> > one time base. The various clocks are contributing to maintaining that
> > system wide time.
> >
> > Adjusting clocks is absolutely essential for proper functioning of the PTP
On Thu, 23 Sep 2010, Richard Cochran wrote:
> Support for obtaining timestamps from a PHC already exists via the
> SO_TIMESTAMPING socket option, integrated in kernel version 2.6.30.
> This patch set completes the picture by allow user space programs to
> adjust the PHC and to control its
On Mon, 1 Mar 2010, Mel Gorman wrote:
> Christoph, how feasible would it be to allow parallel reclaimers in
> __zone_reclaim() that back off at a rate depending on the number of
> reclaimers?
Not too hard. Zone locking is there but there may be a lot of bouncing
cachelines if you run it concurren
On Tue, 23 Feb 2010, Anton Blanchard wrote:
> zone_reclaim_mode.txt
> Now we set zone_reclaim_mode = 1. On each iteration we continue to improve,
> but even after 10 runs of stream we have > 10% remote node memory usage.
The intent of zone reclaim was never to allocate all memory from one node.
Yo
On Fri, 19 Feb 2010, Balbir Singh wrote:
> >> zone_reclaim. The others back off and try the next zone in the zonelist
> >> instead. I'm not sure what the original intention was but most likely it
> >> was to prevent too many parallel reclaimers in the same zone potentially
> >> dumping out way mor
On Fri, 19 Feb 2010, Mel Gorman wrote:
> > > The patch below sets a smaller value for RECLAIM_DISTANCE and thus enables
> > > zone reclaim.
> >
>
> I've no problem with the patch anyway.
Nor do I.
> > - We seem to end up racing between zone_watermark_ok, zone_reclaim and
> > buffered_rmqueue.
On Thu, 15 Oct 2009, Gerald Schaefer wrote:
> > The pages allocated as __GFP_MOVABLE are used to store the list of pages
> > allocated by the balloon. They reference virtual addresses and it would
> > be fine for the kernel to migrate the physical pages for those, the
> > balloon would not notice
On Fri, 1 May 2009, Sam Ravnborg wrote:
> Are there any specific reason why we do not support read_mostly on all
> architectures?
Not that I know of.
> read_mostly is about grouping rarely written data together
> so what is needed is to introduce this section in the remaining
> archtectures.
>
>
Mel Gorman wrote:
> With Erics patch and libhugetlbfs, we can automatically back text/data[1],
> malloc[2] and stacks without source modification. Fairly soon, libhugetlbfs
> will also be able to override shmget() to add SHM_HUGETLB. That should cover
> a lot of the memory-intensive apps without s
On Tue, 4 Mar 2008, Pekka Enberg wrote:
> Looking at the code, it's triggerable in 2.6.24.3 at least. Why we don't have
> a report yet, probably because (1) the default allocator is SLUB which doesn't
> suffer from this and (2) you need a big honkin' NUMA box that causes fallback
> allocations to
I think this is the correct fix.
The NUMA fallback logic should be passing local_flags to kmem_get_pages()
and not simply the flags.
Maybe a stable candidate since we are now simply
passing on flags to the page allocator on the fallback path.
Signed-off-by: Christoph Lameter
On Tue, 4 Mar 2008, Pekka J Enberg wrote:
> On Tue, 4 Mar 2008, Christoph Lameter wrote:
> > Slab allocations should never be passed these flags since the slabs do
> > their own thing there.
> >
> > The following patch would clear these in slub:
>
> Here
On Tue, 4 Mar 2008, Pekka Enberg wrote:
> > > >> [c9edf5f0] [c00b56e4]
> > .__alloc_pages_internal+0xf8/0x470
> > > >> [c9edf6e0] [c00e0458] .kmem_getpages+0x8c/0x194
> > > >> [c9edf770] [c00e1050] .fallback_alloc+0x194/0x254
> > > >> [c
On Tue, 4 Mar 2008, Pekka Enberg wrote:
> > > I suspect the WARN_ON() is bogus although I really don't know that part
> > > of the code all too well. Mel?
> > >
> >
> > The warn-on is valid. A situation should not exist that allows both flags
> > to
> > be set. I suspect if remove-set_migra
On Wed, 23 Jan 2008, Nishanth Aravamudan wrote:
> Right, so it might have functioned before, but the correctness was
> wobbly at best... Certainly the memoryless patch series has tightened
> that up, but we missed these SLAB issues.
>
> I see that your patch fixed Olaf's machine, Pekka. Nice work
On Wed, 23 Jan 2008, Pekka Enberg wrote:
> I think Mel said that their configuration did work with 2.6.23
> although I also wonder how that's possible. AFAIK there has been some
> changes in the page allocator that might explain this. That is, if
> kmem_getpages() returned pages for memoryless nod
On Wed, 23 Jan 2008, Pekka J Enberg wrote:
> Fine. But, why are we hitting fallback_alloc() in the first place? It's
> definitely not because of missing ->nodelists as we do:
>
> cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE];
>
> before attempting to set up kmalloc caches.
On Wed, 23 Jan 2008, Mel Gorman wrote:
> This patch adds the necessary checks to make sure a kmem_list3 exists for
> the preferred node used when growing the cache. If the preferred node has
> no nodelist then the currently running node is used instead. This
> problem only affects the SLAB allocat
On Wed, 23 Jan 2008, Pekka J Enberg wrote:
> Furthermore, don't let kmem_getpages() call alloc_pages_node() if nodeid
> passed
> to it is -1 as the latter will always translate that to numa_node_id() which
> might not have ->nodelist that caused the invocation of fallback_alloc() in
> the
> firs
On Wed, 23 Jan 2008, Pekka J Enberg wrote:
> I still think Christoph's kmem_getpages() patch is correct (to fix
> cache_grow() oops) but I overlooked the fact that none the callers of
> cache_alloc_node() deal with bootstrapping (with the exception of
> __cache_alloc_node() that even has a