Signed-off-by: Tyler Hicks
This solves the problem that I had in this thread:
https://lore.kernel.org/lkml/ca+ck2bcd13jblmxn2mauryvqgkbs5ic2uqyssxxtccszxcm...@mail.gmail.com/
Thank you, Tyler, for root-causing this and finding a proper fix.
Reviewed-by: Pavel Tatashin
> ---
> drivers/nvdimm/reg
s/kmem/unbind
Note: if unbind is done without offlining memory beforehand, it won't be
possible to do dax0.0 hotremove, and dax's memory is going to be part of
System RAM until reboot.
Signed-off-by: Pavel Tatashin
Reviewed-by: David Hildenbrand
Reviewed-by: Dan Williams
---
drivers/dax/dax-pri
When the add_memory() function fails, the resource and the memory should be
freed.
Fixes: c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like
normal RAM")
Signed-off-by: Pavel Tatashin
Reviewed-by: Dave Hansen
---
drivers/dax/kmem.c | 5 -
1 file changed
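The fix above applies the usual error-path rule: anything claimed before a failing call has to be released on that path. The quoted kmem hunk further down does this with release_resource() on the add_memory() failure. Below is a minimal userspace sketch of the same pattern. mock_add_memory(), mock_probe(), struct mock_resource and the live_resources counter are hypothetical stand-ins, not the kmem driver's code or the kernel resource API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a claimed physical address range. */
struct mock_resource {
	unsigned long start;
	unsigned long size;
};

/* Number of resources currently allocated; lets a caller detect leaks. */
static int live_resources;

/* Hypothetical stand-in for add_memory(): fails when asked to. */
static int mock_add_memory(const struct mock_resource *res, int should_fail)
{
	(void)res;
	return should_fail ? -ENOMEM : 0;
}

/*
 * Claim a range, then hot-add it. If the hot-add fails, release what was
 * claimed before returning the error, so nothing leaks on that path.
 */
static int mock_probe(unsigned long start, unsigned long size,
		      int should_fail, struct mock_resource **out)
{
	struct mock_resource *res;
	int rc;

	assert(out != NULL);
	res = malloc(sizeof(*res));
	if (!res)
		return -ENOMEM;
	res->start = start;
	res->size = size;
	live_resources++;

	rc = mock_add_memory(res, should_fail);
	if (rc) {
		/* Error path: undo everything done above. */
		free(res);
		live_resources--;
		return rc;
	}

	*out = res;
	return 0;
}
```

A failed probe leaves live_resources unchanged, which is exactly the leak the patch closes.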
volatile memory). So, to have a pmem0 device, either the memmap kernel
parameter is used or device nodes are specified in the dtb.
Pavel Tatashin (3):
device-dax: fix memory and resource leak if hotplug fails
mm/hotplug: make remove_memory() interface useable
device-dax: "Hotremove" pers
it is safe to call this function without panicking the machine, and also
makes it symmetric to add_memory(), which already returns an error.
Signed-off-by: Pavel Tatashin
Reviewed-by: David Hildenbrand
Acked-by: Michal Hocko
---
include/linux/memory_hotplug.h | 8 +++--
mm/memory_hotplug.c            | 64
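The point of this patch is the calling contract: a caller that forgot to offline memory gets an error back instead of a BUG(). Here is a minimal userspace sketch of that contract, assuming a simple array of per-block states. mock_remove_memory() and struct mock_block are hypothetical and only model the idea; they are not the mm/memory_hotplug.c implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-block state (the kernel tracks this in struct memory_block). */
struct mock_block {
	bool online;
	bool present;
};

/*
 * Error-returning remove: if any block in the range is still online,
 * report -EBUSY and change nothing, so the caller can offline the blocks
 * and retry. Only a fully offline range is actually removed.
 */
static int mock_remove_memory(struct mock_block *blocks, size_t first,
			      size_t count)
{
	size_t i;

	assert(blocks != NULL);
	for (i = first; i < first + count; i++)
		if (blocks[i].online)
			return -EBUSY;

	for (i = first; i < first + count; i++)
		blocks[i].present = false;
	return 0;
}
```

With this shape, a driver such as kmem can propagate -EBUSY to its caller, mirroring how add_memory() failures are already reported.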
ed by apps to ramdisk to pmem device
7. Do kexec reboot or reboot through firmware if firmware does not
zero memory in pmem0 region (These machines have only regular
volatile memory). So, to have a pmem0 device, either the memmap kernel
parameter is used or device nodes are specified in the dtb.
s/kmem/unbind
Note: if unbind is done without offlining memory beforehand, it won't be
possible to do dax0.0 hotremove, and dax's memory is going to be part of
System RAM until reboot.
Signed-off-by: Pavel Tatashin
Reviewed-by: David Hildenbrand
---
drivers/dax/dax-private.h | 2 ++
drivers/dax/
it is safe to call this function without panicking the machine, and also
makes it symmetric to add_memory(), which already returns an error.
Signed-off-by: Pavel Tatashin
Reviewed-by: David Hildenbrand
---
include/linux/memory_hotplug.h | 8 +++--
mm/memory_hotplug.c            | 64
Hi Dan,
Thank you very much for your review, my comments below:
On Mon, May 6, 2019 at 2:01 PM Dan Williams wrote:
>
> On Mon, May 6, 2019 at 10:57 AM Dave Hansen wrote:
> >
> > > -static inline void remove_memory(int nid, u64 start, u64 size) {}
> > > +static inline bool remove_memory(int
> Hi Pavel,
>
> I've still not been able to hit this in my testing, is it something you
> hit only after applying these patches? i.e. does plain v65 work?
Yes, plain v65 works, but with these patches I see this error.
I use buildroot to build initramfs with ndctl. Here is how ndctl.mk looks
On Fri, May 17, 2019 at 1:24 PM Pavel Tatashin
wrote:
>
> On Fri, May 17, 2019 at 1:22 PM Pavel Tatashin
> wrote:
> >
> > On Fri, May 17, 2019 at 10:38 AM Michal Hocko wrote:
> > >
> > > On Fri 17-05-19 10:20:38, Pavel Tatashin wrote:
> > >
On Fri, May 17, 2019 at 1:22 PM Pavel Tatashin
wrote:
>
> On Fri, May 17, 2019 at 10:38 AM Michal Hocko wrote:
> >
> > On Fri 17-05-19 10:20:38, Pavel Tatashin wrote:
> > > This panic is unrelated to circular lock issue that I reported in a
> > > separate thr
On Thu, May 16, 2019 at 6:40 PM Vishal Verma wrote:
>
> Changes in v3:
> - In daxctl_dev_get_mode(), remove the subsystem warning, detect dax-class
>and simply make it return devdax
Hi Vishal,
I am still getting the same error as before:
# ndctl create-namespace --mode devdax --map mem -e
This panic is unrelated to circular lock issue that I reported in a
separate thread, that also happens during memory hotremove.
xakep ~/x/linux$ git describe
v5.1-12317-ga6a4b66bd8f4
Config is attached, qemu script is following:
qemu-system-x86_64
>
> I would think that ACPI hotplug would have a similar problem, but it does
> this:
>
> acpi_unbind_memory_blocks(info);
> __remove_memory(nid, info->start_addr, info->length);
ACPI does have exactly the same problem, so this is not a bug in this
series; I will
> Hi Pavel,
>
> I am working on adding this sort of a workflow into a new daxctl command
> (daxctl-reconfigure-device)- this will allow changing the 'mode' of a
> dax device to kmem, online the resulting memory, and with your patches,
> also attempt to offline the memory, and change back to
egion0 to system-ram mode
> # daxctl reconfigure-device --mode=system-ram --region=0 all
> [
> {
> "chardev":"dax0.0",
> "size":16777216000,
> "numa_node":2,
> "mode":"system-
On Mon, May 6, 2019 at 1:57 PM Dave Hansen wrote:
>
> > -static inline void remove_memory(int nid, u64 start, u64 size) {}
> > +static inline bool remove_memory(int nid, u64 start, u64 size)
> > +{
> > + return -EBUSY;
> > +}
>
> This seems like an appropriate place for a WARN_ONCE(), if
On Mon, May 6, 2019 at 2:04 PM Dave Hansen wrote:
>
> On 5/6/19 11:01 AM, Dan Williams wrote:
> >>> +void __remove_memory(int nid, u64 start, u64 size)
> >>> {
> >>> +
> >>> + /*
> >>> + * trigger BUG() if some memory is not offlined prior to calling
> >>> this
> >>> + * function
On Sat, May 4, 2019, 3:26 PM Dan Williams wrote:
> On Thu, May 2, 2019 at 9:12 AM Pavel Tatashin
> wrote:
> >
> > On Wed, Apr 17, 2019 at 2:53 PM Dan Williams
> wrote:
> > >
> > > Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
> > I'm ok with it being 16M for now unless it causes a problem in
> > practice, i.e. something like the minimum hardware mapping alignment
> > for physical memory being less than 16M.
>
> On second thought, arbitrary differences across architectures is a bit
> sad. The most common nvdimm
On 19-05-01 22:55:37, Dan Williams wrote:
> Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
> section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) /
> map_active bitmask length (64)). If it turns out that 2MB is too large
> of an active tracking granularity
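The 2MB figure quoted above is SECTION_SIZE (128M) divided by the 64 bits of the active map. The sketch below works through that arithmetic and shows which bits a sub-section range would set. The MOCK_* constants and mock_active_mask() are hypothetical; it assumes the range lies within one section and is 2M aligned, and it is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the description above: 128M sections, 64-bit map. */
#define MOCK_SECTION_SIZE (128ULL << 20)
#define MOCK_MAP_BITS     64ULL
#define MOCK_SUBSECTION   (MOCK_SECTION_SIZE / MOCK_MAP_BITS) /* 2M */

/*
 * Bits of a section's active map covered by [start, start + size).
 * Assumes the range sits inside a single section and that start and size
 * are multiples of the 2M tracking granularity.
 */
static uint64_t mock_active_mask(uint64_t start, uint64_t size)
{
	uint64_t first, nbits;

	assert(size > 0 && start % MOCK_SUBSECTION == 0 &&
	       size % MOCK_SUBSECTION == 0);
	first = (start % MOCK_SECTION_SIZE) / MOCK_SUBSECTION;
	nbits = size / MOCK_SUBSECTION;
	assert(first + nbits <= MOCK_MAP_BITS);

	/* Avoid the undefined 1 << 64 shift when the whole section is covered. */
	if (nbits == MOCK_MAP_BITS)
		return ~0ULL;
	return ((1ULL << nbits) - 1) << first;
}
```

For example, a 16M range starting 16M into a section covers bits 8 through 15, i.e. a mask of 0xff00.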
On Fri, May 3, 2019 at 6:35 AM Robin Murphy wrote:
>
> On 03/05/2019 01:41, Dan Williams wrote:
> > On Thu, May 2, 2019 at 7:53 AM Pavel Tatashin
> > wrote:
> >>
> >> On Wed, Apr 17, 2019 at 2:52 PM Dan Williams
> >> wrote:
> >>>
Hi Dan,
How do you test these patches? Do you have any instructions?
I see for example that check_hotplug_memory_range() still enforces
memory_block_size_bytes() alignment.
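For reference, the kind of check described above rejects a range unless both its start and its size are multiples of the memory block size. Below is a small userspace sketch of such a check. mock_check_range() is a hypothetical helper written in the spirit of check_hotplug_memory_range(), not its actual source, and the block size is passed in rather than read from memory_block_size_bytes().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x is a multiple of align; align must be a power of two. */
static bool mock_is_aligned(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x & (align - 1)) == 0;
}

/*
 * Accept a hotplug range only if it is non-empty and both its start and
 * its size are multiples of the memory block size.
 */
static int mock_check_range(uint64_t start, uint64_t size, uint64_t block_size)
{
	if (size == 0 || !mock_is_aligned(start, block_size) ||
	    !mock_is_aligned(size, block_size))
		return -EINVAL;
	return 0;
}
```

With a 128M block size, a 16M-aligned DAX range such as one starting at 4G + 16M is rejected, while the same range passes if the block size were 16M.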
Also, after removing check_hotplug_memory_range(), I tried to online
16M aligned DAX memory, and got the following panic:
On Thu, May 2, 2019 at 6:29 PM Verma, Vishal L wrote:
>
> On Thu, 2019-05-02 at 17:44 -0400, Pavel Tatashin wrote:
>
> > > In running with these patches, and testing the offlining part, I ran
> > > into the following lockdep below.
> > >
> > > This is
On Thu, May 2, 2019 at 4:50 PM Verma, Vishal L wrote:
>
> On Thu, 2019-05-02 at 14:43 -0400, Pavel Tatashin wrote:
> > The series of operations look like this:
> >
> > 1. After boot restore /dev/pmem0 to ramdisk to be consumed by apps.
> >and free ramdisk.
>
> This is simply plumbing, small cleanups, and some identifier renames. No
> intended functional changes.
>
> Cc: Michal Hocko
> Cc: Vlastimil Babka
> Cc: Logan Gunthorpe
> Signed-off-by: Dan Williams
Reviewed-by: Pavel Tatashin
mmemap_populate(). There should be no sub-section usage in
> current deployments. New warnings are added to clarify which memmap
> allocation paths are sub-section capable.
>
> Cc: Michal Hocko
> Cc: David Hildenbrand
> Cc: Logan Gunthorpe
> Signed-off-by: Dan Wil
> 2 files changed, 10 insertions(+), 8 deletions(-)
given removing all unused "*ms"
Reviewed-by: Pavel Tatashin
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
it is safe to call this function without panicking the machine, and also
makes it symmetric to add_memory(), which already returns an error.
Signed-off-by: Pavel Tatashin
---
include/linux/memory_hotplug.h | 8 +++--
mm/memory_hotplug.c            | 61 ++
2 files changed, 46
sed, or device nodes are specified in the dtb.
Pavel Tatashin (3):
device-dax: fix memory and resource leak if hotplug fails
mm/hotplug: make remove_memory() interface useable
device-dax: "Hotremove" persistent memory that is used like normal RAM
drivers/dax/dax-private.h
> >device-dax/kmem driver. So, operations should look like this:
> >
> >echo offline > echo offline > /sys/devices/system/memory/memoryN/state
>
> This looks wrong :)
>
Indeed, I will fix patch log in the next version.
Thank you,
Pasha
> Currently the kmem driver can be built as a module, and I don't see a
> need to drop that flexibility. What about wrapping these core
> routines:
>
> unlock_device_hotplug
> __remove_memory
> walk_memory_range
> lock_device_hotplug
>
> ...into a common exported (gpl) helper like:
On Wed, Apr 17, 2019 at 2:53 PM Dan Williams wrote:
>
> Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
> section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) /
> map_active bitmask length (64)). If it turns out that 2MB is too large
> of an active tracking
On Wed, Apr 17, 2019 at 2:52 PM Dan Williams wrote:
>
> Up-level the local section size and mask from kernel/memremap.c to
> global definitions. These will be used by the new sub-section hotplug
> support.
>
> Cc: Michal Hocko
> Cc: Vlastimil Babka
> Cc: Jérôme Glisse
> Cc: Logan Gunthorpe
>
>
> Memory unplug bits
>
> Reviewed-by: David Hildenbrand
>
Thank you David.
Pasha
On Thu, May 2, 2019 at 2:07 AM Dan Williams wrote:
>
> On Wed, May 1, 2019 at 4:25 PM Pavel Tatashin
> wrote:
> >
> > On 19-04-17 11:39:00, Dan Williams wrote:
> > > Towards enabling memory hotplug to track partial population of a
> > > secti
On 19-04-17 11:39:00, Dan Williams wrote:
> Towards enabling memory hotplug to track partial population of a
> section, introduce 'struct mem_section_usage'.
>
> A pointer to a 'struct mem_section_usage' instance replaces the existing
> pointer to a 'pageblock_flags' bitmap. Effectively it adds
ware if firmware does not
zero memory in pmem0 region (These machines have only regular
volatile memory). So, to have a pmem0 device, either the memmap kernel
parameter is used or device nodes are specified in the dtb.
Pavel Tatashin (2):
device-dax: fix memory and resource leak if hotpl
s/bus/dax/drivers/kmem/unbind
Note: if unbind is done without offlining memory beforehand, it won't be
possible to do dax0.0 hotremove, and dax's memory is going to be part of
System RAM until reboot.
Signed-off-by: Pavel Tatashin
---
drivers/dax/dax-private.h | 2 +
drivers/dax/kmem.c
On Thu, Apr 25, 2019 at 3:01 PM Dave Hansen wrote:
>
> Hi Pavel,
>
> Thanks for doing this! I knew we'd have to get to it eventually, but
> sounds like you needed it sooner rather than later.
Hi Dave,
Thank you for taking the time to review this work, my comments below:
> >
> > +#ifdef
On Thu, Apr 25, 2019 at 2:32 PM Dave Hansen wrote:
>
> On 4/25/19 10:54 AM, Pavel Tatashin wrote:
> > rc = add_memory(numa_node, new_res->start, resource_size(new_res));
> > - if (rc)
> > + if (rc) {
> > + release_resource(new_res)
> > I have *vague* memories of running out of bits in the page flags if we
> > changed this, but that was a while back. If that's no longer the case,
> > then I'm open to changing the value, but I really don't want to expose
> > it as a Kconfig option as proposed in this patch. People won't have a
When the add_memory() function fails, the resource and the memory should be
freed.
Fixes: c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like
normal RAM")
Signed-off-by: Pavel Tatashin
---
drivers/dax/kmem.c | 5 -
1 file changed, 4 insertions(+), 1 del
emory in pmem0 region (These machines have only regular
volatile memory). So, to have a pmem0 device, either the memmap kernel
parameter is used or device nodes are specified in the dtb.
Pavel Tatashin (2):
device-dax: fix memory and resource leak if hotplug fails
device-dax: "Hotremove" pers
>
> Yes, also I think you can let go of the device_lock in
> check_memblocks_offline_cb, lock_device_hotplug() should take care of
> this (see Documentation/core-api/memory-hotplug.rst - "locking internals")
>
Hi David,
Thank you for your comments. I went through memory-hotplug.rst, and I
still
> > > +static int
> > > +offline_memblock_cb(struct memory_block *mem, void *arg)
> >
> > Function name suggests that you are actually trying to offline memory
> > here. Maybe check_memblocks_offline_cb(), just like we have in
> > mm/memory_hotplug.c.
Makes sense, I will rename to
I am also taking a look at this work now. I will review and test it in
the next couple of days.
Pasha
On Tue, Apr 23, 2019 at 9:17 AM Oscar Salvador wrote:
>
> On Wed, 2019-04-17 at 15:59 -0700, Dan Williams wrote:
> > On Wed, Apr 17, 2019 at 3:04 PM Andrew Morton > org> wrote:
> > >
> > > On
> This is yet another example of where we need to break down the section
> alignment requirement for arch_add_memory().
>
> https://lore.kernel.org/lkml/12633539.2015392.2477781120122237934.st...@dwillia2-desk3.amr.corp.intel.com/
Hi Dan,
Yes, that is exactly what I am trying to solve with
from original email
On Wed, Apr 24, 2019 at 3:48 PM Pavel Tatashin
wrote:
>
> On Wed, Apr 24, 2019 at 5:07 AM Anshuman Khandual
> wrote:
> >
> > On 04/24/2019 02:08 AM, Pavel Tatashin wrote:
> > > sparsemem section size determines the maximum size and alignment th
On Wed, Apr 24, 2019 at 5:07 AM Anshuman Khandual
wrote:
>
> On 04/24/2019 02:08 AM, Pavel Tatashin wrote:
> > sparsemem section size determines the maximum size and alignment that
> > is allowed to offline/online memory block. The bigger the size the less
> > the clutte
is
attached, because it is not 1G aligned.
Allow better flexibility by making the section size configurable.
Signed-off-by: Pavel Tatashin
---
arch/arm64/Kconfig | 10 ++
arch/arm64/include/asm/sparsemem.h | 2 +-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git
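To make the alignment argument concrete: a region that is not 1G aligned cannot be handled with 1G sections (a shift of 30), but may fit a smaller section size. The sketch below finds the largest section shift, in the style of SECTION_SIZE_BITS, that both the start and the size of a region satisfy. mock_max_section_shift() is hypothetical, and the 27..30 range used in the example is an assumption for illustration, not a statement of which values the proposed Kconfig option allows.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest shift in [min_shift, max_shift] such that both start and size are
 * multiples of (1 << shift); returns -1 if no shift in the range qualifies.
 */
static int mock_max_section_shift(uint64_t start, uint64_t size,
				  int min_shift, int max_shift)
{
	int shift;

	assert(min_shift > 0 && max_shift < 64 && min_shift <= max_shift);
	for (shift = max_shift; shift >= min_shift; shift--) {
		uint64_t mask = (1ULL << shift) - 1;

		if ((start & mask) == 0 && (size & mask) == 0)
			return shift;
	}
	return -1;
}
```

For instance, a 2G region starting at 8G + 512M yields 29 (512M sections), while one starting at 8G + 16M fits none of the shifts 27..30.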
On Sat, Apr 20, 2019 at 5:02 PM Dan Williams wrote:
>
> On Sat, Apr 20, 2019 at 10:02 AM Pavel Tatashin
> wrote:
> >
> > > > Thank you for looking at this. Are you saying, that if drv.remove()
> > > > returns a failure it is simply ignored, and un
> > Thank you for looking at this. Are you saying, that if drv.remove()
> > returns a failure it is simply ignored, and unbind proceeds?
>
> Yeah, that's the problem. I've looked at making unbind able to fail,
> but that can lead to general bad behavior in device-drivers. I.e. why
> spend time
> Makes sense, but I have some questions about the details.
>
> >
> > Copy the state, and hotadd the persistent memory so machine still has all
> > 8G for runtime. Before reboot, hotremove device-dax 2G, copy the memory
> > that is needed to be preserved to pmem0 device, and reboot.
> >
> > The
> > +
> > + /* Walk and offline every single memory_block of the dax region. */
> > + lock_device_hotplug();
> > + rc = walk_memory_range(start_pfn, end_pfn, dev,
> > offline_memblock_cb);
> > + unlock_device_hotplug();
> > + if (rc)
> > + return rc;
>
>
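The quoted hunk depends on a walk-with-callback contract: each memory block is handed to a callback, and a non-zero return ends the walk and is returned to the caller (which is why "if (rc) return rc;" follows). Below is a userspace sketch of that contract together with a check-only callback, as suggested in the review (check_memblocks_offline_cb() rather than a callback that offlines). mock_walk_blocks(), mock_check_offline_cb() and struct mock_memory_block are hypothetical models, and the device-hotplug locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_memory_block {
	int id;
	bool online;
};

typedef int (*mock_block_cb)(struct mock_memory_block *mem, void *arg);

/* Call cb on each block in order; stop at the first non-zero return. */
static int mock_walk_blocks(struct mock_memory_block *blocks, size_t n,
			    void *arg, mock_block_cb cb)
{
	size_t i;
	int rc;

	for (i = 0; i < n; i++) {
		rc = cb(&blocks[i], arg);
		if (rc)
			return rc;
	}
	return 0;
}

/* Check-only callback: record the first online block in arg, return -EBUSY. */
static int mock_check_offline_cb(struct mock_memory_block *mem, void *arg)
{
	int *first_online = arg;

	assert(first_online != NULL);
	if (mem->online) {
		*first_online = mem->id;
		return -EBUSY;
	}
	return 0;
}
```

Because the walk stops at the first failure, the caller learns which block (here, its id) is still online and can report it.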
To hotremove persistent memory, the management software must unbind it
from the device-dax/kmem driver:
echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
Signed-off-by: Pavel Tatashin
---
drivers/dax/dax-private.h | 2 +
drivers/dax/kmem.c        |
drivers/kmem/unbind
5. Create raw pmem0 device
ndctl create-namespace --mode raw -e namespace0.0 -f
6. Copy the state to this device
7. Do kexec reboot, or reboot through firmware if firmware does not
zero memory in pmem region.
Pavel Tatashin (2):
device-dax
On 18-11-14 16:07:42, Michal Hocko wrote:
> On Mon 05-11-18 13:19:25, Alexander Duyck wrote:
> > This patchset is essentially a refactor of the page initialization logic
> > that is meant to provide for better code reuse while providing a
> > significant improvement in deferred page initialization
agree, I like your approach. It is clean, simplifies, and
improves the performance. I have tested it on both ARM and x86, and
verified the performance improvements. So:
Tested-by: Pavel Tatashin
> requires that there be more cores available to use. So for example on
> some of the new AMD Zen stuf
= __next_pfn_valid_range(, (end_pfn)))
Can this be improved somehow? It took me a while to understand this
piece of code. i is actually the end of a block, not a PFN index, and
({pfn = i - count; 1;}) is simply hard to parse. Why can't we make
__next_pfn_valid_range() return both the end and the start of a
stent memory initialization time on average drop from 23.49s to
> 19.12s per node.
>
> Signed-off-by: Alexander Duyck
Reviewed-by: Pavel Tatashin
L optimization, I do not think it is worth it.
The rest looks very good, please do the above change.
Reviewed-by: Pavel Tatashin
>
> Signed-off-by: Alexander Duyck
> ---
On 18-11-09 16:46:02, Alexander Duyck wrote:
> On Fri, 2018-11-09 at 19:00 -0500, Pavel Tatashin wrote:
> > On 18-11-09 15:14:35, Alexander Duyck wrote:
> > > On Fri, 2018-11-09 at 16:15 -0500, Pavel Tatashin wrote:
> > > > On 18-11-05 13:19:25, Alexander Duyck
ULONG_MAX to indicate that there are no
more deferred pages in this node.
Overall, I like this patch, makes things a lot easier, assuming the
above is addressed:
Reviewed-by: Pavel Tatashin
Thank you,
Pasha
64_MAX.
> +
> + if (out_spfn)
> + *out_spfn = max(zone->zone_start_pfn, spfn);
> + if (out_epfn)
> + *out_epfn = min(zone_end_pfn(zone), epfn);
Don't we need to verify after adjustment th
> > > + unsigned long epfn = PFN_DOWN(epa);
> > > + unsigned long spfn = PFN_UP(spa);
> > > +
> > > + /*
> > > + * Verify the end is at least past the start of the zone and
> > > + * that we have at least one PFN to initialize.
> > > + */
> > > +
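The quoted hunks round the start address up and the end address down to whole PFNs (PFN_UP/PFN_DOWN) and then clamp the result to the zone. After either step the range can end up empty, so the non-empty condition depends on the final values. Below is a small userspace sketch of that sequence. The MOCK_* macros and mock_clamp_range() are hypothetical, and 4K pages (a shift of 12) are an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_PAGE_SHIFT 12
#define MOCK_PAGE_SIZE  (1ULL << MOCK_PAGE_SHIFT)

/* Round a physical address up / down to a page frame number. */
#define MOCK_PFN_UP(x)   (((x) + MOCK_PAGE_SIZE - 1) >> MOCK_PAGE_SHIFT)
#define MOCK_PFN_DOWN(x) ((x) >> MOCK_PAGE_SHIFT)

/*
 * Convert [spa, epa) to whole PFNs, clamp to the zone [zone_start, zone_end),
 * and report whether at least one PFN is left to initialize.
 */
static bool mock_clamp_range(uint64_t spa, uint64_t epa,
			     uint64_t zone_start, uint64_t zone_end,
			     uint64_t *out_spfn, uint64_t *out_epfn)
{
	uint64_t spfn = MOCK_PFN_UP(spa);
	uint64_t epfn = MOCK_PFN_DOWN(epa);

	if (spfn < zone_start)
		spfn = zone_start;
	if (epfn > zone_end)
		epfn = zone_end;

	*out_spfn = spfn;
	*out_epfn = epfn;
	/* An empty or inverted range means there is nothing to initialize. */
	return spfn < epfn;
}
```

A range smaller than one page, or one that ends before the zone starts, returns false even though each individual adjustment looked valid.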
On 18-11-09 15:14:35, Alexander Duyck wrote:
> On Fri, 2018-11-09 at 16:15 -0500, Pavel Tatashin wrote:
> > On 18-11-05 13:19:25, Alexander Duyck wrote:
> > > This patchset is essentially a refactor of the page initialization logic
> > > that is meant to provide
On 18-11-05 13:19:25, Alexander Duyck wrote:
> This patchset is essentially a refactor of the page initialization logic
> that is meant to provide for better code reuse while providing a
> significant improvement in deferred page initialization performance.
>
> In my testing on an x86_64 system
> > Hi Dan,
> >
> > I am worried that this work adds another way to multi-thread struct
> > page initialization without re-use of already existing method. The
> > code is already a mess, and leads to bugs [1] because of the number of
> > different memory layouts, architecture specific quirks, and