Re: [PATCH] x86/mm: Fix leak of pmd ptlock

2021-01-05 Thread Dave Hansen
estigation into why we're suddenly seeing this now. I agree that ridding ourselves of open-coded free_page()'s is a good idea, but this patch itself needs to be around for stable anyway. So, Acked-by: Dave Hansen ___ Linux-nvdimm mailing list

Re: [PATCH V3 04/10] x86/pks: Preserve the PKRS MSR on context switch

2020-12-18 Thread Dave Hansen
On 12/18/20 11:42 AM, Ira Weiny wrote: > Another problem would be if the kmap and kunmap happened in different > contexts... :-/ I don't think that is done either but I don't know for > certain. It would be really nice to put together some surveillance patches to help become more certain about t

Re: [NEEDS-REVIEW] [PATCH V3 04/10] x86/pks: Preserve the PKRS MSR on context switch

2020-12-18 Thread Dave Hansen
On 12/17/20 8:10 PM, Ira Weiny wrote: > On Thu, Dec 17, 2020 at 12:41:50PM -0800, Dave Hansen wrote: >> On 11/6/20 3:29 PM, ira.we...@intel.com wrote: >>> void disable_TSC(void) >>> @@ -644,6 +668,8 @@ void __switch_to_xtra(struct task_struct *prev_p, >

Re: [PATCH V3 10/10] x86/pks: Add PKS test code

2020-12-17 Thread Dave Hansen
On 11/6/20 3:29 PM, ira.we...@intel.com wrote: > + /* Arm for context switch test */ > + write(fd, "1", 1); > + > + /* Context switch out... */ > + sleep(4); > + > + /* Check msr restored */ > + write(fd, "2", 1); These are al

Re: [NEEDS-REVIEW] [PATCH V3 04/10] x86/pks: Preserve the PKRS MSR on context switch

2020-12-17 Thread Dave Hansen
On 11/6/20 3:29 PM, ira.we...@intel.com wrote: > void disable_TSC(void) > @@ -644,6 +668,8 @@ void __switch_to_xtra(struct task_struct *prev_p, struct > task_struct *next_p) > > if ((tifp ^ tifn) & _TIF_SLD) > switch_to_sld(tifn); > + > + pks_sched_in(); > } Does the s

Re: [PATCH RFC V3 7/9] x86/entry: Preserve PKRS MSR across exceptions

2020-10-14 Thread Dave Hansen
On 10/14/20 8:46 PM, Ira Weiny wrote: > On Tue, Oct 13, 2020 at 11:52:32AM -0700, Dave Hansen wrote: >> On 10/9/20 12:42 PM, ira.we...@intel.com wrote: >>> @@ -341,6 +341,9 @@ noinstr void irqentry_enter(struct pt_regs *regs, >>> irqentry_state_t *state) >>>

Re: [PATCH RFC V3 9/9] x86/pks: Add PKS test code

2020-10-13 Thread Dave Hansen
printf("cannot open file\n"); > + return -1; > + } > + Will this return code make anybody mad? Should we have a nicer return code for when this is running on non-PKS hardware? I

Re: [PATCH RFC V3 8/9] x86/fault: Report the PKRS state on fault

2020-10-13 Thread Dave Hansen
> @@ -548,6 +549,11 @@ show_fault_oops(struct pt_regs *regs, unsigned long > error_code, unsigned long ad >(error_code & X86_PF_PK)? "protection keys violation" : > "permissions violation"); > > +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_P

Re: [PATCH RFC V3 7/9] x86/entry: Preserve PKRS MSR across exceptions

2020-10-13 Thread Dave Hansen
On 10/9/20 12:42 PM, ira.we...@intel.com wrote: > @@ -341,6 +341,9 @@ noinstr void irqentry_enter(struct pt_regs *regs, > irqentry_state_t *state) > /* Use the combo lockdep/tracing function */ > trace_hardirqs_off(); > instrumentation_end(); > + > +done: > + irq_save_pkrs(st

Re: [PATCH RFC V3 5/9] x86/pks: Add PKS kernel API

2020-10-13 Thread Dave Hansen
> +static inline void pks_update_protection(int pkey, unsigned long protection) > +{ > + current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs, > + pkey, protection); > + preempt_disable(); > + write_pkrs(current->thread

Re: [PATCH RFC V3 4/9] x86/pks: Preserve the PKRS MSR on context switch

2020-10-13 Thread Dave Hansen
On 10/9/20 12:42 PM, ira.we...@intel.com wrote: > From: Ira Weiny > > The PKRS MSR is defined as a per-logical-processor register. This > isolates memory access by logical CPU. Unfortunately, the MSR is not > managed by XSAVE. Therefore, tasks must save/restore the MSR value on > context switc

Re: [PATCH RFC V3 3/9] x86/pks: Enable Protection Keys Supervisor (PKS)

2020-10-13 Thread Dave Hansen
On 10/9/20 12:42 PM, ira.we...@intel.com wrote: > +/* > + * PKS is independent of PKU and either or both may be supported on a CPU. > + * Configure PKS if the cpu supports the feature. > + */ Let's at least be consistent about CPU vs. cpu in a single comment. :) > +static void setup_pks(void) > +

Re: [PATCH RFC V3 2/9] x86/fpu: Refactor arch_set_user_pkey_access() for PKS support

2020-10-13 Thread Dave Hansen
On 10/9/20 12:42 PM, ira.we...@intel.com wrote: > +/* > + * Update the pk_reg value and return it. How about: Replace disable bits for @pkey with values from @flags. > + * Kernel users use the same flags as user space: > + * PKEY_DISABLE_ACCESS > + * PKEY_DISABLE_WRITE > + */ > +

Re: [PATCH RFC V3 1/9] x86/pkeys: Create pkeys_common.h

2020-10-13 Thread Dave Hansen
)) Now that this has moved away from its use-site, it's a bit less self-documenting. Let's add a comment: /* * Generate an Access-Disable mask for the given pkey. Several of these * can be OR'd together to generate pkey register values. */ Once that's in p
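The suggested comment above describes a per-key Access-Disable mask. A minimal user-space sketch of that idea follows, assuming the x86 protection-key register layout of two bits per key with Access-Disable as the low bit. The macro names mirror the ones discussed in the thread, but the definitions here are an illustration rather than the patch's actual code, and `example_pkr_value()` is a hypothetical helper.

```c
#include <stdint.h>

/* Assumed layout: two bits per key, Access-Disable low, Write-Disable high. */
#define PKR_AD_BIT        0x1u
#define PKR_WD_BIT        0x2u
#define PKR_BITS_PER_PKEY 2

/*
 * Generate an Access-Disable mask for the given pkey. Several of these
 * can be OR'd together to generate pkey register values.
 */
#define PKR_AD_KEY(pkey) (PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY))

/* Hypothetical example: a register value that disables access for keys 1 and 3. */
static inline uint32_t example_pkr_value(void)
{
	return PKR_AD_KEY(1) | PKR_AD_KEY(3);
}
```

Under these assumptions key 1 occupies bits 2-3 and key 3 occupies bits 6-7, so the example value has bits 2 and 6 set.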

Re: [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()

2020-10-12 Thread Dave Hansen
On 10/12/20 9:19 AM, Eric Biggers wrote: > On Sun, Oct 11, 2020 at 11:56:35PM -0700, Ira Weiny wrote: >>> And I still don't really understand. After this patchset, there is still >>> code >>> nearly identical to the above (doing a temporary mapping just for a memcpy) >>> that >>> would still be

Re: [PATCH v6 5/6] mm: secretmem: use PMD-size pages to amortize direct map fragmentation

2020-09-29 Thread Dave Hansen
On 9/29/20 7:12 AM, Peter Zijlstra wrote: >> | 1G| 2M| 4K >>--+++- >> ssd, mitigations=on| 308.75 | 317.37 | 314.9 >> ssd, mitigations=off | 305.25 | 295.32 | 304.92 >> ram, mitigations=o

Re: [PATCH v5 0/5] mm: introduce memfd_secret system call to create "secret" memory areas

2020-09-16 Thread Dave Hansen
On 9/16/20 11:49 AM, Andy Lutomirski wrote: > I still have serious concerns with uncached mappings. I'm not saying > I can't be convinced, but I'm not currently convinced that we should > allow user code to create UC mappings on x86. There's another widely-used OS that has a "NOCACHE" flag to be

Re: [PATCH RFC V2 17/17] x86/entry: Preserve PKRS MSR across exceptions

2020-07-23 Thread Dave Hansen
On 7/23/20 10:08 AM, Andy Lutomirski wrote: > Suppose some kernel code (a syscall or kernel thread) changes PKRS > then takes a page fault. The page fault handler needs a fresh PKRS. > Then the page fault handler (say a VMA’s .fault handler) changes > PKRS. Then we get an interrupt. The interrupt *

Re: [PATCH RFC V2 17/17] x86/entry: Preserve PKRS MSR across exceptions

2020-07-23 Thread Dave Hansen
On 7/23/20 9:18 AM, Fenghua Yu wrote: > The PKRS MSR has been preserved in thread_info during kernel entry. We > don't need to preserve it in another place (i.e. idtentry_state). I'm missing how the PKRS MSR gets preserved in thread_info. Could you explain the mechanism by which this happens and

Re: [PATCH RFC V2 02/17] x86/fpu: Refactor arch_set_user_pkey_access() for PKS support

2020-07-17 Thread Dave Hansen
On 7/17/20 1:54 AM, Peter Zijlstra wrote: > This is unbelievable junk... Ouch! This is from the original user pkeys implementation. > How about something like: > > u32 update_pkey_reg(u32 pk_reg, int pkey, unsigned int flags) > { > int pkey_shift = pkey * PKR_BITS_PER_PKEY; > > pk_
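The quoted suggestion is cut off in this listing, so the following is only a sketch of the general technique being discussed: clear both disable bits for one key, then set them again from the caller's flags. It assumes the two-bits-per-key register layout and the Linux uapi flag values PKEY_DISABLE_ACCESS (0x1) and PKEY_DISABLE_WRITE (0x2). The name `update_pkey_reg_sketch()` is hypothetical and is not the function from the thread.

```c
#include <stdint.h>

/* Assumed layout: two bits per key, Access-Disable low, Write-Disable high. */
#define PKR_AD_BIT        0x1u
#define PKR_WD_BIT        0x2u
#define PKR_BITS_PER_PKEY 2

/* Same values as the user-space pkey flags. */
#define PKEY_DISABLE_ACCESS 0x1u
#define PKEY_DISABLE_WRITE  0x2u

/* Replace the disable bits for @pkey with values derived from @flags. */
static inline uint32_t update_pkey_reg_sketch(uint32_t pk_reg, int pkey,
					      unsigned int flags)
{
	int pkey_shift = pkey * PKR_BITS_PER_PKEY;

	/* Drop whatever was previously set for this key. */
	pk_reg &= ~((PKR_AD_BIT | PKR_WD_BIT) << pkey_shift);

	if (flags & PKEY_DISABLE_ACCESS)
		pk_reg |= PKR_AD_BIT << pkey_shift;
	if (flags & PKEY_DISABLE_WRITE)
		pk_reg |= PKR_WD_BIT << pkey_shift;

	return pk_reg;
}
```

Because the old bits are cleared first, calling the helper twice for the same key leaves only the most recent flags in place, and the bits of every other key are untouched.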

Re: [RFC PATCH 12/15] kmap: Add stray write protection for device pages

2020-07-14 Thread Dave Hansen
On 7/14/20 12:29 PM, Peter Zijlstra wrote: > On Tue, Jul 14, 2020 at 12:06:16PM -0700, Ira Weiny wrote: >> On Tue, Jul 14, 2020 at 10:44:51AM +0200, Peter Zijlstra wrote: >>> So, if I followed along correctly, you're proposing to do a WRMSR per >>> k{,un}map{_atomic}(), sounds like excellent perfor

Re: [RFC PATCH 04/15] x86/pks: Preserve the PKRS MSR on context switch

2020-07-14 Thread Dave Hansen
On 7/14/20 11:53 AM, Ira Weiny wrote: >>> The PKRS MSR is defined as a per-core register. Just to be clear, PKRS is a per-logical-processor register, just like PKRU. The "per-core" thing here is a typo.

Re: [PATCH v2 0/2] Replace and improve "mcsafe" with copy_safe()

2020-05-01 Thread Dave Hansen
On 5/1/20 11:28 AM, Linus Torvalds wrote: > Plus on x86 you can't reasonably even have different code sequences > for that case, because CLAC/STAC don't have a "enable users read > accesses" vs "write accesses" case. It's an all-or-nothing "enable > user faults". > > We _used_ to have a difference

Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP

2020-04-30 Thread Dave Hansen
On 4/30/20 8:52 AM, David Hildenbrand wrote: >> Justifying behavior by documentation that does not consider memory >> hotplug is bad thinking. > Are you maybe confusing this patch series with the arm64 approach? This > is not about ordinary hotplugged DIMMs. > > I'd love to get Dan's, Dave's and M

Re: [PATCH v2 3/3] device-dax: Add system ram (add_memory()) with MHP_NO_FIRMWARE_MEMMAP

2020-04-30 Thread Dave Hansen
* MHP_NO_FIRMWARE_MEMMAP ensures that future * kexec'd kernels will not treat this as RAM. */ Not a biggie, though. Acked-by: Dave Hansen

Re: [PATCH v3 03/10] efi: Enumerate EFI_MEMORY_SP

2019-06-07 Thread Dave Hansen
On 6/7/19 1:03 PM, Dan Williams wrote: >> Separate from these patches, should we have a runtime file that dumps >> out the same info? dmesg isn't always available, and hotplug could >> change this too, I'd imagine. > Perhaps, but I thought /proc/iomem was that runtime file. Given that > x86/Linux

Re: [PATCH v3 00/10] EFI Specific Purpose Memory Support

2019-06-07 Thread Dave Hansen
mory when we ask. If we added to the core-mm, we'd almost certainly not be able to get it back reliably. Anyway, thanks for doing these, and I really hope that the world's BIOSes actually use this flag. For the series: Reviewed-by: Dave Hansen

Re: [PATCH v3 08/10] device-dax: Add a driver for "hmem" devices

2019-06-07 Thread Dave Hansen
On 6/7/19 12:27 PM, Dan Williams wrote: > This consumes "hmem" devices the producer of "hmem" devices is saved for > a follow-on patch so that it can reference the new CONFIG_DEV_DAX_HMEM > symbol to gate performing the enumeration work. Do these literally show up as /dev/hmemX?

Re: [PATCH v3 03/10] efi: Enumerate EFI_MEMORY_SP

2019-06-07 Thread Dave Hansen
On 6/7/19 12:27 PM, Dan Williams wrote: > @@ -848,15 +848,16 @@ char * __init efi_md_typeattr_format(char *buf, size_t > size, > if (attr & ~(EFI_MEMORY_UC | EFI_MEMORY_WC | EFI_MEMORY_WT | >EFI_MEMORY_WB | EFI_MEMORY_UCE | EFI_MEMORY_RO | >EFI_MEMORY_

Re: [ndctl PATCH 0/8] daxctl: add a new reconfigure-device command

2019-05-06 Thread Dave Hansen
This all looks quite nice to me. Thanks, Vishal! One minor nit: for those of us new to daxctl and friends, they can be a bit hard to get started with. Could you maybe add a few example invocations to the Documentation, or even this cover letter to help us newbies get started?

Re: [v5 2/3] mm/hotplug: make remove_memory() interface useable

2019-05-06 Thread Dave Hansen
On 5/6/19 11:01 AM, Dan Williams wrote: >>> +void __remove_memory(int nid, u64 start, u64 size) >>> { >>> + >>> + /* >>> + * trigger BUG() is some memory is not offlined prior to calling this >>> + * function >>> + */ >>> + if (try_remove_memory(nid, start, size)) >>> +

Re: [v5 2/3] mm/hotplug: make remove_memory() interface useable

2019-05-06 Thread Dave Hansen
> -static inline void remove_memory(int nid, u64 start, u64 size) {} > +static inline bool remove_memory(int nid, u64 start, u64 size) > +{ > + return -EBUSY; > +} This seems like an appropriate place for a WARN_ONCE(), if someone manages to call remove_memory() with hotplug disabled. BTW, I

Re: [v3 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-04-25 Thread Dave Hansen
Hi Pavel, Thanks for doing this! I knew we'd have to get to it eventually, but sounds like you needed it sooner rather than later. ... > static inline struct dev_dax *to_dev_dax(struct device *dev) > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c > index 4c0131857133..6f1640462df9 100644

Re: [v3 1/2] device-dax: fix memory and resource leak if hotplug fails

2019-04-25 Thread Dave Hansen
good to me: Reviewed-by: Dave Hansen

[PATCH 0/5] [v5] Allow persistent memory to be used like normal RAM

2019-02-25 Thread Dave Hansen
This is a relatively small delta from v4. The review comments seem to be settling down, so it seems like we should start thinking about how this might get merged. Are there any objections to taking it in via the nvdimm tree? Dan Williams, our intrepid nvdimm maintainer has said he would apprecia

[PATCH 2/5] mm/resource: move HMM pr_debug() deeper into resource code

2019-02-25 Thread Dave Hansen
From: Dave Hansen HMM consumes physical address space for its own use, even though nothing is mapped or accessible there. It uses a special resource description (IORES_DESC_DEVICE_PRIVATE_MEMORY) to uniquely identify these areas. When HMM consumes address space, it makes a best guess about

[PATCH 4/5] mm/resource: let walk_system_ram_range() search child resources

2019-02-25 Thread Dave Hansen
From: Dave Hansen In the process of onlining memory, we use walk_system_ram_range() to find the actual RAM areas inside of the area being onlined. However, it currently only finds memory resources which are "top-level" iomem_resources. Children are not currently searched which ca

[PATCH 1/5] mm/resource: return real error codes from walk failures

2019-02-25 Thread Dave Hansen
From: Dave Hansen walk_system_ram_range() can return an error code either because *it* failed, or because the 'func' that it calls returned an error. The memory hotplug does the following: ret = walk_system_ram_range(..., func); if (ret) return ret;

[PATCH 3/5] mm/memory-hotplug: allow memory resources to be children

2019-02-25 Thread Dave Hansen
From: Dave Hansen The mm/resource.c code is used to manage the physical address space. The current resource configuration can be viewed in /proc/iomem. An example of this is at the bottom of this description. The nvdimm subsystem "owns" the physical address resources which map to

[PATCH 5/5] dax: "Hotplug" persistent memory for use like normal RAM

2019-02-25 Thread Dave Hansen
From: Dave Hansen This is intended for use with NVDIMMs that are physically persistent (physically like flash) so that they can be used as a cost-effective RAM replacement. Intel Optane DC persistent memory is one implementation of this kind of NVDIMM. Currently, a persistent memory region

Re: question about page tables in DAX/FS/PMEM case

2019-02-21 Thread Dave Hansen
On 2/21/19 2:58 PM, Larry Bassel wrote: > AFAIK there is no hardware benefit from sharing the page table > directory within different page table. So the only benefit is the > amount of memory we save. The hardware benefit from schemes like this is that the CPU caches are better utilized. If two p

Re: [PATCH 5/5] dax: "Hotplug" persistent memory for use like normal RAM

2019-02-12 Thread Dave Hansen
On 2/9/19 3:00 AM, Brice Goglin wrote: > I've used your patches on fake hardware (memmap=xx!yy) with an older > nvdimm-pending branch (without Keith's patches). It worked fine. This > time I am running on real Intel hardware. Any idea where to look ? I've run them on real Intel hardware too. Coul

Re: [PATCH 0/5] [v4] Allow persistent memory to be used like normal RAM

2019-01-28 Thread Dave Hansen
On 1/28/19 3:09 AM, Balbir Singh wrote: >> This is intended for Intel-style NVDIMMs (aka. Intel Optane DC >> persistent memory) NVDIMMs. These DIMMs are physically persistent, >> more akin to flash than traditional RAM. They are also expected to >> be more cost-effective than using RAM, which is

Re: [PATCH 2/5] mm/resource: move HMM pr_debug() deeper into resource code

2019-01-25 Thread Dave Hansen
On 1/25/19 1:18 PM, Bjorn Helgaas wrote: > On Thu, Jan 24, 2019 at 5:21 PM Dave Hansen > wrote: >> diff -puN kernel/resource.c~move-request_region-check kernel/resource.c >> --- a/kernel/resource.c~move-request_region-check 2019-01-24 >> 15:13:14.453199539 -0800 >

Re: [PATCH 1/5] mm/resource: return real error codes from walk failures

2019-01-25 Thread Dave Hansen
On 1/25/19 1:02 PM, Bjorn Helgaas wrote: >> @@ -453,7 +453,7 @@ int walk_system_ram_range(unsigned long >> unsigned long flags; >> struct resource res; >> unsigned long pfn, end_pfn; >> - int ret = -1; >> + int ret = -EINVAL; > Can you either make a similar chang

[PATCH 5/5] dax: "Hotplug" persistent memory for use like normal RAM

2019-01-24 Thread Dave Hansen
From: Dave Hansen This is intended for use with NVDIMMs that are physically persistent (physically like flash) so that they can be used as a cost-effective RAM replacement. Intel Optane DC persistent memory is one implementation of this kind of NVDIMM. Currently, a persistent memory region

[PATCH 3/5] mm/memory-hotplug: allow memory resources to be children

2019-01-24 Thread Dave Hansen
From: Dave Hansen The mm/resource.c code is used to manage the physical address space. The current resource configuration can be viewed in /proc/iomem. An example of this is at the bottom of this description. The nvdimm subsystem "owns" the physical address resources which map to

[PATCH 4/5] dax/kmem: let walk_system_ram_range() search child resources

2019-01-24 Thread Dave Hansen
From: Dave Hansen In the process of onlining memory, we use walk_system_ram_range() to find the actual RAM areas inside of the area being onlined. However, it currently only finds memory resources which are "top-level" iomem_resources. Children are not currently searched which ca

[PATCH 0/5] [v4] Allow persistent memory to be used like normal RAM

2019-01-24 Thread Dave Hansen
v3 spurred a bunch of really good discussion. Thanks to everybody that made comments and suggestions! I would still love some Acks on this from the folks on cc, even if it is on just the patch touching your area. Note: these are based on commit d2f33c19644 in: git://git.kernel.org/pub/s

[PATCH 2/5] mm/resource: move HMM pr_debug() deeper into resource code

2019-01-24 Thread Dave Hansen
From: Dave Hansen HMM consumes physical address space for its own use, even though nothing is mapped or accessible there. It uses a special resource description (IORES_DESC_DEVICE_PRIVATE_MEMORY) to uniquely identify these areas. When HMM consumes address space, it makes a best guess about

[PATCH 1/5] mm/resource: return real error codes from walk failures

2019-01-24 Thread Dave Hansen
From: Dave Hansen walk_system_ram_range() can return an error code either because *it* failed, or because the 'func' that it calls returned an error. The memory hotplug does the following: ret = walk_system_ram_range(..., func); if (ret) return ret;

Re: [PATCH 2/4] mm/memory-hotplug: allow memory resources to be children

2019-01-23 Thread Dave Hansen
On 1/16/19 3:38 PM, Jerome Glisse wrote: > So right now i would rather that we keep properly reporting this > hazard so that at least we know it failed because of that. This > also include making sure that we can not register private memory > as a child of an un-busy resource that does exist but mi

Re: [PATCH 2/4] mm/memory-hotplug: allow memory resources to be children

2019-01-18 Thread Dave Hansen
On 1/16/19 11:16 AM, Jerome Glisse wrote: >> We *could* also simply truncate the existing top-level >> "Persistent Memory" resource and take over the released address >> space. But, this means that if we ever decide to hot-unplug the >> "RAM" and give it back, we need to recreate the original setu

Re: [PATCH 4/4] dax: "Hotplug" persistent memory for use like normal RAM

2019-01-18 Thread Dave Hansen
On 1/17/19 11:47 PM, Yanmin Zhang wrote: > a chance for kernel to allocate PMEM as DMA buffer. > Some super speed devices like 10Giga NIC, USB (SSIC connecting modem), > might not work well if DMA buffer is in PMEM as it's slower than DRAM. > > Should your patchset consider it? No, I don't think

Re: [PATCH 0/4] Allow persistent memory to be used like normal RAM

2019-01-17 Thread Dave Hansen
On 1/17/19 8:29 AM, Jeff Moyer wrote: >> Persistent memory is cool. But, currently, you have to rewrite >> your applications to use it. Wouldn't it be cool if you could >> just have it show up in your system like normal RAM and get to >> it like a slow blob of memory? Well... have I got the patc

Re: [PATCH 4/4] dax: "Hotplug" persistent memory for use like normal RAM

2019-01-17 Thread Dave Hansen
On 1/17/19 12:19 AM, Yanmin Zhang wrote: >> > I didn't try pmem and I am wondering it's slower than DRAM. > Should a flag, such like _GFP_PMEM, be added to distinguish it from > DRAM? Absolutely not. :) We already have performance-differentiated memory, and lots of ways to enumerate and select it

Re: [PATCH 2/4] mm/memory-hotplug: allow memory resources to be children

2019-01-16 Thread Dave Hansen
On 1/16/19 11:16 AM, Jerome Glisse wrote: >> We also rework the old error message a bit since we do not get >> the conflicting entry back: only an indication that we *had* a >> conflict. > We should keep the device private check (moving it in __request_region) > as device private can try to registe

Re: [PATCH 4/4] dax: "Hotplug" persistent memory for use like normal RAM

2019-01-16 Thread Dave Hansen
On 1/16/19 1:16 PM, Bjorn Helgaas wrote: >> + /* >> +* Set flags appropriate for System RAM. Leave ..._BUSY clear >> +* so that add_memory() can add a child resource. >> +*/ >> + new_res->flags = IORESOURCE_SYSTEM_RAM; > IIUC, new_res->flags was set to "IORESOUR

Re: [PATCH 4/4] dax: "Hotplug" persistent memory for use like normal RAM

2019-01-16 Thread Dave Hansen
On 1/16/19 1:16 PM, Bjorn Helgaas wrote: > On Wed, Jan 16, 2019 at 12:25 PM Dave Hansen > wrote: >> From: Dave Hansen >> Currently, a persistent memory region is "owned" by a device driver, >> either the "Direct DAX" or "Filesystem DAX" drive

[PATCH 1/4] mm/resource: return real error codes from walk failures

2019-01-16 Thread Dave Hansen
From: Dave Hansen walk_system_ram_range() can return an error code either because *it* failed, or because the 'func' that it calls returned an error. The memory hotplug does the following: ret = walk_system_ram_range(..., func); if (ret) return ret;

[PATCH 4/4] dax: "Hotplug" persistent memory for use like normal RAM

2019-01-16 Thread Dave Hansen
From: Dave Hansen Currently, a persistent memory region is "owned" by a device driver, either the "Direct DAX" or "Filesystem DAX" drivers. These drivers allow applications to explicitly use persistent memory, generally by being modified to use special, new li

[PATCH 2/4] mm/memory-hotplug: allow memory resources to be children

2019-01-16 Thread Dave Hansen
From: Dave Hansen The mm/resource.c code is used to manage the physical address space. We can view the current resource configuration in /proc/iomem. An example of this is at the bottom of this description. The nvdimm subsystem "owns" the physical address resources which map to

[PATCH 3/4] dax/kmem: let walk_system_ram_range() search child resources

2019-01-16 Thread Dave Hansen
From: Dave Hansen In the process of onlining memory, we use walk_system_ram_range() to find the actual RAM areas inside of the area being onlined. However, it currently only finds memory resources which are "top-level" iomem_resources. Children are not currently searched which ca

[PATCH 0/4] Allow persistent memory to be used like normal RAM

2019-01-16 Thread Dave Hansen
I would like to get this queued up to get merged. Since most of the churn is in the nvdimm code, and it also depends on some refactoring that only exists in the nvdimm tree, it seems like putting it in *via* the nvdimm tree is the best path. But, this series makes non-trivial changes to the "reso

Re: [PATCH 0/9] Allow persistent memory to be used like normal RAM

2018-12-03 Thread Dave Hansen
On 12/3/18 1:22 AM, Brice Goglin wrote: > Le 22/10/2018 à 22:13, Dave Hansen a écrit : > What happens on systems without an HMAT? Does this new memory get merged > into existing NUMA nodes? It gets merged into the persistent memory device's node, as told by the firmware. Inte

Re: [PATCH 0/9] Allow persistent memory to be used like normal RAM

2018-10-26 Thread Dave Hansen
On 10/26/18 1:03 AM, Xishi Qiu wrote: > How about let the BIOS report a new type for kmem in e820 table? > e.g. > #define E820_PMEM 7 > #define E820_KMEM 8 It would be best if the BIOS just did this all for us. But, what you're describing would take years to get from concept to showing up

Re: [PATCH 0/9] Allow persistent memory to be used like normal RAM

2018-10-23 Thread Dave Hansen
>> This series adds a new "driver" to which pmem devices can be >> attached. Once attached, the memory "owned" by the device is >> hot-added to the kernel and managed like any other memory. On > > Would this memory be considered volatile (with the driver initializing > it to zeros), or persisten

[PATCH 9/9] dax/kmem: actually enable the code in Makefile

2018-10-22 Thread Dave Hansen
Most of the new code was dead up to this point. Now that all the pieces are in place, enable it. Cc: Dan Williams Cc: Dave Jiang Cc: Ross Zwisler Cc: Vishal Verma Cc: Tom Lendacky Cc: Andrew Morton Cc: Michal Hocko Cc: linux-nvdimm@lists.01.org Cc: linux-ker...@vger.kernel.org Cc: linux.

[PATCH 6/9] mm/memory-hotplug: allow memory resources to be children

2018-10-22 Thread Dave Hansen
The mm/resource.c code is used to manage the physical address space. We can view the current resource configuration in /proc/iomem. An example of this is at the bottom of this description. The nvdimm subsystem "owns" the physical address resources which map to persistent memory and has resourc

[PATCH 5/9] dax/kmem: add more nd dax kmem infrastructure

2018-10-22 Thread Dave Hansen
Each DAX mode has a set of wrappers and helpers. Add them for the kmem mode. Cc: Dan Williams Cc: Dave Jiang Cc: Ross Zwisler Cc: Vishal Verma Cc: Tom Lendacky Cc: Andrew Morton Cc: Michal Hocko Cc: linux-nvdimm@lists.01.org Cc: linux-ker...@vger.kernel.org Cc: linux...@kvack.org Cc: Hua

[PATCH 7/9] dax/kmem: actually perform memory hotplug

2018-10-22 Thread Dave Hansen
This is the meat of this whole series. When the "kmem" device's probe function is called and we know we have a good persistent memory device, hotplug the memory back into the main kernel. Cc: Dan Williams Cc: Dave Jiang Cc: Ross Zwisler Cc: Vishal Verma Cc: Tom Lendacky Cc: Andrew Morton

[PATCH 3/9] dax: add more kmem device infrastructure

2018-10-22 Thread Dave Hansen
The previous patch is a simple copy of the pmem driver. This makes it easy while this is in development to keep the pmem and kmem code in sync. This actually adds some necessary infrastructure for the new driver to compile. Cc: Dan Williams Cc: Dave Jiang Cc: Ross Zwisler Cc: Vishal Verma

[PATCH 4/9] dax/kmem: allow PMEM devices to bind to KMEM driver

2018-10-22 Thread Dave Hansen
Currently, a persistent memory device's mode must be coordinated with the driver to which it needs to bind. To change it from the fsdax to the device-dax driver, you first change the mode of the device itself. Instead of adding a new device mode, allow the PMEM mode to also bind to the KMEM dri

[PATCH 0/9] Allow persistent memory to be used like normal RAM

2018-10-22 Thread Dave Hansen
Persistent memory is cool. But, currently, you have to rewrite your applications to use it. Wouldn't it be cool if you could just have it show up in your system like normal RAM and get to it like a slow blob of memory? Well... have I got the patch series for you! This series adds a new "driver"

[PATCH 1/9] mm/resource: return real error codes from walk failures

2018-10-22 Thread Dave Hansen
walk_system_ram_range() can return an error code either because *it* failed, or because the 'func' that it calls returned an error. The memory hotplug does the following: ret = walk_system_ram_range(..., func); if (ret) return ret; and 'ret' makes it out to user
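To illustrate why the walker's own failure value matters when callers pass 'ret' straight through, here is a small user-space sketch. It is not the kernel's walk_system_ram_range(); the `struct ram_range` table, `walk_ranges_sketch()` and the two callbacks are all hypothetical. The point it shows is that a bare -1 would be indistinguishable from -EPERM (EPERM is 1 on Linux), while starting from -EINVAL gives the caller a meaningful errno when nothing matched.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one "System RAM" resource. */
struct ram_range {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

typedef int (*range_fn)(unsigned long start_pfn, unsigned long nr_pages,
			void *arg);

/*
 * Call func() on every part of [start_pfn, start_pfn + nr_pages) that
 * overlaps a known range. Returns the first non-zero value from func(),
 * or -EINVAL (a real errno rather than -1) if no range overlapped.
 */
static int walk_ranges_sketch(const struct ram_range *ranges, size_t n,
			      unsigned long start_pfn, unsigned long nr_pages,
			      void *arg, range_fn func)
{
	unsigned long end_pfn = start_pfn + nr_pages;
	int ret = -EINVAL;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long s = ranges[i].start_pfn;
		unsigned long e = s + ranges[i].nr_pages;

		/* Clamp the range to the requested window. */
		if (s < start_pfn)
			s = start_pfn;
		if (e > end_pfn)
			e = end_pfn;
		if (s >= e)
			continue;

		ret = func(s, e - s, arg);
		if (ret)
			break;
	}
	return ret;
}

/* Example callback: add up the pages seen. */
static int count_pages(unsigned long start_pfn, unsigned long nr_pages,
		       void *arg)
{
	(void)start_pfn;
	*(unsigned long *)arg += nr_pages;
	return 0;
}

/* Example callback: always fail with its own error code. */
static int fail_busy(unsigned long start_pfn, unsigned long nr_pages,
		     void *arg)
{
	(void)start_pfn;
	(void)nr_pages;
	(void)arg;
	return -EBUSY;
}
```

A caller that does `if (ret) return ret;` then propagates either the callback's error (for example -EBUSY) or the walker's own -EINVAL, and never an accidental -EPERM.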

[PATCH 8/9] dax/kmem: let walk_system_ram_range() search child resources

2018-10-22 Thread Dave Hansen
In the process of onlining memory, we use walk_system_ram_range() to find the actual RAM areas inside of the area being onlined. However, it currently only finds memory resources which are "top-level" iomem_resources. Children are not currently searched which causes it to skip System RAM in are
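The difference between a top-level-only search and one that also visits children can be sketched with a toy resource tree. This is an assumption-laden illustration, not the kernel's resource code: `struct res`, the `RES_SYSRAM` flag and `count_sysram()` are hypothetical names chosen for the example.

```c
#include <stddef.h>

#define RES_SYSRAM 0x1u	/* hypothetical "this is System RAM" flag */

/* Toy resource node: siblings share a level, child points one level down. */
struct res {
	unsigned long start, end;	/* inclusive bounds */
	unsigned int flags;
	struct res *child;
	struct res *sibling;
};

/*
 * Count System RAM resources starting at @r. With @descend == 0 only
 * the current level is examined, which misses RAM that was added as a
 * child of another resource. With @descend != 0 non-RAM parents are
 * searched recursively.
 */
static int count_sysram(const struct res *r, int descend)
{
	int n = 0;

	for (; r; r = r->sibling) {
		if (r->flags & RES_SYSRAM)
			n++;
		else if (descend)
			n += count_sysram(r->child, descend);
	}
	return n;
}
```

With a tree holding one top-level RAM region and one RAM region nested under a "Persistent Memory" parent, the top-level-only walk sees one region and the descending walk sees both.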

[PATCH 2/9] dax: kernel memory driver for mm ownership of DAX

2018-10-22 Thread Dave Hansen
Add the actual driver which will own the DAX range. This allows very nice parity with the other possible "owners" of a DAX region: device DAX and filesystem DAX. It also greatly simplifies the process of handing off control of the memory between the different owners since it's just a matter o

Re: [PATCH v5 2/4] mm: Provide kernel parameter to allow disabling page init poisoning

2018-09-26 Thread Dave Hansen
On 09/26/2018 08:24 AM, Alexander Duyck wrote: > With no options it works just like slub_debug and enables all > available options. So in our case it is a NOP since we wanted the > debugging enabled by default. Yeah, but slub_debug is different. First, nobody uses the slub_debug=- option because

Re: [PATCH v5 2/4] mm: Provide kernel parameter to allow disabling page init poisoning

2018-09-26 Thread Dave Hansen
On 09/26/2018 12:38 AM, Michal Hocko wrote: > Why cannot you simply go with [no]vm_page_poison[=on/off]? I was trying to look to the future a bit, if we end up with five or six more other options we want to allow folks to enable/disable. I don't want to end up in a situation where we have a bunch

Re: [PATCH v5 2/4] mm: Provide kernel parameter to allow disabling page init poisoning

2018-09-25 Thread Dave Hansen
On 09/25/2018 01:38 PM, Alexander Duyck wrote: > On 9/25/2018 1:26 PM, Dave Hansen wrote: >> On 09/25/2018 01:20 PM, Alexander Duyck wrote: >>> +    vm_debug[=options]    [KNL] Available with CONFIG_DEBUG_VM=y. >>> +    May slow down system

Re: [PATCH v5 2/4] mm: Provide kernel parameter to allow disabling page init poisoning

2018-09-25 Thread Dave Hansen
On 09/25/2018 01:20 PM, Alexander Duyck wrote: > + vm_debug[=options] [KNL] Available with CONFIG_DEBUG_VM=y. > + May slow down system boot speed, especially when > + enabled on systems with a large amount of memory. > + All optio

Re: [PATCH 1/4] mm: Provide kernel parameter to allow disabling page init poisoning

2018-09-12 Thread Dave Hansen
On 09/12/2018 09:36 AM, Alexander Duyck wrote: >> vm_debug = [KNL] Available with CONFIG_DEBUG_VM=y. >> May slow down boot speed, especially on larger- >> memory systems when enabled. >> off: turn off all runtime V

Re: [PATCH 1/4] mm: Provide kernel parameter to allow disabling page init poisoning

2018-09-12 Thread Dave Hansen
On 09/12/2018 07:49 AM, Alexander Duyck wrote: >>> + page_init_poison= [KNL] Boot-time parameter changing the >>> + state of poisoning of page structures during early >>> + boot. Used to verify page metadata is not accessed >>> +

Re: [PATCH V4 4/4] kvm: add a check if pfn is from NVDIMM pmem.

2018-08-30 Thread Dave Hansen
On 08/22/2018 03:58 AM, Zhang Yi wrote: > bool kvm_is_reserved_pfn(kvm_pfn_t pfn) > { > - if (pfn_valid(pfn)) > - return PageReserved(pfn_to_page(pfn)); > + struct page *page; > + > + if (pfn_valid(pfn)) { > + page = pfn_to_page(pfn); > + return Pag

Re: [PATCH v2 00/14] mm: Asynchronous + multithreaded memmap init for ZONE_DEVICE

2018-07-23 Thread Dave Hansen
On 07/23/2018 04:09 AM, Michal Hocko wrote: > On Thu 19-07-18 11:41:10, Dave Hansen wrote: >> Are you looking for the actual end-user reports? This was more of a >> case of the customer plugging in some persistent memory DIMMs, noticing >> the boot delta and calling the fo

Re: [PATCH v2 00/14] mm: Asynchronous + multithreaded memmap init for ZONE_DEVICE

2018-07-19 Thread Dave Hansen
On 07/18/2018 05:05 AM, Michal Hocko wrote: > On Tue 17-07-18 10:32:32, Dan Williams wrote: >> On Tue, Jul 17, 2018 at 8:50 AM Michal Hocko wrote: > [...] >>> Is there any reason that this work has to target the next merge window? >>> The changelog is not really specific about that. >> >> Same rea

Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT

2017-12-22 Thread Dave Hansen
On 12/21/2017 07:09 PM, Anshuman Khandual wrote: > I had presented a proposal for NUMA redesign in the Plumbers Conference this > year where various memory devices with different kind of memory attributes > can be represented in the kernel and be used explicitly from the user space. > Here is the l

Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT

2017-12-20 Thread Dave Hansen
On 12/20/2017 10:19 AM, Matthew Wilcox wrote: > I don't know what the right interface is, but my laptop has a set of > /sys/devices/system/memory/memoryN/ directories. Perhaps this is the > right place to expose write_bw (etc). Those directories are already too redundant and wasteful. I think we

Re: [RFC v2 0/5] surface heterogeneous memory performance information

2017-07-19 Thread Dave Hansen
On 07/19/2017 02:48 AM, Bob Liu wrote: >> Option 2: Provide the user with HMAT performance data directly in >> sysfs, allowing applications to directly access it without the need >> for the library and daemon. >> > Is it possible to do the memory allocation automatically by the > kernel and transp

Re: [RFC v2 0/5] surface heterogeneous memory performance information

2017-07-07 Thread Dave Hansen
On 07/06/2017 11:27 PM, Balbir Singh wrote: > On Thu, 2017-07-06 at 15:52 -0600, Ross Zwisler wrote: >> # grep . mem_tgt2/* mem_tgt2/local_init/* 2>/dev/null >> mem_tgt2/firmware_id:1 This is here for folks that know their platform and know exactly the firmware ID (PXM in ACPI parlance) of a g

Re: [RFC v2 0/5] surface heterogeneous memory performance information

2017-07-06 Thread Dave Hansen
On 07/06/2017 04:08 PM, Jerome Glisse wrote: >> So, for applications that need to differentiate between memory ranges based >> on their performance, what option would work best for you? Is the local >> (initiator,target) performance provided by patch 5 enough, or do you >> require performance info

Re: [PATCH 1/2] mm: avoid spurious 'bad pmd' warning messages

2017-05-17 Thread Dave Hansen
On 05/17/2017 10:16 AM, Ross Zwisler wrote: > @@ -3061,7 +3061,7 @@ static int pte_alloc_one_map(struct vm_fault *vmf) >* through an atomic read in C, which is what pmd_trans_unstable() >* provides. >*/ > - if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd)) > +

Re: [PATCH] mm,x86: fix SMP x86 32bit build for native_pud_clear()

2017-02-16 Thread Dave Hansen
On 02/15/2017 12:31 PM, Dave Jiang wrote: > The fix introduced by e4decc90 to fix the UP case for 32bit x86, however > that broke the SMP case that was working previously. Add ifdef so the dummy > function only show up for 32bit UP case only. Could you elaborate a bit on how it broke things? > Fi

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Dave Hansen
On 11/22/2016 11:49 PM, Daniel Vetter wrote: > Yes, agreed. My idea with exposing vram sections using numa nodes wasn't > to reuse all the existing allocation policies directly, those won't work. > So at boot-up your default numa policy would exclude any vram nodes. > > But I think (as an -mm laym

Re: [PATCH] mm: add ZONE_DEVICE statistics to smaps

2016-11-15 Thread Dave Hansen
On 11/10/2016 02:11 PM, Dan Williams wrote: > @@ -774,6 +778,8 @@ static int show_smap(struct seq_file *m, void *v, int > is_pid) > "ShmemPmdMapped: %8lu kB\n" > "Shared_Hugetlb: %8lu kB\n" > "Private_Hugetlb: %7lu kB\n" > +"Device