Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation

2013-07-10 Thread Dave Hansen
On 07/03/2013 01:34 AM, Joonsoo Kim wrote: - if (page) + do { + page = buffered_rmqueue(preferred_zone, zone, order, + gfp_mask, migratetype); + if (!page) +

Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation

2013-07-10 Thread Dave Hansen
On 07/10/2013 06:02 PM, Joonsoo Kim wrote: On Wed, Jul 10, 2013 at 03:52:42PM -0700, Dave Hansen wrote: On 07/03/2013 01:34 AM, Joonsoo Kim wrote: - if (page) + do { + page = buffered_rmqueue(preferred_zone, zone, order

Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation

2013-07-11 Thread Dave Hansen
On 07/10/2013 11:12 PM, Joonsoo Kim wrote: I'd also like to see some scalability numbers on this. How do your tests look when all the CPUs on the system are hammering away? What test do you mean? Please elaborate on this more Your existing tests looked single-threaded. That's certainly

Re: Yet more softlockups.

2013-07-12 Thread Dave Hansen
is a perf event. I've Cc:-ed Dave Hansen, the author of those changes - is this a false positive or some real problem? The warning comes from calling perf_sample_event_took(), which is only called from one place: perf_event_nmi_handler(). So we can be pretty sure that the perf NMI is firing

Re: Yet more softlockups.

2013-07-12 Thread Dave Hansen
On 07/12/2013 08:45 AM, Dave Jones wrote: On Fri, Jul 12, 2013 at 08:38:52AM -0700, Dave Hansen wrote: Dave, for your case, my suspicion would be that it got turned on inadvertently, or that we somehow have a bug which bumped up perf_event.c's 'active_events' and we're running some perf

Re: [PATCH] x86: perf: fix incorrect use of do_div() in nmi warning

2013-07-12 Thread Dave Hansen
On 07/12/2013 05:08 AM, Ingo Molnar wrote: Note, there was a second fix posted by Stephane Eranian for a separate patch which I also botched: http://lkml.kernel.org/r/20130704223010.GA30625@quad Both of these fixes need to get pulled in to Linus's tree and the 3.10 stable tree.

Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation

2013-07-12 Thread Dave Hansen
On 07/10/2013 11:12 PM, Joonsoo Kim wrote: On Wed, Jul 10, 2013 at 10:38:20PM -0700, Dave Hansen wrote: You're probably right for small numbers of pages. But, if we're talking about things that are more than, say, 100 pages (isn't the pcp batch size clamped to 128 4k pages?) you surely don't

Re: Yet more softlockups.

2013-07-12 Thread Dave Hansen
On 07/12/2013 11:07 AM, David Ahern wrote: And Dave Hansen: I think nmi.c has the same do_div problem as kernel/events/core.c that Stephane fixed. Your patch has: whole_msecs = do_div(delta, (1000 * 1000)); decimal_msecs = do_div(delta, 1000) % 1000; Yup. There should
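The do_div() problem quoted above is easier to see with concrete numbers. The kernel's do_div() divides its first argument in place and returns the remainder, so assigning its return value to whole_msecs stores the remainder rather than the quotient. The sketch below is a userspace illustration only: emul_do_div(), struct msecs, split_as_quoted() and split_corrected() are hypothetical names, and the corrected form is one possible fix, not necessarily the one that was merged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's do_div():
 * divides *n in place and returns the remainder. */
uint32_t emul_do_div(uint64_t *n, uint32_t base)
{
	assert(base != 0);
	uint32_t rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

struct msecs {
	uint64_t whole;   /* whole milliseconds */
	uint64_t decimal; /* fractional part, in microseconds */
};

/* Same shape as the two quoted lines: the return value (a remainder)
 * is stored where the quotient was wanted. */
struct msecs split_as_quoted(uint64_t delta_ns)
{
	struct msecs r;
	r.whole = emul_do_div(&delta_ns, 1000 * 1000);
	r.decimal = emul_do_div(&delta_ns, 1000) % 1000;
	return r;
}

/* One possible correct form (an assumption, not the merged fix). */
struct msecs split_corrected(uint64_t delta_ns)
{
	struct msecs r;
	uint32_t rem_ns = emul_do_div(&delta_ns, 1000 * 1000);
	r.whole = delta_ns;        /* the quotient was left in delta_ns */
	r.decimal = rem_ns / 1000; /* leftover nanoseconds to microseconds */
	return r;
}
```

For a delta of 2,500,000 ns (2.5 ms), split_as_quoted() yields whole = 500000 and decimal = 2, while split_corrected() yields whole = 2 and decimal = 500.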

Re: Yet more softlockups.

2013-07-12 Thread Dave Hansen
I added the WARN_ONCE() the first time we enable a perf event: The watchdog code looks to use perf these days: [1.003260] [ cut here ] [1.007943] WARNING: at /home/davehans/linux.git/arch/x86/kernel/cpu/perf_event.c:471 x86_pmu_event_init+0x249/0x430() [

[PATCH] mm: vmstats: tlb flush counters

2013-07-16 Thread Dave Hansen
remote flushes or not. In the end, we really need to know if we actually _did_ global vs. local invalidations, so that leaves us with few options other than to muck with the counters from arch-specific code. Signed-off-by: Dave Hansen dave.han...@linux.intel.com -- To unsubscribe from this list

Re: [PATCH] mm: vmstats: tlb flush counters

2013-07-16 Thread Dave Hansen
On 07/16/2013 04:36 PM, Wanpeng Li wrote: On Tue, Jul 16, 2013 at 08:53:04AM -0700, Dave Hansen wrote: I was investigating some TLB flush scaling issues and realized that we do not have any good methods for figuring out how many TLB flushes we are doing. It would be nice to be able to do

[RESEND][PATCH] mm: vmstats: tlb flush counters

2013-07-16 Thread Dave Hansen
remote flushes or not. In the end, we really need to know if we actually _did_ global vs. local invalidations, so that leaves us with few options other than to muck with the counters from arch-specific code. Signed-off-by: Dave Hansen dave.han...@linux.intel.com --- linux.git-davehans/arch/x86

Re: [PATCH 5/8] thp, mm: locking tail page is a bug

2013-07-17 Thread Dave Hansen
On 07/17/2013 02:09 PM, Andrew Morton wrote: lock_page() is a pretty commonly called function, and I assume quite a lot of people run with CONFIG_DEBUG_VM=y. Is the overhead added by this patch really worthwhile? I always thought of it as a developer-only thing. I don't think any of the big

Re: [PATCH] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

2013-07-18 Thread Dave Hansen
On 07/17/2013 02:45 PM, Toshi Kani wrote: +CONFIG_ARCH_MEMORY_PROBE is supported on powerpc only. On x86, this config +option is disabled by default since ACPI notifies a memory hotplug event to +the kernel, which performs its hotplug operation as the result. Please +enable this option if you

Re: [PATCH] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

2013-07-18 Thread Dave Hansen
On 07/18/2013 09:26 AM, Toshi Kani wrote: On Thu, 2013-07-18 at 08:27 -0700, Dave Hansen wrote: I'd really prefer you don't do this. Do you really have random processes on your system poking at random sysfs files and then complaining when things break? I am afraid that the probe interface

Re: [PATCH] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

2013-07-18 Thread Dave Hansen
On 07/18/2013 01:10 PM, Toshi Kani wrote: On Thu, 2013-07-18 at 11:34 -0700, Dave Hansen wrote: I do not think so. Using echo command to write a value to /dev/sda is not how it is instructed to use in the document. I am not saying that we need to protect from a privileged user doing

[RFC][PATCH] mm: percpu pages: up batch size to fix arithmetic?? errror

2013-09-11 Thread Dave Hansen
I really don't know where the: batch /= 4; /* We effectively *= 4 below */ ... batch = rounddown_pow_of_two(batch + batch/2) - 1; came from. The round down code at *MOST* does a *= 1.5, but *averages* out to be just under 1. On a system with 128GB in a
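The arithmetic being questioned can be checked with a small userspace sketch. It mirrors only the two quoted statements; rounddown_pow2() and quoted_batch() are hypothetical stand-ins, and the precondition that the input is at least 4 is an assumption made so that the rounding helper never sees zero.

```c
#include <assert.h>

/* Hypothetical stand-in for rounddown_pow_of_two():
 * largest power of two that is <= n. */
unsigned long rounddown_pow2(unsigned long n)
{
	assert(n > 0);
	unsigned long p = 1;
	while (p <= n / 2)
		p *= 2;
	return p;
}

/* The two quoted steps applied to a starting value. */
unsigned long quoted_batch(unsigned long batch)
{
	assert(batch >= 4); /* assumption: keeps batch / 4 nonzero */
	batch /= 4;
	return rounddown_pow2(batch + batch / 2) - 1;
}
```

Starting from 128, the divide gives 32 and the rounding step returns 31, so the second step does not undo the divide by four. After the divide, a value of 43 rounds up to 63 (close to 1.5x) while 42 rounds down to 31 (about 0.74x), which fits the "at most 1.5x, just under 1 on average" observation.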

Re: [RFC][PATCH] mm: percpu pages: up batch size to fix arithmetic?? errror

2013-09-11 Thread Dave Hansen
On 09/11/2013 04:08 PM, Cody P Schafer wrote: So we have this variable called batch, and the code is trying to store the _average_ number of pcp pages we want into it (not the batchsize), and then we divide our average goal by 4 to get a batchsize. All the comments refer to the size of the pcp

Re: [RFC][PATCH] mm: percpu pages: up batch size to fix arithmetic?? errror

2013-09-11 Thread Dave Hansen
BTW, in my little test, the median ->count was 10, and the mean was 45. On 09/11/2013 04:21 PM, Cody P Schafer wrote: Also, we may want to consider shrinking pcp->high down from 6*pcp->batch given that the original 6* choice was based upon ->batch actually being 1/4th of the average pageset size,

Re: [RFC][PATCH] mm: percpu pages: up batch size to fix arithmetic?? errror

2013-09-12 Thread Dave Hansen
On 09/12/2013 07:16 AM, Christoph Lameter wrote: On Wed, 11 Sep 2013, Dave Hansen wrote: 3. We want ->high to approximate the size of the cache which is private to a given cpu. But, that's complicated by the L3 caches and hyperthreading today. well lets keep it well below

Re: [PATCH resend] drop_caches: add some documentation and info message

2013-07-30 Thread Dave Hansen
On 07/30/2013 05:55 AM, Michal Hocko wrote: If we add another flag in the future it can use bit 3? What if we get crazy and need more of them? I really hate using bits for these kinds of interfaces. I'm forgetful and never remember which bit is which, and they're possible to run out of. I'm

Re: [PATCH] mm/hotplug: remove unnecessary BUG_ON in __offline_pages()

2013-07-31 Thread Dave Hansen
thing that folks might run in to when adding new features or developing? It's in a cold path and the cost of the check is minuscule. The original author (cc'd) also saw a need to put this in probably because he actually ran in to this. In any case, it looks fairly safe to me: Reviewed-by: Dave

Re: [PATCH] drivers: base: new memory config sysfs driver for large memory systems

2013-08-01 Thread Dave Hansen
On 08/01/2013 01:57 PM, Greg Kroah-Hartman wrote: memory is the name used by the current sysfs memory layout code in drivers/base/memory.c. So it can't be the same unless we are going to create a toggle at boot time to select between the models, which is something I am looking to add if

Re: [PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page

2013-11-05 Thread Dave Hansen
On 10/28/2013 03:11 PM, Kirill A. Shutemov wrote: On Mon, Oct 28, 2013 at 03:16:20PM -0700, Dave Hansen wrote: void copy_huge_page(struct page *dst, struct page *src) { struct hstate *h = page_hstate(src); if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES

Re: [PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page

2013-11-06 Thread Dave Hansen
On 11/06/2013 05:46 AM, Hillf Danton wrote: On Tue, Oct 29, 2013 at 6:16 AM, Dave Hansen d...@sr71.net wrote: + +void copy_high_order_page(struct page *newpage, + struct page *oldpage, + int order) +{ + int i; + + might_sleep

Re: [PATCH v4 2/2] mm: allow to set overcommit ratio more precisely

2013-11-06 Thread Dave Hansen
On 11/06/2013 02:33 PM, Andrew Morton wrote: On Wed, 6 Nov 2013 03:42:20 -0500 (EST) Jerome Marchand jmarc...@redhat.com wrote: That was my first version of this patch (actually kbytes to avoid overflow). Dave raised the issue that it silently breaks the user interface: overcommit_ratio is

Re: [PATCH 1/9] mm: rename SPLIT_PTLOCKS to SPLIT_PTE_PTLOCKS

2013-09-13 Thread Dave Hansen
On 09/13/2013 06:06 AM, Kirill A. Shutemov wrote: --- a/mm/Kconfig +++ b/mm/Kconfig @@ -207,7 +207,7 @@ config PAGEFLAGS_EXTENDED # PA-RISC 7xxx's spinlock_t would enlarge struct page from 32 to 44 bytes. # DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC spinlock_t also enlarge struct page. #

Re: [PATCH 8/9] mm: implement split page table lock for PMD level

2013-09-13 Thread Dave Hansen
On 09/13/2013 06:06 AM, Kirill A. Shutemov wrote: +config ARCH_ENABLE_SPLIT_PMD_PTLOCK + boolean + +config SPLIT_PMD_PTLOCK_CPUS + int + # hugetlb hasn't converted to split locking yet + default 99 if HUGETLB_PAGE + default 32 if ARCH_ENABLE_SPLIT_PMD_PTLOCK +

Re: [RFC][PATCH 1/8] mm: pcp: rename percpu pageset functions

2013-10-17 Thread Dave Hansen
On 10/16/2013 06:32 PM, David Rientjes wrote: +static void pageset_setup_from_batch_size(struct per_cpu_pageset *p, + unsigned long batch) { - pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch)); + unsigned long high; + high = 6 * batch; + if

[RFC][PATCH 2/8] mm: pcp: consolidate percpu_pagelist_fraction code

2013-10-15 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com pageset_set_high_and_batch() and percpu_pagelist_fraction_sysctl_handler() both do the same calculation for establishing pcp->high: high = zone->managed_pages / percpu_pagelist_fraction; pageset_set_high_and_batch() also knows when it should

[RFC][PATCH 4/8] mm: pcp: move pageset sysctl code to sysctl.c

2013-10-15 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com The percpu_pagelist_fraction_sysctl_handler() code is currently in page_alloc.c, probably because it uses some functions static to that file. Now that it is smaller and its interactions with the rest of the allocator code are confined

[RFC][PATCH 1/8] mm: pcp: rename percpu pageset functions

2013-10-15 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com The per-cpu-pageset code has two distinct ways of being set up: 1. The boot-time code (the defaults that everybody runs with) calculates a batch size, then sets pcp->high to 6x that batch size. 2. The percpu_pagelist_fraction sysctl code

[RFC][PATCH 7/8] mm: pcp: move page coloring optimization away from pcp sizing

2013-10-15 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com The percpu pages calculations are a bit convoluted. Right now, zone_batchsize() claims to be calculating the ->batch size, but what actually happens is: 1. Calculate how large we want the entire pcp set to be (->high) 2. Scale that down by the ratio

[RFC][PATCH 8/8] mm: pcp: create setup_boot_pageset()

2013-10-15 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com pageset_setup_from_batch_size() has one remaining call path: __build_all_zonelists() -> setup_pageset() -> pageset_setup_from_batch_size() And that one path is specialized. It is meant to essentially turn off the per-cpu

[RFC][PATCH 3/8] mm: pcp: separate pageset update code from sysctl code

2013-10-15 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com This begins the work of moving the percpu pageset sysctl code out of page_alloc.c. update_all_zone_pageset_limits() is now the only interface that the sysctl code *really* needs out of page_alloc.c. This helps make it very clear what

[RFC][PATCH 5/8] mm: pcp: make percpu_pagelist_fraction sysctl undoable

2013-10-15 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com The kernel has two methods of setting the sizes of the percpu pagesets: 1. The default, according to a page_alloc.c comment is set to around 1000th of the size of the zone. But no more than 1/2 of a meg. 2. After boot

[RFC][PATCH 0/8] mm: freshen percpu pageset code

2013-10-15 Thread Dave Hansen
The percpu pageset (pcp) code is looking a little old and neglected these days. This set does a couple of these things (in order of importance, not order of implementation in the series): 1. Change the default pageset pcp->high value from 744kB to 512k. (see consolidate high-to-batch ratio
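The 744kB figure can be reproduced from the "high is 6x the batch size" rule described elsewhere in this series. The sketch below assumes a default batch of 31 pages and 4kB pages; neither number is stated in the snippet above. pcp_high_kb() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Size in kB of a per-cpu pageset whose high mark is 6x the batch size. */
unsigned long pcp_high_kb(unsigned long batch_pages, unsigned long page_kb)
{
	assert(page_kb > 0);
	return 6 * batch_pages * page_kb;
}
```

With those assumptions, pcp_high_kb(31, 4) is 744, i.e. 186 pages. For comparison, 512kB corresponds to 128 pages of 4kB.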

[RFC][PATCH 6/8] mm: pcp: consolidate high-to-batch ratio code

2013-10-15 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com Up until now in this patch set, we really should not have been changing any behavior that users would notice. This patch potentially has performance implications for virtually all users since it changes the kernel's default behavior. The per-cpu

Re: [PATCH v6 07/11] VFS hot tracking: Add a /proc interface to control memory usage

2013-11-11 Thread Dave Hansen
On 11/06/2013 05:45 AM, Zhi Yong Wu wrote: Introduce a /proc interface hot-mem-high-thresh and to cap the memory which is consumed by hot_inode_item and hot_range_item, and they will be in the unit of 1M bytes. You don't seem to have any documentation for this, btw... :( +

Re: [PATCH v6 07/11] VFS hot tracking: Add a /proc interface to control memory usage

2013-11-12 Thread Dave Hansen
On 11/11/2013 02:45 PM, Zhi Yong Wu wrote: On Tue, Nov 12, 2013 at 6:15 AM, Dave Hansen dave.han...@intel.com wrote: In general, why do you have to control the number of these statically? It gives the user or admin one optional chance to control the amount of memory consumed by VFS hot

Re: [Results] [RFC PATCH v4 00/40] mm: Memory Power Management

2013-11-12 Thread Dave Hansen
On 11/12/2013 12:02 AM, Srivatsa S. Bhat wrote: I performed experiments on an IBM POWER 7 machine and got actual power-savings numbers (upto 2.6% of total system power) from this patchset. I presented them at the Kernel Summit but forgot to post them on LKML. So here they are: upto? What was

Re: [PATCH v6 07/11] VFS hot tracking: Add a /proc interface to control memory usage

2013-11-12 Thread Dave Hansen
On 11/12/2013 12:38 PM, Zhi Yong Wu wrote: On Wed, Nov 13, 2013 at 1:05 AM, Dave Hansen dave.han...@intel.com wrote: The on/off knob seems to me to be something better left to a mount option, not a global tunable. If it is left to a mount option, the user or admin can't change it *dynamically

[PATCH 0/2] v2: fix hugetlb vs. anon-thp copy page

2013-11-14 Thread Dave Hansen
There were only minor comments about this the last time around. Any reason not to merge it?

[PATCH 1/2] mm: hugetlbfs: Add some VM_BUG_ON()s to catch non-hugetlbfs pages

2013-11-14 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com Dave Jiang reported that he was seeing oopses when running NUMA systems and default_hugepagesz=1G. I traced the issue down to migrate_page_copy() trying to use the same code for hugetlb pages and transparent hugepages. It should not have been

[PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page

2013-11-14 Thread Dave Hansen
Changes from v1: * removed explicit might_sleep() in favor of the one that we get from the cond_resched(); -- From: Dave Hansen dave.han...@linux.intel.com Right now, the migration code in migrate_page_copy() uses copy_huge_page() for hugetlbfs and thp pages: if (PageHuge(page

[v3][PATCH 1/2] mm: hugetlbfs: Add VM_BUG_ON()s to catch non-hugetlbfs pages

2013-11-15 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com Changes from v2: * Removed the VM_BUG_ON() from copy_huge_page() since the next patch makes it able to handle non-hugetlbfs pages -- Dave Jiang reported that he was seeing oopses when running NUMA systems and default_hugepagesz=1G. I traced

[v3][PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page

2013-11-15 Thread Dave Hansen
Changes from v2: * Changes from v1: * removed explicit might_sleep() in favor of the one that we get from the cond_resched(); -- From: Dave Hansen dave.han...@linux.intel.com Right now, the migration code in migrate_page_copy() uses copy_huge_page() for hugetlbfs and thp pages

[v3][PATCH 0/2] v3: fix hugetlb vs. anon-thp copy page

2013-11-15 Thread Dave Hansen
This took some of Mel's comments in to consideration. Dave Jiang, could you retest this if you get a chance? These have only been lightly compile-tested.

Re: [PATCH] mm: call cond_resched() per MAX_ORDER_NR_PAGES pages copy

2013-11-18 Thread Dave Hansen
On 11/18/2013 10:54 AM, Naoya Horiguchi wrote: diff --git a/mm/migrate.c b/mm/migrate.c index cb5d152b58bc..661ff5f66591 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -454,7 +454,8 @@ static void __copy_gigantic_page(struct page *dst, struct page *src, struct page *src_base = src;

Re: [PATCH] mm: call cond_resched() per MAX_ORDER_NR_PAGES pages copy

2013-11-18 Thread Dave Hansen
On 11/18/2013 12:20 PM, Naoya Horiguchi wrote: Really, though, a lot of things seem to have MAX_ORDER set up so that it's at 256MB or 512MB. That's an awful lot to do between rescheds. Yes. BTW, I found that we have the same problem for other functions like copy_user_gigantic_page,

Re: [PATCH] mm: call cond_resched() per MAX_ORDER_NR_PAGES pages copy

2013-11-18 Thread Dave Hansen
On 11/18/2013 01:56 PM, Naoya Horiguchi wrote: Why bother trying to optimize it? I thought that if we call cond_resched() too often, the copying thread can take too long in a heavy load system, because the copying thread always yields the CPU in every loop. I think you're confusing

Re: [PATCH v5 2/3] x86, mpx: hook #BR exception handler to allocate bound tables

2014-02-24 Thread Dave Hansen
On 02/23/2014 05:27 AM, Qiaowei Ren wrote: +static bool allocate_bt(unsigned long bd_entry) +{ + unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT); + unsigned long bt_addr, old_val = 0; + + bt_addr = sys_mmap_pgoff(0, bt_size, PROT_READ | PROT_WRITE, +

Re: [PATCH] ksm: Expose configuration via sysctl

2014-02-25 Thread Dave Hansen
On 02/24/2014 03:28 PM, Alexander Graf wrote: Configuration of tunables and Linux virtual memory settings has traditionally happened via sysctl. Thanks to that there are well established ways to make sysctl configuration bits persistent (sysctl.conf). KSM introduced a sysfs based

Re: [PATCH] ksm: Expose configuration via sysctl

2014-02-25 Thread Dave Hansen
On 02/25/2014 03:09 PM, Alexander Graf wrote: Couldn't we also (maybe in parallel) just teach the sysctl userspace about sysfs? This way we don't have to do parallel sysctls and sysfs for *EVERYTHING* in the kernel: sysfs.kernel.mm.transparent_hugepage.enabled=enabled It's pretty hard

Re: [PATCH v5 1/3] x86, mpx: add documentation on Intel MPX

2014-02-26 Thread Dave Hansen
On 02/23/2014 05:27 AM, Qiaowei Ren wrote: +Bounds Directory (BD) and Bounds Tables (BT) are stored in +application memory and are allocated by the application (in case +of kernel use, the structures will be in kernel memory). The +bound directory and each instance of bound table are in

Re: [PATCH v5 1/3] x86, mpx: add documentation on Intel MPX

2014-02-26 Thread Dave Hansen
On 02/26/2014 11:17 AM, Dave Hansen wrote: On 02/23/2014 05:27 AM, Qiaowei Ren wrote: +Bounds Directory (BD) and Bounds Tables (BT) are stored in +application memory and are allocated by the application (in case +of kernel use, the structures will be in kernel memory). The +bound directory

Re: [PATCH v5 1/3] x86, mpx: add documentation on Intel MPX

2014-02-26 Thread Dave Hansen
On 02/23/2014 05:27 AM, Qiaowei Ren wrote: +The other case that generates a #BR is when a BNDSTX instruction +attempts to save bounds to a BD entry marked as invalid. This is +an indication that no BT exists for this entry. In this case the +fault handler will allocate a new BT. Hi Qiaowei,

Re: [PATCHv3 1/2] mm: introduce vm_ops->map_pages()

2014-02-27 Thread Dave Hansen
On 02/27/2014 11:53 AM, Kirill A. Shutemov wrote: +#define FAULT_AROUND_ORDER 4 +#define FAULT_AROUND_PAGES (1UL << FAULT_AROUND_ORDER) +#define FAULT_AROUND_MASK ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1) Looking at the performance data made me think of this: do we really want this to be

Re: [PATCHv3 1/2] mm: introduce vm_ops->map_pages()

2014-02-27 Thread Dave Hansen
On 02/27/2014 02:06 PM, Linus Torvalds wrote: On Thu, Feb 27, 2014 at 1:59 PM, Dave Hansen dave.han...@linux.intel.com wrote: Also, the folks with larger base page sizes probably don't want a FAULT_AROUND_ORDER=4. That's 1MB of fault-around for ppc64, for example. Actually, I'd expect
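The 1MB figure follows directly from the macros in the patch under review: the fault-around window is 2^FAULT_AROUND_ORDER pages, so its size in bytes scales with the base page size. The sketch below restates those macros as userspace functions. The function names are hypothetical, and the page shift of 16 (64kB pages) for ppc64 is an assumption about the configuration being referred to.

```c
#include <assert.h>

/* Bytes covered by a fault-around window of 2^order pages. */
unsigned long fault_around_bytes(unsigned int page_shift, unsigned int order)
{
	assert(page_shift + order < sizeof(unsigned long) * 8);
	return 1UL << (page_shift + order);
}

/* Mask that aligns an address down to the start of its window,
 * the same expression as FAULT_AROUND_MASK in the quoted patch. */
unsigned long fault_around_mask(unsigned int page_shift, unsigned int order)
{
	return ~(fault_around_bytes(page_shift, order) - 1);
}
```

With 4kB pages (shift 12) and order 4 the window is 64kB; with 64kB pages (shift 16) the same order gives 1MB.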

Re: [PATCH RFC 0/1] ksm: check and skip page, if it is already scanned

2014-03-04 Thread Dave Hansen
On 03/03/2014 06:48 PM, Pradeep Sawlani wrote: Patch uses two bits to detect if page is scanned, one bit for odd cycle and other for even cycle. This adds one more bit in page flags and overloads existing bit (PG_owner_priv_1). Changes are based of 3.4.79 kernel, since I have used that for

ecryptfs log spew from EINTR

2014-03-05 Thread Dave Hansen
I have a little program that uses mmap() to copy files. Essentially: addr1 = mmap(fd1); addr2 = mmap(fd2); memcpy(addr1, addr2, len); If these files are on ecryptfs and I interrupt the memcpy() with ^C, I consistently get this in dmesg: ecryptfs_decrypt_page: Error

[PATCH 7/7] big time hack: instrument flush times

2014-03-05 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com The tracepoint code is a _bit_ too much overhead, so use some percpu counters to aggregate it instead. Yes, this is racy and ugly beyond reason, but it was quick to code up. I'm posting this here because it's interesting to have around

[PATCH 6/7] x86: mm: set TLB flush tunable to sane value

2014-03-05 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com Now that we have some shiny new tracepoints, we can actually figure out what the heck is going on. During a kernel compile, 60% of the flush_tlb_mm_range() calls are for a single page. It breaks down like this: size percent percent= V

[PATCH 2/7] x86: mm: rip out complicated, out-of-date, buggy TLB flushing

2014-03-05 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com I think the flush_tlb_mm_range() code that tries to tune the flush sizes based on the CPU needs to get ripped out for several reasons: 1. It is obviously buggy. It uses mm->total_vm to judge the task's footprint in the TLB. It should certainly

[PATCH 5/7] x86: mm: new tunable for single vs full TLB flush

2014-03-05 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com Most of the logic here is in the documentation file. Please take a look at it. I know we've come full-circle here back to a tunable, but this new one is *WAY* simpler. I challenge anyone to describe in one sentence how the old one worked. Here's

[PATCH 3/7] x86: mm: fix missed global TLB flush stat

2014-03-05 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com If we take the if (end == TLB_FLUSH_ALL || vmflag & VM_HUGETLB) { local_flush_tlb(); goto out; } path out of flush_tlb_mm_range(), we will have flushed the tlb, but not incremented

[PATCH 4/7] x86: mm: trace tlb flushes

2014-03-05 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com We don't have any good way to figure out what kinds of flushes are being attempted. Right now, we can try to use the vm counters, but those only tell us what we actually did with the hardware (one-by-one vs full) and don't tell us what was actually

[PATCH 1/7] x86: mm: clean up tlb flushing code

2014-03-05 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com The if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids) line of code is not exactly the easiest to audit, especially when it ends up at two different indentation levels. This eliminates one of the copy-n-paste versions

[PATCH 0/7] x86: rework tlb range flushing code

2014-03-05 Thread Dave Hansen
Reposting with an instrumentation patch, and a few minor tweaks. I'd love some more eyeballs on this, but I think it's ready for -mm. I'm having it run through the LKP harness to see if any performance regressions (or gains) show up. Without the last (instrumentation/debugging) patch:

Re: [PATCH 6/7] x86: mm: set TLB flush tunable to sane value

2014-03-07 Thread Dave Hansen
On 03/06/2014 05:55 PM, Davidlohr Bueso wrote: On Wed, 2014-03-05 at 16:45 -0800, Dave Hansen wrote: From: Dave Hansen dave.han...@linux.intel.com Now that we have some shiny new tracepoints, we can actually figure out what the heck is going on. During a kernel compile, 60

Re: [PATCH 5/7] x86: mm: new tunable for single vs full TLB flush

2014-03-07 Thread Dave Hansen
On 03/06/2014 05:37 PM, Davidlohr Bueso wrote: On Wed, 2014-03-05 at 16:45 -0800, Dave Hansen wrote: From: Dave Hansen dave.han...@linux.intel.com + +If you believe that invlpg is being called too often, you can +lower the tunable: + +/sys/debug/kernel/x86/tlb_single_page_flush_ceiling
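The idea behind a single "ceiling" tunable can be sketched as one comparison: flush pages one at a time while the range is small, and fall back to a full flush once it grows past the ceiling. Only the tunable's name comes from the patch; the exact comparison, the 4kB page shift, and the names choose_flush() and enum flush_kind below are assumptions made for illustration.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4kB pages */

enum flush_kind { FLUSH_SINGLE_PAGES, FLUSH_FULL };

/* Decide between per-page invalidation and a full TLB flush for the
 * address range [start, end), given a ceiling expressed in pages. */
enum flush_kind choose_flush(unsigned long start, unsigned long end,
			     unsigned long ceiling_pages)
{
	assert(end >= start);
	unsigned long nr_pages = (end - start) >> SKETCH_PAGE_SHIFT;

	return nr_pages > ceiling_pages ? FLUSH_FULL : FLUSH_SINGLE_PAGES;
}
```

Under this rule, lowering the ceiling makes full flushes more common and individual invalidations rarer, which matches the advice in the quoted documentation.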

[PATCH 3/7] x86: mm: fix missed global TLB flush stat

2014-03-10 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com If we take the if (end == TLB_FLUSH_ALL || vmflag & VM_HUGETLB) { local_flush_tlb(); goto out; } path out of flush_tlb_mm_range(), we will have flushed the tlb, but not incremented

[PATCH 6/7] x86: mm: set TLB flush tunable to sane value

2014-03-10 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com Now that we have some shiny new tracepoints, we can actually figure out what the heck is going on. During a kernel compile, 60% of the flush_tlb_mm_range() calls are for a single page. It breaks down like this: size percent percent= V

[PATCH 1/7] x86: mm: clean up tlb flushing code

2014-03-10 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com The if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids) line of code is not exactly the easiest to audit, especially when it ends up at two different indentation levels. This eliminates one of the copy-n-paste versions

[PATCH 0/7] x86: rework tlb range flushing code

2014-03-10 Thread Dave Hansen
Changes from v2: * Added a brief comment above the ceiling tunable * Updated the documentation to mention large pages and say individual flush instead of invlpg in most cases. Reposting with an instrumentation patch, and a few minor tweaks. I'd love some more eyeballs on this, but I think

[PATCH 7/7] big time hack: instrument flush times

2014-03-10 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com The tracepoint code is a _bit_ too much overhead, so use some percpu counters to aggregate it instead. Yes, this is racy and ugly beyond reason, but it was quick to code up. I'm posting this here because it's interesting to have around

[PATCH 5/7] x86: mm: new tunable for single vs full TLB flush

2014-03-10 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com Most of the logic here is in the documentation file. Please take a look at it. I know we've come full-circle here back to a tunable, but this new one is *WAY* simpler. I challenge anyone to describe in one sentence how the old one worked. Here's

[PATCH 4/7] x86: mm: trace tlb flushes

2014-03-10 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com We don't have any good way to figure out what kinds of flushes are being attempted. Right now, we can try to use the vm counters, but those only tell us what we actually did with the hardware (one-by-one vs full) and don't tell us what was actually

[PATCH 2/7] x86: mm: rip out complicated, out-of-date, buggy TLB flushing

2014-03-10 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com I think the flush_tlb_mm_range() code that tries to tune the flush sizes based on the CPU needs to get ripped out for several reasons: 1. It is obviously buggy. It uses mm->total_vm to judge the task's footprint in the TLB. It should certainly

Re: [PATCH 0/9] re-shrink 'struct page' when SLUB is on.

2014-01-10 Thread Dave Hansen
On 01/05/2014 08:32 PM, Joonsoo Kim wrote: On Fri, Jan 03, 2014 at 02:18:16PM -0800, Andrew Morton wrote: On Fri, 03 Jan 2014 10:01:47 -0800 Dave Hansen d...@sr71.net wrote: SLUB depends on a 16-byte cmpxchg for an optimization which allows it to not disable interrupts in its fast path

Re: [PATCH 0/9] re-shrink 'struct page' when SLUB is on.

2014-01-10 Thread Dave Hansen
On 01/10/2014 03:39 PM, Andrew Morton wrote: I tested 4 cases, all of these on the cache-cold kfree() case. The first 3 are with vanilla upstream kernel source. The 4th is patched with my new slub code (all single-threaded): http://www.sr71.net/~dave/intel/slub/slub-perf-20140109.png

Re: [PATCH 0/9] re-shrink 'struct page' when SLUB is on.

2014-01-13 Thread Dave Hansen
On 01/13/2014 05:46 AM, Fengguang Wu wrote: So, I think that it is better to get more benchmark results to this patchset for convincing ourselves. If possible, how about asking Fengguang to run whole set of his benchmarks before going forward? Cc'ing him. My pleasure. Is there a git

Re: [PATCH 0/9] re-shrink 'struct page' when SLUB is on.

2014-01-13 Thread Dave Hansen
On 01/12/2014 05:44 PM, Joonsoo Kim wrote: We only touch one struct page on small allocation. In 64-byte case, we always use one cacheline for touching struct page, since it is aligned to cacheline size. However, in 56-byte case, we possibly use two cachelines because struct page isn't aligned
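The cacheline argument quoted above can be made concrete by counting how many elements of a packed array cross a line boundary. The sketch assumes 64-byte cachelines and an array that starts on a line boundary; straddles() and count_straddlers() are hypothetical helpers, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if element `index` of a packed array crosses a cacheline
 * boundary, assuming the array starts on a line boundary. */
int straddles(size_t index, size_t elem_size, size_t line_size)
{
	assert(elem_size > 0 && line_size > 0);
	size_t first = index * elem_size;     /* first byte of the element */
	size_t last = first + elem_size - 1;  /* last byte of the element */

	return first / line_size != last / line_size;
}

/* How many of the first n elements cross a line boundary. */
size_t count_straddlers(size_t n, size_t elem_size, size_t line_size)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		count += straddles(i, elem_size, line_size) ? 1 : 0;
	return count;
}
```

With 64-byte elements no element straddles a line. With 56-byte elements the pattern repeats every 8 elements (448 bytes, or 7 lines), and 6 of those 8 touch two lines.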

[PATCH 00/12] [v2] Reorganize x86 Kconfig menu

2014-01-13 Thread Dave Hansen
Changes from v1: * put MTRR under Processor Options * fix circular dependency introduced in the paravirt shuffle * dump extended platforms down to the end of the menu -- The x86 Processor type and features menu has really been letting itself go over the years. It needs to be put on a diet

[PATCH 11/12] x86 Kconfig: create x86/Kconfig.virt

2014-01-13 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com Right now, there is a Enable paravirtualization code option in the Processor Features menu, which means Xen. There is also a group of paravirtualization options specific to KVM under the top-level Virtualization menu. This creates a new hypervisor

[PATCH 09/12] x86 Kconfig: create mtrr menu under processsor options

2014-01-13 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com This groups the MTRR and PAT options under their own menu and puts them under the Processor Options... submenu. Note that I slightly changed the MTRR prompt text since PAT is hidden under here. This makes PAT easier to find since it depends on MTRR

[PATCH 05/12] x86 Kconfig: processor drivers

2014-01-13 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com These are both drivers to access very CPU- and arch-specific features. However, they are probably not very commonly changed in the configuration. Give them their own menu. Signed-off-by: Dave Hansen dave.han...@linux.intel.com --- linux.git

[PATCH 04/12] x86 Kconfig: processor options menu

2014-01-13 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com This tries to consolidate the actual processor options that someone might want to configure. It's a bit arbitrary how you might separate these, but I at least took a stab at it. The real goal here was to hide stuff that folks will rarely look

[PATCH 12/12] x86 Kconfig: move paravirt under Virtualization

2014-01-13 Thread Dave Hansen
These options fit much better in the Virtualization menu than in the processor features menu. Move them. Signed-off-by: Dave Hansen dave.han...@linux.intel.com Cc: Borislav Petkov b...@suse.de Cc: Dmitry Torokhov d...@vmware.com Cc: K. Y. Srinivasan k...@microsoft.com Cc: Haiyang Zhang haiya

[PATCH 08/12] x86 Kconfig: bury obscure options

2014-01-13 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com While I respect the fact that the owners of Dell Inspiron 8000s have kept I8K compiling and working all these years, the URL referenced contains a helpful README: This package is no longer maintained by me. I don't use it anymore and I'm

[PATCH 03/12] x86 Kconfig: move highmem

2014-01-13 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com This just continues to move things under the Memory and NUMA Options menu, breaking it up a bit to make the patches easier to audit. Signed-off-by: Dave Hansen dave.han...@linux.intel.com --- linux.git-davehans/arch/x86/Kconfig | 248

[PATCH 02/12] x86 Kconfig: memory options

2014-01-13 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com This consolidates a bunch of VM, memory and NUMA options down to be under a single Kconfig menu. Most of this stuff is pretty obscure, like HIGHPTE or ZONE_DMA support. It doesn't really deserve to be in the top-level menu. For what it's worth

[PATCH 01/12] x86 Kconfig: create extended platforms menu

2014-01-13 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com This takes the relatively obscure (NUMA-Q anyone?) platforms (both 32 and 64-bit) and sticks them in their own menu. Virtually nobody needs to set these, and those that do know how to find them the hard way. The new menu is also moved to the end

[PATCH 06/12] x86 Kconfig: scheduler options

2014-01-13 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com These are a few architecture-specific options that affect the scheduler. Group them together. Signed-off-by: Dave Hansen dave.han...@linux.intel.com --- linux.git-davehans/arch/x86/Kconfig | 28 1 file changed, 16

[PATCH 10/12] x86 Kconfig: MCE menu

2014-01-13 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com This is a fairly small one, but it still saves 3 lines in the top-level menu. Signed-off-by: Dave Hansen dave.han...@linux.intel.com --- linux.git-davehans/arch/x86/Kconfig | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff

[PATCH 07/12] x86 Kconfig: move memtest

2014-01-13 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com Move the Memtest option over to the Memory Debugging menu. The code is not x86 specific at all, but it still resides in arch/x86, so keep the x86 dependency. Signed-off-by: Dave Hansen dave.han...@linux.intel.com --- linux.git-davehans/arch/x86

Re: [PATCH 1/3] kconfig: consolidate arch-specific seccomp options

2014-01-13 Thread Dave Hansen
On 01/13/2014 11:40 AM, Randy Dunlap wrote: +config SECCOMP + bool + default y Prefer def_bool y I've actually got that already in my updated set that I'll send out when the merge window opens.

Re: [PATCH 11/12] x86 Kconfig: create x86/Kconfig.virt

2014-01-13 Thread Dave Hansen
On 01/13/2014 02:46 PM, Paolo Bonzini wrote: On 13/01/2014 20:22, Dave Hansen wrote: diff -puN arch/x86/Kconfig~x86-Kconfig-move-paravirt-under-virtualization arch/x86/Kconfig --- linux.git/arch/x86/Kconfig~x86-Kconfig-move-paravirt-under-virtualization 2014-01-13 11:11

Re: [PATCH 11/12] x86 Kconfig: create x86/Kconfig.virt

2014-01-13 Thread Dave Hansen
On 01/13/2014 03:12 PM, Paolo Bonzini wrote: On 14/01/2014 00:00, Dave Hansen wrote: --- Virtualization * Kernel-based Virtual Machine (KVM) support

[PATCH 1/2] x86 Kconfig: create x86/Kconfig.virt

2014-01-13 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com Right now, there is an "Enable paravirtualization code" option in the Processor Features menu, which means Xen. There is also a group of host-side paravirtualization options specific to KVM under the top-level Virtualization menu. I think it makes
