[v5][PATCH 4/6] mm: vmscan: break out mapping freepage code

2013-06-03 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com __remove_mapping() only deals with pages with mappings, meaning page cache and swap cache. At this point, the page has been removed from the mapping's radix tree, and we need to ensure that any fs-specific (or swap-specific) resources are freed up

[v5][PATCH 5/6] mm: vmscan: batch shrink_page_list() locking operations

2013-06-03 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com changes for v2: * remove batch_has_same_mapping() helper. A local variable makes the check cheaper and cleaner * Move batch draining later to where we already know page_mapping(). This probably fixes a truncation race anyway * rename
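The batching idea can be illustrated with a small user-space sketch that only counts lock acquisitions. It assumes each page on the reclaim list is tagged with a mapping id; `struct fake_page`, `lock_ops_unbatched()` and `lock_ops_batched()` are hypothetical names, not kernel code. As in the changelog, a single local variable (`batch_mapping`) is enough to decide whether the next page joins the current batch or forces a drain.

```c
#include <stddef.h>

/* Hypothetical stand-in: a "mapping" is just an id recorded per page. */
struct fake_page {
    int mapping_id;
};

/* Unbatched: lock and unlock the mapping once for every page removed. */
static size_t lock_ops_unbatched(const struct fake_page *pages, size_t n)
{
    size_t locks = 0;

    (void)pages;                /* only the page count matters here */
    for (size_t i = 0; i < n; i++)
        locks++;                /* lock(mapping); remove page; unlock(mapping); */
    return locks;
}

/* Batched: keep collecting pages while they share the batch's mapping,
 * then drain the whole batch under a single lock acquisition. */
static size_t lock_ops_batched(const struct fake_page *pages, size_t n)
{
    size_t locks = 0;
    size_t batch_len = 0;
    int batch_mapping = -1;     /* the local variable replacing a helper */

    for (size_t i = 0; i < n; i++) {
        if (batch_len && pages[i].mapping_id != batch_mapping) {
            locks++;            /* drain: one lock/unlock for the batch */
            batch_len = 0;
        }
        batch_mapping = pages[i].mapping_id;
        batch_len++;
    }
    if (batch_len)
        locks++;                /* drain whatever is left at the end */
    return locks;
}
```

For a list whose mappings run 1,1,1,2,2,1 this takes the lock 3 times instead of 6. When no adjacent pages share a mapping (1,2,1,2) batching saves nothing, which is the worst case raised in review.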

[v5][PATCH 1/6] mm: swap: defer clearing of page_private() for swap cache pages

2013-06-03 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com This patch defers the destruction of swapcache-specific data in 'struct page'. This simplifies the code because we do not have to keep extra copies of the data during the removal of a page from the swap cache. There are only two callers

[v5][PATCH 0/6] mm: vmscan: Batch page reclamation under shrink_page_list

2013-06-03 Thread Dave Hansen
These are an update of Tim Chen's earlier work: http://lkml.kernel.org/r/1347293960.9977.70.camel@schen9-DESK I broke the patches up a bit more, and tried to incorporate some changes based on some feedback from Mel and Andrew. Changes for v5: * fix description in about the costs of

[v5][PATCH 2/6] mm: swap: make 'struct page' and swp_entry_t variants of swapcache_free().

2013-06-03 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com swapcache_free() takes two arguments: void swapcache_free(swp_entry_t entry, struct page *page) Most of its callers (5/7) are from error handling paths that haven't even instantiated a page, so they pass page=NULL. Both of the callers that call
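A minimal user-space sketch of splitting a two-argument function with an optional page into a `swp_entry_t` variant and a `struct page` variant. The types are simplified stand-ins and every function name here (`release_swap_slot()`, `swapcache_free_entry()`, `swapcache_free_page()`) is hypothetical; the page variant assumes the entry can still be read from the page's private field, which is what deferring the clearing of `page_private()` makes possible.

```c
#include <stddef.h>

/* Simplified stand-ins for kernel types (assumptions, not the real ones). */
typedef struct { unsigned long val; } swp_entry_t;
struct fake_page {
    unsigned long private_data; /* models page_private(): the swap entry */
    int released;               /* set when released via the page variant */
};

static int slots_released;       /* swap slots handed back so far */
static unsigned long last_entry; /* value of the last entry released */

/* Shared worker: page may be NULL, as in the two-argument interface. */
static void release_swap_slot(swp_entry_t entry, struct fake_page *page)
{
    if (page)
        page->released = 1;
    last_entry = entry.val;
    slots_released++;
}

/* swp_entry_t variant: for error paths that never instantiated a page. */
static void swapcache_free_entry(swp_entry_t entry)
{
    release_swap_slot(entry, NULL);
}

/* 'struct page' variant: the entry is read back out of the page itself,
 * which relies on the page's private field still being valid here. */
static void swapcache_free_page(struct fake_page *page)
{
    swp_entry_t entry = { page->private_data };
    release_swap_slot(entry, page);
}
```

Error-path callers then stop passing a literal NULL, and callers holding a page no longer need to carry a separate copy of the entry.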

Re: [PATCH 6/6] add documentation on proc.txt

2013-04-22 Thread Dave Hansen
On 04/22/2013 01:45 AM, Minchan Kim wrote: +The /proc/PID/reclaim is used to reclaim pages in this process. +To reclaim file-backed pages, + echo 1 > /proc/PID/reclaim + +To reclaim anonymous pages, + echo 2 > /proc/PID/reclaim + +To reclaim both pages, + echo 3

Re: [PATCH 6/6] add documentation on proc.txt

2013-04-23 Thread Dave Hansen
On 04/22/2013 06:53 PM, Minchan Kim wrote: echo 'file' > /proc/PID/reclaim echo 'anon' > /proc/PID/reclaim echo 'both' > /proc/PID/reclaim For range reclaim, echo $((1<<20)) 8192 > /proc/PID/reclaim. IOW, we don't need any type for range reclaim because only thing user takes care is address

Re: [RFC][PATCH 5/7] create __remove_mapping_batch()

2013-05-16 Thread Dave Hansen
On 05/14/2013 08:51 AM, Mel Gorman wrote: The same comments I had before about potentially long page lock hold times still apply at this point. Andrew's concerns about the worst-case scenario where no adjacent page on the LRU has the same mapping also still applies. Is there any noticable

[RFCv2][PATCH 0/5] mm: Batch page reclamation under shrink_page_list

2013-05-16 Thread Dave Hansen
These are an update of Tim Chen's earlier work: http://lkml.kernel.org/r/1347293960.9977.70.camel@schen9-DESK I broke the patches up a bit more, and tried to incorporate some changes based on some feedback from Mel and Andrew. Changes for v2: * use page_mapping() accessor instead of

[RFCv2][PATCH 2/5] make 'struct page' and swp_entry_t variants of swapcache_free().

2013-05-16 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com swapcache_free() takes two arguments: void swapcache_free(swp_entry_t entry, struct page *page) Most of its callers (5/7) are from error handling paths that haven't even instantiated a page, so they pass page=NULL. Both of the callers that call

[RFCv2][PATCH 4/5] break out mapping freepage code

2013-05-16 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com __remove_mapping() only deals with pages with mappings, meaning page cache and swap cache. At this point, the page has been removed from the mapping's radix tree, and we need to ensure that any fs-specific (or swap-specific) resources are freed up

[RFCv2][PATCH 3/5] break up __remove_mapping()

2013-05-16 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com Our goal here is to eventually reduce the number of repetitive acquire/release operations on mapping->tree_lock. Logically, this patch has two steps: 1. rename __remove_mapping() to lock_remove_mapping() since __ usually means this is the unlocked

[RFCv2][PATCH 5/5] batch shrink_page_list() locking operations

2013-05-16 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com changes for v2: * remove batch_has_same_mapping() helper. A local variable makes the check cheaper and cleaner * Move batch draining later to where we already know page_mapping(). This probably fixes a truncation race anyway * rename

[RFCv2][PATCH 1/5] defer clearing of page_private() for swap cache pages

2013-05-16 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com This patch defers the destruction of swapcache-specific data in 'struct page'. This simplifies the code because we do not have to keep extra copies of the data during the removal of a page from the swap cache. There are only two callers

Re: [PATCHv10 1/4] debugfs: add get/set for atomic types

2013-05-17 Thread Dave Hansen
On 05/08/2013 03:37 PM, Seth Jennings wrote: +struct dentry *debugfs_create_atomic_t(const char *name, umode_t mode, + struct dentry *parent, atomic_t *value) +{ lib/fault_inject.c has something that looks pretty similar: static struct dentry

Re: [PATCHv4 01/39] mm: drop actor argument of do_generic_file_read()

2013-05-21 Thread Dave Hansen
the rest of the set. Acked-by: Dave Hansen dave.han...@linux.intel.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http

Re: [PATCHv4 02/39] block: implement add_bdi_stat()

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:22 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com We're going to add/remove a number of page cache entries at once. This patch implements add_bdi_stat() which adjusts bdi stats by arbitrary amount. It's required for batched page cache

Re: [PATCHv4 00/39] Transparent huge page cache

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:22 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com It's version 4. You can also use git tree: git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git branch thp/pagecache. If you want to check changes since v3 you can look at

Re: [PATCHv4 04/39] radix-tree: implement preload for multiple contiguous elements

2013-05-21 Thread Dave Hansen
Acked-by: Dave Hansen dave.han...@linux.intel.com

Re: [PATCHv4 05/39] memcg, thp: charge huge cache pages

2013-05-21 Thread Dave Hansen
(introduced in 52d4b9a memcg: allocate all page_cgroup at boot). FWIW, that commit introduced two PageCompound() checks. The other one went away inexplicably in 01b1ae63c22. Acked-by: Dave Hansen dave.han...@linux.intel.com

Re: [PATCHv4 06/39] thp, mm: avoid PageUnevictable on active/inactive lru lists

2013-05-21 Thread Dave Hansen
merged as it stands. Have you been actually able to trigger that bug in any way in practice? Acked-by: Dave Hansen dave.han...@linux.intel.com

Re: [PATCHv4 09/39] thp, mm: introduce mapping_can_have_hugepages() predicate

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com Returns true if mapping can have huge pages. Just check for __GFP_COMP in gfp mask of the mapping for now. Signed-off-by: Kirill A. Shutemov kirill.shute...@linux.intel.com ---

Re: [PATCHv4 10/39] thp: account anon transparent huge pages into NR_ANON_PAGES

2013-05-21 Thread Dave Hansen
this, and why you think it's OK. But, it still makes solid sense to me, and simplifies the code. Acked-by: Dave Hansen dave.han...@linux.intel.com

Re: [PATCHv4 11/39] thp: represent file thp pages in meminfo and friends

2013-05-21 Thread Dave Hansen
that this depends on the previous NR_ANON_PAGES behaviour change to make sense. Otherwise, Acked-by: Dave Hansen dave.han...@linux.intel.com

Re: [PATCHv4 13/39] mm: trace filemap: dump page order

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com Dump page order to trace to be able to distinguish between small page and huge page in page cache. Acked-by: Dave Hansen dave.han...@linux.intel.com

Re: [PATCHv4 12/39] thp, mm: rewrite add_to_page_cache_locked() to support huge pages

2013-05-21 Thread Dave Hansen
in the _middle_ of a THP? Despite my nits, the code still looks correct here, so: Acked-by: Dave Hansen dave.han...@linux.intel.com

Re: [PATCHv4 14/39] thp, mm: rewrite delete_from_page_cache() to support huge pages

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com As with add_to_page_cache_locked() we handle HPAGE_CACHE_NR pages a time. Signed-off-by: Kirill A. Shutemov kirill.shute...@linux.intel.com --- mm/filemap.c | 31

Re: [PATCHv4 15/39] thp, mm: trigger bug in replace_page_cache_page() on THP

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com replace_page_cache_page() is only used by FUSE. It's unlikely that we will support THP in FUSE page cache any soon. Let's pospone implemetation of THP handling in

Re: [PATCHv4 16/39] thp, mm: locking tail page is a bug

2013-05-21 Thread Dave Hansen
that can theoretically get merged separately. Acked-by: Dave Hansen dave.han...@linux.intel.com

Re: [PATCHv4 17/39] thp, mm: handle tail pages in page_cache_get_speculative()

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com For tail page we call __get_page_tail(). It has the same semantics, but for tail page. page_cache_get_speculative() has a ~50-line comment above it with lots of scariness about grace

Re: [PATCHv4 18/39] thp, mm: add event counters for huge page alloc on write to a file

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index d4b7a18..584c71c 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -71,6 +71,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN,

Re: [PATCHv4 19/39] thp, mm: allocate huge pages in grab_cache_page_write_begin()

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com Try to allocate huge page if flags has AOP_FLAG_TRANSHUGE. Why do we need this flag? When might we set it, and when would we not set it? What kinds of callers need to check for and act

Re: [PATCHv4 20/39] thp, mm: naive support of thp in generic read/write routines

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: + if (PageTransHuge(page)) + offset = pos & ~HPAGE_PMD_MASK; + pagefault_disable(); - copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes); + copied =
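The masking in the quoted hunk picks the byte offset of a file position inside whatever page caches it. A minimal user-space sketch, assuming x86-64 sizes (4 KiB base pages, 2 MiB PMD huge pages); the mask macros mirror the kernel's naming, but `offset_in_cache_page()` is a hypothetical helper.

```c
#include <stdint.h>

/* Assumed x86-64 sizes: 4 KiB base pages, 2 MiB PMD-mapped huge pages. */
#define PAGE_SHIFT      12
#define HPAGE_PMD_SHIFT 21
#define PAGE_SIZE       (1UL << PAGE_SHIFT)
#define HPAGE_PMD_SIZE  (1UL << HPAGE_PMD_SHIFT)
#define PAGE_MASK       (~(PAGE_SIZE - 1))
#define HPAGE_PMD_MASK  (~(HPAGE_PMD_SIZE - 1))

/* Offset of file position pos within its cache page: only the low 12 bits
 * matter for a small page, the low 21 bits for a huge page. */
static unsigned long offset_in_cache_page(uint64_t pos, int is_huge)
{
    if (is_huge)
        return (unsigned long)(pos & ~HPAGE_PMD_MASK);
    return (unsigned long)(pos & ~PAGE_MASK);
}
```

For pos = 0x345678 the offset is 0x678 inside a 4 KiB page but 0x145678 inside a 2 MiB page, so using the small-page mask on a huge page would copy to the wrong place.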

Re: [PATCHv4 21/39] thp, libfs: initial support of thp in simple_read/write_begin/write_end

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com For now we try to grab a huge cache page if gfp_mask has __GFP_COMP. It's probably to weak condition and need to be reworked later. Signed-off-by: Kirill A. Shutemov

Re: [PATCHv4 23/39] thp: wait_split_huge_page(): serialize over i_mmap_mutex too

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com Since we're going to have huge pages backed by files, wait_split_huge_page() has to serialize not only over anon_vma_lock, but over i_mmap_mutex too. ... -#define

Re: [PATCHv4 24/39] thp, mm: truncate support for transparent huge page cache

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: If we starting position of truncation is in tail page we have to spilit the huge page page first. That's a very interesting sentence sentence. :) We also have to split if end is within the huge page. Otherwise we can truncate whole huge page

Re: [PATCHv4 26/39] ramfs: enable transparent huge page cache

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com ramfs is the most simple fs from page cache point of view. Let's start transparent huge page cache enabling here. For now we allocate only non-movable huge page. ramfs pages cannot be

Re: [PATCHv4 27/39] x86-64, mm: proper alignment mappings with hugepages

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: +static inline unsigned long mapping_align_mask(struct address_space *mapping) +{ + if (mapping_can_have_hugepages(mapping)) + return PAGE_MASK & ~HPAGE_MASK; + return get_align_mask(); +} get_align_mask() appears to be a
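To see what `PAGE_MASK & ~HPAGE_MASK` selects, here is a small sketch assuming 4 KiB pages and 2 MiB huge pages. The mask keeps exactly the bits between the page offset and the huge-page offset (0x1ff000); a page-aligned address with those bits clear is also huge-page aligned. The round-up helper is hypothetical and only illustrates the arithmetic; it is not how the kernel's unmapped-area search applies an alignment mask.

```c
/* Assumed sizes: 4 KiB base pages and 2 MiB huge pages. */
#define PAGE_SIZE  (1UL << 12)
#define HPAGE_SIZE (1UL << 21)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define HPAGE_MASK (~(HPAGE_SIZE - 1))

/* Bits that must be zero for a page-aligned address to be huge-page
 * aligned: PAGE_MASK & ~HPAGE_MASK, i.e. 0x1ff000 with these sizes. */
static unsigned long huge_align_mask(void)
{
    return PAGE_MASK & ~HPAGE_MASK;
}

/* Hypothetical helper: round a page-aligned address up until the
 * masked bits are clear. */
static unsigned long align_up_with_mask(unsigned long addr, unsigned long mask)
{
    return (addr + mask) & ~mask;
}
```

With these sizes, 0x40123000 rounds up to 0x40200000, and an address that is already 2 MiB aligned is left unchanged.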

Re: [PATCHv4 27/39] x86-64, mm: proper alignment mappings with hugepages

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com Make arch_get_unmapped_area() return unmapped area aligned to HPAGE_MASK if the file mapping can have huge pages. OK, so there are at least four phases of this patch set which are

Re: [PATCHv4 29/39] thp: move maybe_pmd_mkwrite() out of mk_huge_pmd()

2013-05-21 Thread Dave Hansen
, but I do appreciate the consistency it adds, so: Acked-by: Dave Hansen dave.han...@linux.intel.com

Re: [PATCHv4 29/39] thp: move maybe_pmd_mkwrite() out of mk_huge_pmd()

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com It's confusing that mk_huge_pmd() has sematics different from mk_pte() or mk_pmd(). Let's move maybe_pmd_mkwrite() out of mk_huge_pmd() and adjust prototype to match mk_pte(). Oh,

Re: [PATCHv4 31/39] thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page()

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com do_huge_pmd_anonymous_page() has copy-pasted piece of handle_mm_fault() to handle fallback path. Let's consolidate code back by introducing VM_FAULT_FALLBACK return code.

Re: [PATCHv4 32/39] mm: cleanup __do_fault() implementation

2013-05-21 Thread Dave Hansen
On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com Let's cleanup __do_fault() to prepare it for transparent huge pages support injection. Cleanups: - int - bool where appropriate; - unindent some code by reverting 'if' condition;

Re: [PATCHv4 04/39] radix-tree: implement preload for multiple contiguous elements

2013-05-22 Thread Dave Hansen
On 05/22/2013 05:03 AM, Kirill A. Shutemov wrote: On most machines we will have RADIX_TREE_MAP_SHIFT=6. In this case, on 64-bit system the per-CPU feature overhead is for preload array: (30 - 21) * sizeof(void*) = 72 bytes plus, if the preload array is full (30 - 21) * sizeof(struct

Re: [PATCHv4 16/39] thp, mm: locking tail page is a bug

2013-05-22 Thread Dave Hansen
On 05/22/2013 07:12 AM, Kirill A. Shutemov wrote: Dave Hansen wrote: On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com Locking head page means locking entire compound page. If we try to lock tail page, something went wrong. Have you

Re: [PATCHv4 26/39] ramfs: enable transparent huge page cache

2013-05-22 Thread Dave Hansen
On 05/22/2013 07:22 AM, Kirill A. Shutemov wrote: Dave Hansen wrote: + /* +* TODO: make ramfs pages movable +*/ + mapping_set_gfp_mask(inode->i_mapping, + GFP_TRANSHUGE & ~__GFP_MOVABLE); So, before these patches, ramfs

Re: [PATCHv4 29/39] thp: move maybe_pmd_mkwrite() out of mk_huge_pmd()

2013-05-22 Thread Dave Hansen
On 05/22/2013 07:37 AM, Kirill A. Shutemov wrote: Dave Hansen wrote: On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com It's confusing that mk_huge_pmd() has sematics different from mk_pte() or mk_pmd(). Let's move maybe_pmd_mkwrite

Re: [PATCHv4 09/39] thp, mm: introduce mapping_can_have_hugepages() predicate

2013-05-22 Thread Dave Hansen
On 05/22/2013 06:51 AM, Kirill A. Shutemov wrote: Dave Hansen wrote: Also, what happens if transparent_hugepage_flags & (1<<TRANSPARENT_HUGEPAGE_PAGECACHE) becomes false at runtime and you have some already-instantiated huge page cache mappings around? Will things like mapping_align_mask

Re: [PATCHv4 07/39] thp, mm: basic defines for transparent huge page cache

2013-05-23 Thread Dave Hansen
On 05/23/2013 03:36 AM, Hillf Danton wrote: On Sun, May 12, 2013 at 9:23 AM, Kirill A. Shutemov kirill.shute...@linux.intel.com wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com Better if one or two sentences are prepared to show that the following defines are necessary. ...

Re: [PATCHv4 12/39] thp, mm: rewrite add_to_page_cache_locked() to support huge pages

2013-05-23 Thread Dave Hansen
On 05/23/2013 07:36 AM, Kirill A. Shutemov wrote: + if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE_PAGECACHE)) { + BUILD_BUG_ON(HPAGE_CACHE_NR > RADIX_TREE_PRELOAD_NR); + nr = hpage_nr_pages(page); + } else { + BUG_ON(PageTransHuge(page)); + nr = 1; +

Re: [PATCHv4 15/39] thp, mm: trigger bug in replace_page_cache_page() on THP

2013-05-28 Thread Dave Hansen
On 05/28/2013 05:53 AM, Kirill A. Shutemov wrote: Dave Hansen wrote: On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: + VM_BUG_ON(PageTransHuge(old)); + VM_BUG_ON(PageTransHuge(new)); VM_BUG_ON(!PageLocked(old)); VM_BUG_ON(!PageLocked(new)); VM_BUG_ON(new->mapping

mmotm-2013-05-22: Bad page state

2013-05-28 Thread Dave Hansen
I was rebasing my mapping->radix_tree lock batching patches on top of Mel's stuff. It looks like something is jumping the gun and freeing a page before it has been written out. Somebody probably did an extra put_page() or something. I'm running 3.10.0-rc2-mm1-00322-g8d4c612 from

Re: mmotm-2013-05-22: Bad page state

2013-05-28 Thread Dave Hansen
On 05/28/2013 02:51 PM, Dave Hansen wrote: I was rebasing my mapping->radix_tree lock batching patches on top of Mel's stuff. It looks like something is jumping the gun and freeing a page before it has been written out. Somebody probably did an extra put_page() or something. I'm running

cpu hotplug broken in linux-next

2013-05-28 Thread Dave Hansen
If I boot with: maxcpus=2 possible_cpus=4 I get # grep . /sys/devices/system/cpu/cpu[0-9]*/online /sys/devices/system/cpu/cpu1/online:1 /sys/devices/system/cpu/cpu2/online:1 /sys/devices/system/cpu/cpu3/online:1 When 2 and 3 *should* be offline. I also get

Re: [PATCH] x86, mm: get ASLR work for hugetlb mappings

2013-10-22 Thread Dave Hansen
On 10/22/2013 06:52 AM, Kirill A. Shutemov wrote: Matthew noticed that hugetlb doesn't participate in ASLR on x86-64. The reason is genereic hugetlb_get_unmapped_area() which is used on x86-64. It doesn't support randomization and use bottom-up unmapped area lookup, instead of usual top-down

Re: Unnecessary mass OOM kills on Linux 3.11 virtualization host

2013-10-28 Thread Dave Hansen
On 10/28/2013 01:28 AM, Richard Davies wrote: I further attach some other types of memory manager errors found in the kernel logs around the same time. There are several occurrences of each, but I have only copied one here for brevity: 19:18:27 kernel: BUG: Bad page map in process

[PATCH 2/2] mm: thp: give transparent hugepage code a separate copy_page

2013-10-28 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com Right now, the migration code in migrate_page_copy() uses copy_huge_page() for hugetlbfs and thp pages: if (PageHuge(page) || PageTransHuge(page)) copy_huge_page(newpage, page); So, yay for code reuse. But: void

[PATCH 1/2] mm: hugetlbfs: Add some VM_BUG_ON()s to catch non-hugetlbfs pages

2013-10-28 Thread Dave Hansen
Dave Jiang reported that he was seeing oopses when running NUMA systems and default_hugepagesz=1G. I traced the issue down to migrate_page_copy() trying to use the same code for hugetlb pages and transparent hugepages. It should not have been trying to pass thp pages in there. So, add some

Re: [PATCH 1/2] vmsplice: unmap gifted pages for recipient

2013-10-08 Thread Dave Hansen
On 10/07/2013 01:21 PM, Robert C Jennings wrote: + } else { + if (vma) + zap_page_range(vma, +

Re: [PATCH 1/2] vmsplice: unmap gifted pages for recipient

2013-10-08 Thread Dave Hansen
On 10/07/2013 01:21 PM, Robert C Jennings wrote: spd.partial[page_nr].offset = loff; spd.partial[page_nr].len = this_len; + spd.partial[page_nr].useraddr = index << PAGE_CACHE_SHIFT; len -= this_len; loff = 0;

Re: [PATCH 2/2] vmsplice: Add limited zero copy to vmsplice

2013-10-08 Thread Dave Hansen
On 10/07/2013 01:21 PM, Robert C Jennings wrote: + if (!buf->offset && (buf->len == PAGE_SIZE) && + (buf->flags & PIPE_BUF_FLAG_GIFT) && (sd->flags & SPLICE_F_MOVE)) { + struct page *page = buf->page; + struct mm_struct *mm; + struct vm_area_struct *vma; +

Re: [PATCH 1/2] vmsplice: unmap gifted pages for recipient

2013-10-08 Thread Dave Hansen
On 10/08/2013 12:48 PM, Robert Jennings wrote: * Dave Hansen (d...@sr71.net) wrote: On 10/07/2013 01:21 PM, Robert C Jennings wrote: + } else { + if (vma

Re: NMIs induced by 'perf top' hogging all CPU time

2013-05-10 Thread Dave Hansen
On 05/10/2013 03:29 AM, Peter Zijlstra wrote: On Thu, May 09, 2013 at 04:29:16PM -0700, Dave Hansen wrote: If I boot a recent kernel (bb9055b) and run 'perf top' on my machine, it hangs. It is 100% reproducible; it happens every single time. If I'm lucky, I'll get some of the hardlockup

Re: [RFC][PATCH 6/7] use __remove_mapping_batch() in shrink_page_list()

2013-05-14 Thread Dave Hansen
On 05/14/2013 09:05 AM, Mel Gorman wrote: This helper seems overkill. Why not just have batch_mapping in shrink_page_list() that is set when the first page is added to the batch_for_mapping_removal and defer the decision to drain until after the page mapping has been looked up? struct

Re: [PATCHv11 3/4] zswap: add to mm/

2013-05-15 Thread Dave Hansen
On 05/15/2013 01:09 PM, Seth Jennings wrote: On Wed, May 15, 2013 at 02:55:06PM -0400, Konrad Rzeszutek Wilk wrote: Sorry, but I don't think that's appropriate for a patch in the MM subsystem. Perhaps a compromise can be reached where this code is merged as a driver not a core mm component.

Re: [PATCH 2/2 v2] mm: allow to set overcommit ratio more precisely

2013-09-05 Thread Dave Hansen
On 09/05/2013 05:51 AM, Jerome Marchand wrote: This patch adds the new overcommit_ratio_ppm sysctl variable that allow to set overcommit ratio with a part per million precision. The old overcommit_ratio variable can still be used to set and read the ratio with a 1% precision. That way,
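The motivation for finer granularity is easiest to see with arithmetic. A minimal sketch of the strict-overcommit limit, assuming the simplified formula CommitLimit = RAM * ratio + swap (counted in pages, ignoring hugetlb reservations); both function names are hypothetical. On a 1 TiB machine (268435456 pages of 4 KiB), one 1% step moves the limit by roughly 10 GiB, whereas a ppm knob can express values such as 50.5%.

```c
#include <stdint.h>

/* Percent granularity, as with the existing overcommit_ratio
 * (simplified: hugetlb reservations are ignored). */
static uint64_t commit_limit_pct(uint64_t ram_pages, uint64_t swap_pages,
                                 unsigned ratio_pct)
{
    return ram_pages * ratio_pct / 100 + swap_pages;
}

/* Parts-per-million granularity, modelled on the proposed
 * overcommit_ratio_ppm knob (hypothetical helper). */
static uint64_t commit_limit_ppm(uint64_t ram_pages, uint64_t swap_pages,
                                 unsigned long ratio_ppm)
{
    return ram_pages * ratio_ppm / 1000000 + swap_pages;
}
```

With 268435456 RAM pages and no swap, 50% and 500000 ppm both give 134217728 pages, while 505000 ppm gives 135559905 pages, a value no integer percentage can produce.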

Re: [PATCH] Kconfig.debug: Add FRAME_POINTER anti-dependency for ARC

2013-08-29 Thread Dave Hansen
On 08/27/2013 01:31 AM, Vineet Gupta wrote: Frame pointer on ARC doesn't serve the conventional purpose of stack unwinding due to the typical way ABI designates it's usage. Thus it's explicit usage on ARC is discouraged (gcc is free to use it, for some tricky stack frames even if

Re: [PATCH] Avoid useless inodes and dentries reclamation

2013-08-29 Thread Dave Hansen
The new shrinker infrastructure in mmotm looks like it will make this problem worse. old code:
	shrink_slab()
		for_each_shrinker {
			do_shrinker_shrink(); // one per batch
				prune_super()
					grab_super_passive()
		}
	}

Re: [PATCH] Kconfig.debug: Add FRAME_POINTER anti-dependency for ARC

2013-08-30 Thread Dave Hansen
On 08/30/2013 12:48 AM, Vineet Gupta wrote: If we had ARCH_FRAME_POINTER_UNAVAILABLE (def_bool n), we could potentially remove ARCH_FRAME_POINTER too: 1. arches which explicitly select ARCH_FRAME_POINTER (xtensa, parisc, arm64, x86, unicore32, tile) could just drop that select. 2.

Re: [RESEND RFC PATCH v3 00/35] mm: Memory Power Management

2013-08-30 Thread Dave Hansen
On 08/30/2013 06:13 AM, Srivatsa S. Bhat wrote: Overview of Memory Power Management and its implications to the Linux MM Today, we are increasingly seeing computer systems sporting larger and larger amounts of RAM, in

Re: [PATCH] perf: fix interrupt handler timing harness

2013-07-08 Thread Dave Hansen
On 07/04/2013 03:30 PM, Stephane Eranian wrote: There was an misunderstanding on the API of the do_div() macro. It returns the remainder of the division and this was not what the function expected leading to disabling the interrupt latency watchdog. Misunderstanding is a very kind term for
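The `do_div()` pitfall is easy to reproduce outside the kernel. The kernel's `do_div(n, base)` is a macro that divides the 64-bit lvalue `n` in place (leaving the quotient in `n`) and returns the remainder. The sketch below models that contract with a pointer argument; `do_div_model()` and the two conversion helpers are hypothetical names, and the nanosecond-to-microsecond conversion is only an illustration.

```c
#include <stdint.h>

/* User-space model of do_div(n, base): n is updated in place to the
 * quotient and the *remainder* is what gets returned. */
static uint32_t do_div_model(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}

/* Buggy pattern: treats the return value as the quotient. */
static uint64_t ns_to_us_buggy(uint64_t ns)
{
    return do_div_model(&ns, 1000);   /* actually ns % 1000 */
}

/* Fixed pattern: divide in place, then use the updated dividend. */
static uint64_t ns_to_us_fixed(uint64_t ns)
{
    do_div_model(&ns, 1000);
    return ns;
}
```

For 2500123 ns the buggy version yields 123 instead of 2500, and for any exact multiple of 1000 it yields 0, which is how a threshold computed this way can silently end up disabled.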

Re: [PATCH] perf: fix interrupt handler timing harness

2013-07-08 Thread Dave Hansen
On 07/08/2013 11:08 AM, Stephane Eranian wrote: I admit I have some issues with your patch and what it is trying to avoid. There is already interrupt throttling. Your code seems to address latency issues on the handler rather than rate issues. Yet to mitigate the latency it is modify the

Re: [PATCH] perf: fix interrupt handler timing harness

2013-07-08 Thread Dave Hansen
On 07/08/2013 01:20 PM, Stephane Eranian wrote: On Mon, Jul 8, 2013 at 10:05 PM, Dave Hansen dave.han...@intel.com wrote: If the interrupts _consistently_ take too long individually they can starve out all the other CPU users. I saw no way to make them finish faster, so the only recourse

[PATCH] x86: perf: fix incorrect use of do_div() in nmi warning

2013-07-08 Thread Dave Hansen
From: Dave Hansen dave.han...@linux.intel.com This fixes a bug present in 3.10 and introduced here: commit 2ab00456ea8a0d79acb1390659b98416111880b2 Author: Dave Hansen dave.han...@linux.intel.com Date: Fri Jun 21 08:51:35 2013 -0700 x86: Warn when NMI handlers take large amounts of time

Re: [PATCHv6 00/22] Transparent huge page cache: phase 1, everything but mmap()

2013-09-26 Thread Dave Hansen
On 09/23/2013 05:05 AM, Kirill A. Shutemov wrote: To proof that the proposed changes are functional we enable the feature for the most simple file system -- ramfs. ramfs is not that useful by itself, but it's good pilot project. This does, at the least, give us a shared memory mechanism that

Re: [RFC PATCH v4 06/40] mm: Demarcate and maintain pageblocks in region-order in the zones' freelists

2013-09-26 Thread Dave Hansen
On 09/25/2013 04:14 PM, Srivatsa S. Bhat wrote: @@ -605,16 +713,22 @@ static inline void __free_one_page(struct page *page, buddy_idx = __find_buddy_index(combined_idx, order + 1); higher_buddy = higher_page + (buddy_idx - combined_idx); if

Re: [PATCHv4 02/10] mm: convert mm->nr_ptes to atomic_t

2013-09-27 Thread Dave Hansen
On 09/27/2013 01:46 PM, Cody P Schafer wrote: On 09/27/2013 06:16 AM, Kirill A. Shutemov wrote: @@ -339,6 +339,7 @@ struct mm_struct { pgd_t * pgd; atomic_t mm_users;/* How many users with user space? */ atomic_t mm_count;/* How many references to

Re: [PATCHv6 00/22] Transparent huge page cache: phase 1, everything but mmap()

2013-09-30 Thread Dave Hansen
On 09/30/2013 03:02 AM, Mel Gorman wrote: I am afraid I never looked too closely once I learned that the primary motivation for this was relieving iTLB pressure in a very specific case. AFAIK, this is not a problem in the vast majority of modern CPUs and I found it very hard to be motivated to

Re: [RESEND][PATCH] mm: vmstats: tlb flush counters

2013-07-19 Thread Dave Hansen
On 07/19/2013 04:38 AM, Raghavendra KT wrote: While measuring non - PLE performance, one of the bottleneck, I am seeing is flush tlbs. perf had helped in alaysing a bit there, but this patch would help in precise calculation. It will aslo help in tuning the PLE window experiments (larger PLE

Re: [RESEND][PATCH] mm: vmstats: tlb flush counters

2013-07-19 Thread Dave Hansen
On 07/19/2013 01:28 AM, Ingo Molnar wrote: UP is slowly going extinct, but in any case these counters ought to inform us about TLB flushes even on UP systems: +NR_TLB_LOCAL_FLUSH_ALL, +NR_TLB_LOCAL_FLUSH_ONE, +NR_TLB_LOCAL_FLUSH_ONE_KERNEL,

[PATCH] mm: vmstats: track TLB flush stats on UP too

2013-07-19 Thread Dave Hansen
), but the mtrr code is ancient and I'm hesitant to touch it other than to just stick in the counters. Signed-off-by: Dave Hansen dave.han...@linux.intel.com --- linux.git-davehans/arch/x86/include/asm/tlbflush.h| 38 +++--- linux.git-davehans/arch/x86/kernel/cpu/mtrr/generic.c |2

Re: [RESEND][PATCH] mm: vmstats: tlb flush counters

2013-07-22 Thread Dave Hansen
On 07/22/2013 03:06 AM, Ingo Molnar wrote: Btw., would be nice to also integrate these VM counters into perf as well, as an instrumentation variant/option. It could be done in an almost zero overhead fashion using jump-labels I think. [ Just in case someone is bored to death and is

Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

2013-07-22 Thread Dave Hansen
On 07/22/2013 01:57 PM, KOSAKI Motohiro wrote: One of possible option is to return EINVAL when system has real hotplug device. I mean this interface is only useful when system don't have proper hardware feature and doesn't work correctly hardware property and this interface command are not

Re: [PATCH 1/1] Drivers: base: memory: Export symbols for onlining memory blocks

2013-07-23 Thread Dave Hansen
On 07/23/2013 07:52 AM, KY Srinivasan wrote: The current scheme of involving user level code to close this loop obviously does not perform well under high memory pressure. Adding memory usually requires allocating some large, contiguous areas of memory for use as mem_map[] and other VM

Re: [PATCH 1/1] Drivers: base: memory: Export symbols for onlining memory blocks

2013-07-23 Thread Dave Hansen
On 07/23/2013 08:54 AM, KY Srinivasan wrote: Adding memory usually requires allocating some large, contiguous areas of memory for use as mem_map[] and other VM structures. That's really hard to do under heavy memory pressure. How are you accomplishing this? I cannot avoid failures because

Re: [PATCH] mm: zswap: add runtime enable/disable

2013-07-23 Thread Dave Hansen
On 07/23/2013 10:32 AM, Seth Jennings wrote: On Tue, Jul 23, 2013 at 05:16:07PM +0800, Bob Liu wrote: On 07/23/2013 03:34 AM, Seth Jennings wrote: -To enabled zswap, the enabled attribute must be set to 1 at boot time. e.g. -zswap.enabled=1 +Zswap is disabled by default but can be enabled

Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

2013-07-23 Thread Dave Hansen
On 07/23/2013 01:45 PM, Toshi Kani wrote: Dave, is this how you are testing? Do you always specify a valid memory address for your testing? For the moment, yes. I'm actually working on some other patches that add the kernel metadata for memory ranges even if they're not backed by physical

Re: [PATCH 1/1] Drivers: base: memory: Export symbols for onlining memory blocks

2013-07-24 Thread Dave Hansen
On 07/23/2013 10:21 AM, KY Srinivasan wrote: You have allocated some large, physically contiguous areas of memory under heavy pressure. But you also contend that there is too much memory pressure to run a small userspace helper. Under heavy memory pressure, I'd expect large, kernel

Re: [PATCH 1/1] Drivers: base: memory: Export symbols for onlining memory blocks

2013-07-24 Thread Dave Hansen
On 07/24/2013 12:45 PM, KY Srinivasan wrote: All I am saying is that I see two classes of failures: (a) Our inability to allocate memory to manage the memory that is being hot added and (b) Our inability to bring the hot added memory online within a reasonable amount of time. I am not sure the

Re: [PATCH] x86/efi: Only pass mapped RAM regions to free_bootmem_late()

2014-06-05 Thread Dave Hansen
On 06/05/2014 06:27 AM, Matt Fleming wrote: free_bootmem_late() expects to only be passed RAM regions that the kernel can access, and that have a corresponding 'struct page'. It's possible for regions in the EFI memory map to reside in address ranges for which pfn_to_page() doesn't work, for

Re: pte_present check on hugetlb_entry fix for 3.15?

2014-06-06 Thread Dave Hansen
On 06/06/2014 11:59 AM, Andrew Morton wrote: On Fri, 6 Jun 2014 14:46:37 -0400 Naoya Horiguchi n-horigu...@ah.jp.nec.com wrote: On Fri, Jun 06, 2014 at 01:36:54PM -0400, Josh Boyer wrote: Hi Naoya, I noticed that your mm-add-pte_present-check-on-existing-hugetlb_entry-callbacks.patch in

Re: [PATCH 2/7] mm/pagewalk: replace mm_walk->skip with more general mm_walk->control

2014-06-09 Thread Dave Hansen
On 06/06/2014 03:58 PM, Naoya Horiguchi wrote: +enum mm_walk_control { + PTWALK_NEXT = 0,/* Go to the next entry in the same level or + * the next vma. This is default behavior. */ + PTWALK_DOWN,/* Go down to lower level */ +

Re: [PATCH 6/7] mm/pagewalk: move pmd_trans_huge_lock() from callbacks to common code

2014-06-09 Thread Dave Hansen
On 06/06/2014 03:58 PM, Naoya Horiguchi wrote: @@ -6723,14 +6723,9 @@ static int mem_cgroup_count_precharge_pmd(pmd_t *pmd, struct mm_walk *walk) { struct vm_area_struct *vma = walk->vma; - spinlock_t *ptl; - if (pmd_trans_huge_lock(pmd,

Re: [PATCH 2/7] mm/pagewalk: replace mm_walk->skip with more general mm_walk->control

2014-06-09 Thread Dave Hansen
On 06/09/2014 02:29 PM, Naoya Horiguchi wrote: static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long end, struct mm_walk *walk) { struct vm_area_struct *vma = walk->vma; +spin_unlock(walk->ptl);

KVM_GUEST support breaks page fault tracing

2014-05-08 Thread Dave Hansen
I noticed on some of my systems that page fault tracing doesn't work: cd /sys/kernel/debug/tracing; echo 1 > events/exceptions/enable; cat trace; # nothing shows up I eventually traced it down to CONFIG_KVM_GUEST. At least in a KVM VM, enabling that option breaks

Re: KVM_GUEST support breaks page fault tracing

2014-05-08 Thread Dave Hansen
On 05/08/2014 03:24 PM, Thomas Gleixner wrote: I noticed on some of my systems that page fault tracing doesn't work: cd /sys/kernel/debug/tracing; echo 1 > events/exceptions/enable; cat trace; # nothing shows up I eventually traced it down to CONFIG_KVM_GUEST. At least

Re: KVM_GUEST support breaks page fault tracing

2014-05-09 Thread Dave Hansen
On 05/08/2014 04:45 PM, Steven Rostedt wrote: As your patch showed up as an attachment, I couldn't include it in my reply. But sure, that may work. But you could also play tricks to keep the overhead off when tracing is disabled like this one: ... How important is it to have zero-overhead?

Re: [patch] mm, hotplug: probe interface is available on several platforms

2014-06-11 Thread Dave Hansen
On 06/11/2014 03:15 PM, David Rientjes wrote: +CONFIG_ARCH_MEMORY_PROBE and can be configured on powerpc, sh, and x86 +if hotplug is supported, although for x86 this should be handled by ACPI +notification. Looks like a good change, in general. My only nit is that this implies that all

Re: [PATCH -mm v2 06/11] pagewalk: add size to struct mm_walk

2014-06-12 Thread Dave Hansen
On 06/12/2014 02:48 PM, Naoya Horiguchi wrote: This variable is helpful if we try to share the callback function between multiple slots (for example between pte_entry() and pmd_entry()) as done in later patches. smaps_pte() already does this: static int smaps_pte(pte_t *pte, unsigned long
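A minimal userspace sketch of the idea behind adding a size field to struct mm_walk: the walker records how many bytes the current entry maps, so one callback can serve both the pte_entry() and pmd_entry() slots. Every name here (struct toy_walk, toy_account_entry(), toy_walk_range()) and the 4 KiB / 2 MiB entry sizes are hypothetical assumptions; this is not the kernel's pagewalk API or the code from the patch.

```c
/* Hypothetical userspace model; NOT the kernel's struct mm_walk. */
#define TOY_PTE_SIZE (4UL * 1024)          /* assumed 4 KiB base page */
#define TOY_PMD_SIZE (2UL * 1024 * 1024)   /* assumed 2 MiB huge page */

struct toy_walk {
    unsigned long size;      /* bytes mapped by the entry being visited */
    unsigned long resident;  /* total accumulated by the shared callback */
};

/* One callback serves both the PTE and PMD slots: it needs only
 * walk->size to know how much memory the current entry covers. */
static int toy_account_entry(int present, struct toy_walk *walk)
{
    if (present)
        walk->resident += walk->size;
    return 0;
}

/* Hypothetical walker: sets walk->size before invoking the callback
 * for each level, instead of using a separate callback per level. */
static void toy_walk_range(const int *ptes, int nr_ptes,
                           const int *pmds, int nr_pmds,
                           struct toy_walk *walk)
{
    walk->size = TOY_PTE_SIZE;
    for (int i = 0; i < nr_ptes; i++)
        toy_account_entry(ptes[i], walk);

    walk->size = TOY_PMD_SIZE;
    for (int i = 0; i < nr_pmds; i++)
        toy_account_entry(pmds[i], walk);
}
```

With two present PTEs and one present PMD, the shared callback accounts 2 * 4 KiB + 2 MiB without knowing which slot invoked it.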

[bisected] pre-3.16 regression on open() scalability

2014-06-13 Thread Dave Hansen
Hi Paul, I'm seeing a regression when comparing 3.15 to Linus's current tree. I'm using Anton Blanchard's will-it-scale open1 test which creates a bunch of processes and does open()/close() in a tight loop: https://github.com/antonblanchard/will-it-scale/blob/master/tests/open1.c At about 50
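A minimal sketch of the kind of per-process loop the open1 test runs, assuming a single temporary file that is repeatedly opened and closed. run_open1_worker() and the /tmp path template are hypothetical; this is not the will-it-scale source, which lives at the URL above.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical open1-style worker (NOT the actual will-it-scale code).
 * Returns the number of open()/close() iterations completed, or -1 if
 * the temporary file could not be created.
 */
long run_open1_worker(unsigned long iterations)
{
    char path[] = "/tmp/open1-sketch-XXXXXX";
    /* mkstemp() creates and opens a uniquely named temporary file. */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);

    unsigned long done = 0;
    for (; done < iterations; done++) {
        /* The measured hot path: one open() and one close() per iteration. */
        fd = open(path, O_RDWR);
        if (fd < 0)
            break;
        close(fd);
    }

    unlink(path);
    return (long)done;
}
```

In the real test many such processes run concurrently, so any kernel-side serialization in the open()/close() path shows up as a drop in total iterations as the process count grows.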
