[PATCH -v4 7/9] mm, THP: Add can_split_huge_page()

2016-09-28 Thread Huang, Ying
From: Huang Ying Separate the check of whether we can split the huge page from split_huge_page_to_list() into its own function. This makes it possible to perform the check before actually splitting the THP (Transparent Huge Page). This will be used for delaying splitting the THP during swapping out, where for a THP, we

[PATCH] Subject: [PATCH -v4] THP swap: Delay splitting THP during swapping out

2016-09-28 Thread Huang, Ying
From: Huang Ying Johannes suggested that I use two big patches instead of 9 patches, since he feels that is easier for him to review. I am not sure whether this is preferable for other reviewers too, so I am sending out both versions for review. If this version is preferable for more reviewers, I will

[PATCH -v4 3/9] mm, THP, swap: Add swap cluster allocate/free functions

2016-09-28 Thread Huang, Ying
From: Huang Ying The swap cluster allocation/free functions are added based on the existing swap cluster management mechanism for SSD. These functions don't work for the rotating hard disks because the existing swap cluster management mechanism doesn't work for them. The hard disks s

[PATCH -v4 1/9] mm, swap: Make swap cluster size same of THP size on x86_64

2016-09-28 Thread Huang, Ying
From: Huang Ying In this patch, the size of the swap cluster is changed to that of the THP (Transparent Huge Page) on the x86_64 architecture (512). This is for THP swap support on x86_64, where one swap cluster will be used to hold the contents of each THP swapped out. And some information
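
The figure of 512 quoted above follows from the page sizes: on x86_64 with 4 KiB base pages, a PMD-mapped THP covers 2 MiB, that is, 512 base pages. A minimal sketch of that arithmetic in C follows. The SKETCH_* constants and the helper names are illustrative assumptions and are not identifiers taken from the patch.

```c
#include <assert.h>

/* Assumed x86_64 values with 4 KiB base pages; illustrative only. */
#define SKETCH_PAGE_SHIFT 12 /* 4 KiB base page */
#define SKETCH_PMD_SHIFT  21 /* 2 MiB PMD-mapped huge page */

/* Number of base pages (and thus swap slots) covered by one THP. */
static unsigned long sketch_thp_nr_pages(void)
{
	return 1UL << (SKETCH_PMD_SHIFT - SKETCH_PAGE_SHIFT);
}

/* Index of the swap cluster that contains swap slot `offset`. */
static unsigned long sketch_cluster_index(unsigned long offset,
					  unsigned long cluster_size)
{
	assert(cluster_size != 0);
	return offset / cluster_size;
}

/* Nonzero if a THP-sized run starting at `offset` exactly fills one cluster. */
static int sketch_thp_fits_cluster(unsigned long offset,
				   unsigned long cluster_size)
{
	return cluster_size == sketch_thp_nr_pages() &&
	       offset % cluster_size == 0;
}
```

Under these assumptions, a cluster size of 256 would make a 512-slot THP span two clusters, while a cluster size of 512 lets one THP map to exactly one cluster.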

[PATCH -v4 4/9] mm, THP, swap: Add get_huge_swap_page()

2016-09-28 Thread Huang, Ying
From: Huang Ying A variation of get_swap_page(), get_huge_swap_page(), is added to allocate a swap cluster (HPAGE_PMD_NR swap slots) based on the swap cluster allocation function. A fairly simple algorithm is used, that is, only the first swap device in the priority list will be tried to allocate the

[PATCH -v4 2/9] mm, memcg: Support to charge/uncharge multiple swap entries

2016-09-28 Thread Huang, Ying
From: Huang Ying This patch makes it possible to charge or uncharge a set of contiguous swap entries in the swap cgroup. The number of swap entries is specified via an added parameter. This will be used for the THP (Transparent Huge Page) swap support, where a swap cluster backing a THP may be

[PATCH -v4 8/9] mm, THP, swap: Support to split THP in swap cache

2016-09-28 Thread Huang, Ying
From: Huang Ying This patch enhances split_huge_page_to_list() to work properly for a THP (Transparent Huge Page) in the swap cache during swapping out. This is used for delaying splitting the THP during swapping out, where for a THP to be swapped out, we will allocate a swap cluster

[PATCH -v4 6/9] mm, THP, swap: Support to add/delete THP to/from swap cache

2016-09-28 Thread Huang, Ying
From: Huang Ying With this patch, a THP (Transparent Huge Page) can be added/deleted to/from the swap cache as a set of (HPAGE_PMD_NR) sub-pages. This will be used for the THP swap support, where one THP may be added/deleted to/from the swap cache. This will batch the

[PATCH -v4 00/10] THP swap: Delay splitting THP during swapping out

2016-09-28 Thread Huang, Ying
From: Huang Ying This patchset is to optimize the performance of Transparent Huge Page (THP) swap. Hi, Andrew, could you help me to check whether the overall design is reasonable? Hi, Hugh, Shaohua, Minchan and Rik, could you help me to review the swap part of the patchset? Especially [01/10

Re: [PATCH 2/8] mm/swap: Add cluster lock

2016-09-28 Thread Huang, Ying
; +goto new_cluster; >> +} >> +ci = lock_cluster(si, tmp); >> +while (tmp < max) { > > In this work tmp is checked to be less than the max value. > Semantic change hoped? Oops! tmp should be checked to be more than the min value. Will fix

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-09-26 Thread Huang, Ying
this uncovered. But > in the end will probably be stuck with a slight regression in this > artificial workload. I see. Thanks for the update. Please keep me posted. Best Regards, Huang, Ying

Re: [LKP] [lkp] [f2fs] ec795418c4: fsmark.files_per_sec -36.3% regression

2016-09-26 Thread Huang, Ying
Jaegeuk Kim writes: > On Mon, Sep 26, 2016 at 02:26:06PM +0800, Huang, Ying wrote: >> Hi, Jaegeuk, >> >> "Huang, Ying" writes: >> >> > Jaegeuk Kim writes: >> > >> >> Hello, >> >> >>

Re: [LKP] [lkp] [f2fs] ec795418c4: fsmark.files_per_sec -36.3% regression

2016-09-25 Thread Huang, Ying
Hi, Jaegeuk, "Huang, Ying" writes: > Jaegeuk Kim writes: > >> Hello, >> >> On Sat, Aug 27, 2016 at 10:13:34AM +0800, Fengguang Wu wrote: >>> Hi Jaegeuk, >>> >>> > > >> > - [lkp] [f2fs] b93f771286: aim7.jobs-per-min

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-09-25 Thread Huang, Ying
Hi, Christoph, "Huang, Ying" writes: > Hi, Christoph, > > "Huang, Ying" writes: > >> Christoph Hellwig writes: >> >>> Snipping the long context: >>> >>> I think there are three observations here: >>> >>>

Re: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out

2016-09-25 Thread Huang, Ying
Shaohua Li writes: > On Fri, Sep 23, 2016 at 10:32:39AM +0800, Huang, Ying wrote: >> Rik van Riel writes: >> >> > On Thu, 2016-09-22 at 15:56 -0700, Shaohua Li wrote: >> >> On Wed, Sep 07, 2016 at 09:45:59AM -0700, Huang, Ying wrote: >> >> >

Re: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out

2016-09-22 Thread Huang, Ying
Rik van Riel writes: > On Thu, 2016-09-22 at 15:56 -0700, Shaohua Li wrote: >> On Wed, Sep 07, 2016 at 09:45:59AM -0700, Huang, Ying wrote: >> >  >> > - It will help the memory fragmentation, especially when the THP is >> >   heavily used by the applica

Re: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out

2016-09-22 Thread Huang, Ying
Hi, Shaohua, Thanks for your comments! Shaohua Li writes: > On Wed, Sep 07, 2016 at 09:45:59AM -0700, Huang, Ying wrote: >> >> The advantages of the THP swap support include: Sorry for the confusion. These are the advantages of the final goal, that is, avoiding splitting/collapsing the

Re: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out

2016-09-19 Thread Huang, Ying
Minchan Kim writes: > Hi Huang, > > On Tue, Sep 20, 2016 at 10:54:35AM +0800, Huang, Ying wrote: >> Hi, Minchan, >> >> Minchan Kim writes: >> > Hi Huang, >> > >> > On Sun, Sep 18, 2016 at 09:53:39AM +0800, Huang, Ying wrote: >> >>

Re: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out

2016-09-19 Thread Huang, Ying
Hi, Minchan, Minchan Kim writes: > Hi Huang, > > On Sun, Sep 18, 2016 at 09:53:39AM +0800, Huang, Ying wrote: >> Minchan Kim writes: >> >> > On Tue, Sep 13, 2016 at 04:53:49PM +0800, Huang, Ying wrote: >> >> Minchan Kim writes: >> >> >

Re: [PATCH -v3 01/10] mm, swap: Make swap cluster size same of THP size on x86_64

2016-09-19 Thread Huang, Ying
Hi, Johannes, Johannes Weiner writes: > On Thu, Sep 08, 2016 at 11:15:52AM +0530, Anshuman Khandual wrote: >> On 09/07/2016 10:16 PM, Huang, Ying wrote: >> > From: Huang Ying >> > >> > In this patch, the size of the swap cluster is changed to that of the

Re: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out

2016-09-17 Thread Huang, Ying
Minchan Kim writes: > On Tue, Sep 13, 2016 at 04:53:49PM +0800, Huang, Ying wrote: >> Minchan Kim writes: >> > On Tue, Sep 13, 2016 at 02:40:00PM +0800, Huang, Ying wrote: >> >> Minchan Kim writes: >> >> >> >> > Hi Huang, >> >

Re: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out

2016-09-13 Thread Huang, Ying
Minchan Kim writes: > On Tue, Sep 13, 2016 at 02:40:00PM +0800, Huang, Ying wrote: >> Minchan Kim writes: >> >> > Hi Huang, >> > >> > On Fri, Sep 09, 2016 at 01:35:12PM -0700, Huang, Ying wrote: >> > >> > < snip > >> >

Re: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out

2016-09-12 Thread Huang, Ying
Minchan Kim writes: > Hi Huang, > > On Fri, Sep 09, 2016 at 01:35:12PM -0700, Huang, Ying wrote: > > < snip > > >> >> Recently, the performance of the storage devices improved so fast that >> >> we cannot saturate the disk bandwidth when do page

Re: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out

2016-09-09 Thread Huang, Ying
Hi, Minchan, Minchan Kim writes: > Hi Huang, > > On Wed, Sep 07, 2016 at 09:45:59AM -0700, Huang, Ying wrote: >> From: Huang Ying >> >> This patchset is to optimize the performance of Transparent Huge Page >> (THP) swap. >> >> Hi, Andrew, could yo

Re: [RFC PATCH 0/4] Reduce tree_lock contention during swap and reclaim of a single file v1

2016-09-09 Thread Huang, Ying
he future if we optimize its performance to catch up with the performance of the storage. >> So I think this series is one of those "we need to find that it makes >> a big positive impact" to make sense. >> > > Agreed. I don't mind leaving it on the back burner unless Dave reports > it really helps or a new bug report about realistic tree_lock contention > shows up. Best Regards, Huang, Ying

Re: [PATCH -v3 03/10] mm, memcg: Support to charge/uncharge multiple swap entries

2016-09-08 Thread Huang, Ying
Anshuman Khandual writes: > On 09/07/2016 10:16 PM, Huang, Ying wrote: >> From: Huang Ying >> >> This patch makes it possible to charge or uncharge a set of contiguous >> swap entries in the swap cgroup. The number of swap entries is >> specified via an add

Re: [PATCH -v3 04/10] mm, THP, swap: Add swap cluster allocate/free functions

2016-09-08 Thread Huang, Ying
Anshuman Khandual writes: > On 09/07/2016 10:16 PM, Huang, Ying wrote: >> From: Huang Ying >> >> The swap cluster allocation/free functions are added based on the >> existing swap cluster management mechanism for SSD. These functions >> don't work f

Re: [PATCH -v3 07/10] mm, THP, swap: Support to add/delete THP to/from swap cache

2016-09-08 Thread Huang, Ying
Hi, Anshuman, Thanks for your comments! Anshuman Khandual writes: > On 09/07/2016 10:16 PM, Huang, Ying wrote: >> From: Huang Ying >> >> With this patch, a THP (Transparent Huge Page) can be added/deleted >> to/from the swap cache as a set of sub-pages (512 on x86_64

Re: [PATCH -v3 01/10] mm, swap: Make swap cluster size same of THP size on x86_64

2016-09-08 Thread Huang, Ying
Anshuman Khandual writes: > On 09/07/2016 10:16 PM, Huang, Ying wrote: >> From: Huang Ying >> >> In this patch, the size of the swap cluster is changed to that of the >> THP (Transparent Huge Page) on x86_64 architecture (512). This is for >> the THP swap s

Re: [PATCH -v3 01/10] mm, swap: Make swap cluster size same of THP size on x86_64

2016-09-08 Thread Huang, Ying
"Kirill A. Shutemov" writes: > On Wed, Sep 07, 2016 at 09:46:00AM -0700, Huang, Ying wrote: >> From: Huang Ying >> >> In this patch, the size of the swap cluster is changed to that of the >> THP (Transparent Huge Page) on x86_64 architecture (512). T

Re: [PATCH -v3 01/10] mm, swap: Make swap cluster size same of THP size on x86_64

2016-09-08 Thread Huang, Ying
"Kirill A. Shutemov" writes: > On Wed, Sep 07, 2016 at 09:46:00AM -0700, Huang, Ying wrote: >> From: Huang Ying >> >> In this patch, the size of the swap cluster is changed to that of the >> THP (Transparent Huge Page) on x86_64 architecture (512). T

Re: [PATCH -v3 05/10] mm, THP, swap: Add get_huge_swap_page()

2016-09-08 Thread Huang, Ying
"Kirill A. Shutemov" writes: > On Wed, Sep 07, 2016 at 09:46:04AM -0700, Huang, Ying wrote: >> From: Huang Ying >> >> A variation of get_swap_page(), get_huge_swap_page(), is added to >> allocate a swap cluster (512 swap slots) based on the swap cluster >

Re: [PATCH -v3 08/10] mm, THP: Add can_split_huge_page()

2016-09-08 Thread Huang, Ying
Hi, Kirill, Thanks for your comments! "Kirill A. Shutemov" writes: > On Wed, Sep 07, 2016 at 09:46:07AM -0700, Huang, Ying wrote: >> From: Huang Ying >> >> Separates checking whether we can split the huge page from >> split_huge_page_to_list() into a fu

[PATCH 2/2] mm: Remove page_file_index

2016-09-07 Thread Huang, Ying
From: Huang Ying After using the offset of the swap entry as the key of the swap cache, page_index() becomes exactly the same as page_file_index(). So page_file_index() is removed and the callers are changed to use page_index() instead. Cc: Trond Myklebust Cc: Anna Schumaker Cc: "K

[PATCH 1/2] mm, swap: Use offset of swap entry as key of swap cache

2016-09-07 Thread Huang, Ying
From: Huang Ying This patch is to improve the performance of swap cache operations when the type of the swap device is not 0. Originally, the whole swap entry value is used as the key of the swap cache, even though there is one radix tree for each swap device. If the type of the swap device is
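
To illustrate the difference between the two keys described above, here is a minimal sketch in C. It assumes a simplified swap entry encoding in which the device type sits in the high bits and the offset in the low bits; SKETCH_TYPE_SHIFT and all sketch_* names are hypothetical, and the kernel's real encoding is architecture-dependent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: top 6 bits hold the type, the rest the offset. */
#define SKETCH_TYPE_SHIFT 58
#define SKETCH_OFFSET_MASK ((UINT64_C(1) << SKETCH_TYPE_SHIFT) - 1)

typedef struct {
	uint64_t val;
} sketch_swp_entry;

/* Pack a swap device type and a slot offset into one entry value. */
static sketch_swp_entry sketch_swp_entry_make(unsigned int type,
					      uint64_t offset)
{
	sketch_swp_entry e;

	assert(type < 64);
	assert(offset <= SKETCH_OFFSET_MASK);
	e.val = ((uint64_t)type << SKETCH_TYPE_SHIFT) | offset;
	return e;
}

/* Key when the whole entry value indexes the per-device tree. */
static uint64_t sketch_key_whole_entry(sketch_swp_entry e)
{
	return e.val;
}

/* Key when only the offset indexes the per-device tree. */
static uint64_t sketch_key_offset(sketch_swp_entry e)
{
	return e.val & SKETCH_OFFSET_MASK;
}
```

In this sketch both keys coincide for type 0, but for any other type the whole-entry key carries the type in its high bits and becomes a very large index, even though each device already has its own tree.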

[PATCH -v3 01/10] mm, swap: Make swap cluster size same of THP size on x86_64

2016-09-07 Thread Huang, Ying
From: Huang Ying In this patch, the size of the swap cluster is changed to that of the THP (Transparent Huge Page) on the x86_64 architecture (512). This is for THP swap support on x86_64, where one swap cluster will be used to hold the contents of each THP swapped out. And some information

[PATCH -v3 02/10] mm, memcg: Add swap_cgroup_iter iterator

2016-09-07 Thread Huang, Ying
From: Huang Ying The swap cgroup uses a kind of discontinuous array to record the information for the swap entries. lookup_swap_cgroup() provides a good encapsulation to access one element of the discontinuous array. To make it easier to access multiple elements of the discontinuous array, an
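
The idea of iterating over a discontinuous array can be sketched as follows. This is only an illustration under stated assumptions: the array is modeled as a table of fixed-size chunks, and every sketch_* name is hypothetical rather than the swap_cgroup_iter interface from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PER_CHUNK 4 /* elements per chunk; tiny for illustration */

/* A "discontinuous array": separately stored chunks of equal size. */
struct sketch_darray {
	unsigned short **chunks;
	size_t nr_chunks;
};

/* Single-element access, analogous in spirit to a lookup helper. */
static unsigned short *sketch_lookup(struct sketch_darray *a, size_t idx)
{
	assert(idx / SKETCH_PER_CHUNK < a->nr_chunks);
	return &a->chunks[idx / SKETCH_PER_CHUNK][idx % SKETCH_PER_CHUNK];
}

struct sketch_iter {
	struct sketch_darray *a;
	size_t idx;
	unsigned short *cur;
};

static void sketch_iter_init(struct sketch_iter *it,
			     struct sketch_darray *a, size_t idx)
{
	it->a = a;
	it->idx = idx;
	it->cur = sketch_lookup(a, idx);
}

/* Step to the next element; redo the lookup only at a chunk boundary. */
static void sketch_iter_advance(struct sketch_iter *it)
{
	it->idx++;
	if (it->idx % SKETCH_PER_CHUNK == 0)
		it->cur = sketch_lookup(it->a, it->idx);
	else
		it->cur++;
}

/* Store `id` into n consecutive elements starting at `start`. */
static void sketch_set_range(struct sketch_darray *a, size_t start,
			     size_t n, unsigned short id)
{
	struct sketch_iter it;

	if (n == 0)
		return;
	sketch_iter_init(&it, a, start);
	while (n--) {
		*it.cur = id;
		if (n)
			sketch_iter_advance(&it);
	}
}

/* Demo: write 7 into elements 2..5 across two chunks, then sum all. */
static unsigned int sketch_demo_sum(void)
{
	unsigned short c0[SKETCH_PER_CHUNK] = {0};
	unsigned short c1[SKETCH_PER_CHUNK] = {0};
	unsigned short *chunks[] = { c0, c1 };
	struct sketch_darray a = { chunks, 2 };
	unsigned int sum = 0;
	size_t i;

	sketch_set_range(&a, 2, 4, 7);
	for (i = 0; i < 2 * SKETCH_PER_CHUNK; i++)
		sum += *sketch_lookup(&a, i);
	return sum;
}
```

The point of the iterator in this sketch is that the chunk lookup is repeated only when a chunk boundary is crossed, rather than once per element.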

[PATCH -v3 05/10] mm, THP, swap: Add get_huge_swap_page()

2016-09-07 Thread Huang, Ying
From: Huang Ying A variation of get_swap_page(), get_huge_swap_page(), is added to allocate a swap cluster (512 swap slots) based on the swap cluster allocation function. A fairly simple algorithm is used, that is, only the first swap device in the priority list will be tried to allocate the swap

[PATCH -v3 04/10] mm, THP, swap: Add swap cluster allocate/free functions

2016-09-07 Thread Huang, Ying
From: Huang Ying The swap cluster allocation/free functions are added based on the existing swap cluster management mechanism for SSD. These functions don't work for the rotating hard disks because the existing swap cluster management mechanism doesn't work for them. The hard disks s

[PATCH -v3 07/10] mm, THP, swap: Support to add/delete THP to/from swap cache

2016-09-07 Thread Huang, Ying
From: Huang Ying With this patch, a THP (Transparent Huge Page) can be added/deleted to/from the swap cache as a set of sub-pages (512 on x86_64). This will be used for the THP swap support, where one THP may be added/deleted to/from the swap cache. This will batch the

[PATCH -v3 09/10] mm, THP, swap: Support to split THP in swap cache

2016-09-07 Thread Huang, Ying
From: Huang Ying This patch enhances split_huge_page_to_list() to work properly for a THP (Transparent Huge Page) in the swap cache during swapping out. This is used for delaying splitting the THP during swapping out, where for a THP to be swapped out, we will allocate a swap cluster

[PATCH -v3 03/10] mm, memcg: Support to charge/uncharge multiple swap entries

2016-09-07 Thread Huang, Ying
From: Huang Ying This patch makes it possible to charge or uncharge a set of contiguous swap entries in the swap cgroup. The number of swap entries is specified via an added parameter. This will be used for the THP (Transparent Huge Page) swap support, where a swap cluster backing a THP may be

[PATCH -v3 10/10] mm, THP, swap: Delay splitting THP during swap out

2016-09-07 Thread Huang, Ying
From: Huang Ying In this patch, splitting the huge page is delayed from almost the first step of swapping out to after allocating the swap space for the THP (Transparent Huge Page) and adding the THP into the swap cache. This will reduce lock acquiring/releasing for the locks used for the swap cache

[PATCH -v3 08/10] mm, THP: Add can_split_huge_page()

2016-09-07 Thread Huang, Ying
From: Huang Ying Separate the check of whether we can split the huge page from split_huge_page_to_list() into its own function. This makes it possible to perform the check before actually splitting the THP (Transparent Huge Page). This will be used for delaying splitting the THP during swapping out, where for a THP, we

[PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out

2016-09-07 Thread Huang, Ying
From: Huang Ying This patchset is to optimize the performance of Transparent Huge Page (THP) swap. Hi, Andrew, could you help me to check whether the overall design is reasonable? Hi, Hugh, Shaohua, Minchan and Rik, could you help me to review the swap part of the patchset? Especially [01/10

[PATCH -v3 06/10] mm, THP, swap: Support to clear SWAP_HAS_CACHE for huge page

2016-09-07 Thread Huang, Ying
From: Huang Ying __swapcache_free() is added to support clearing the SWAP_HAS_CACHE flag for the huge page. For now, this frees the specified swap cluster, because the function is called only in the error path to free the swap cluster just allocated. So the corresponding swap_map[i

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-09-06 Thread Huang, Ying
k considers > itself to be rotational storage. It takes the paths that are optimised to > minimise seeks but it's quite slow. When tree_lock contention is reduced, > workload is dominated by scan_swap_map. It's a one-line fix and I have > a patch for it but it only really matters if ramdisk is being used as a > simulator for swapping to fast storage. We (LKP people) use drivers/nvdimm/pmem.c instead of drivers/block/brd.c as the ramdisk, which considers itself to be non-rotational storage. And we have a series to optimize other locks in the swap path too, for example batching swap space allocation and freeing, etc. If your solution to batch removing pages from the swap cache can be merged, that will help us a lot! Best Regards, Huang, Ying

Re: [PATCH -v2 01/10] swap: Change SWAPFILE_CLUSTER to 512

2016-09-02 Thread Huang, Ying
Andrew Morton writes: > On Thu, 01 Sep 2016 16:04:57 -0700 "Huang\, Ying" > wrote: > >> >> } >> >> >> >> -#define SWAPFILE_CLUSTER 256 >> >> +#define SWAPFILE_CLUSTER 512 >> >> #define LATENCY_LIMIT

Re: [PATCH -v2 01/10] swap: Change SWAPFILE_CLUSTER to 512

2016-09-01 Thread Huang, Ying
Andrew Morton writes: > On Thu, 1 Sep 2016 08:16:54 -0700 "Huang, Ying" wrote: > >> From: Huang Ying >> >> In this patch, the size of the swap cluster is changed to that of the >> THP (Transparent Huge Page) on x86_64 architecture (512). This is for >

[PATCH -v2 10/10] mm, THP, swap: Delay splitting THP during swap out

2016-09-01 Thread Huang, Ying
From: Huang Ying In this patch, splitting the huge page is delayed from almost the first step of swapping out to after allocating the swap space for the THP (Transparent Huge Page) and adding the THP into the swap cache. This will reduce lock acquiring/releasing for the locks used for the swap cache

[PATCH -v2 00/10] THP swap: Delay splitting THP during swapping out

2016-09-01 Thread Huang, Ying
From: Huang Ying This patchset is to optimize the performance of Transparent Huge Page (THP) swap. Hi, Andrew, could you help me to check whether the overall design is reasonable? Hi, Hugh, Shaohua, Minchan and Rik, could you help me to review the swap part of the patchset? Especially [01/10

[PATCH -v2 03/10] mm, memcg: Support to charge/uncharge multiple swap entries

2016-09-01 Thread Huang, Ying
From: Huang Ying This patch makes it possible to charge or uncharge a set of contiguous swap entries in the swap cgroup. The number of swap entries is specified via an added parameter. This will be used for the THP (Transparent Huge Page) swap support, where a swap cluster backing a THP may be

[PATCH -v2 05/10] mm, THP, swap: Add get_huge_swap_page()

2016-09-01 Thread Huang, Ying
From: Huang Ying A variation of get_swap_page(), get_huge_swap_page(), is added to allocate a swap cluster (512 swap slots) based on the swap cluster allocation function. A fairly simple algorithm is used, that is, only the first swap device in the priority list will be tried to allocate the swap

[PATCH -v2 06/10] mm, THP, swap: Support to clear SWAP_HAS_CACHE for huge page

2016-09-01 Thread Huang, Ying
From: Huang Ying __swapcache_free() is added to support clearing the SWAP_HAS_CACHE flag for the huge page. For now, this frees the specified swap cluster, because the function is called only in the error path to free the swap cluster just allocated. So the corresponding swap_map[i

[PATCH -v2 07/10] mm, THP, swap: Support to add/delete THP to/from swap cache

2016-09-01 Thread Huang, Ying
From: Huang Ying With this patch, a THP (Transparent Huge Page) can be added/deleted to/from the swap cache as a set of sub-pages (512 on x86_64). This will be used for the THP swap support, where one THP may be added/deleted to/from the swap cache. This will batch the

[PATCH -v2 08/10] mm, THP: Add can_split_huge_page()

2016-09-01 Thread Huang, Ying
From: Huang Ying Separate the check of whether we can split the huge page from split_huge_page_to_list() into its own function. This makes it possible to perform the check before actually splitting the THP (Transparent Huge Page). This will be used for delaying splitting the THP during swapping out, where for a THP, we

[PATCH -v2 04/10] mm, THP, swap: Add swap cluster allocate/free functions

2016-09-01 Thread Huang, Ying
From: Huang Ying The swap cluster allocation/free functions are added based on the existing swap cluster management mechanism for SSD. These functions don't work for the rotating hard disks because the existing swap cluster management mechanism doesn't work for them. The hard disks s

[PATCH -v2 09/10] mm, THP, swap: Support to split THP in swap cache

2016-09-01 Thread Huang, Ying
From: Huang Ying This patch enhances split_huge_page_to_list() to work properly for a THP (Transparent Huge Page) in the swap cache during swapping out. This is used for delaying splitting the THP during swapping out, where for a THP to be swapped out, we will allocate a swap cluster

[PATCH -v2 01/10] swap: Change SWAPFILE_CLUSTER to 512

2016-09-01 Thread Huang, Ying
From: Huang Ying In this patch, the size of the swap cluster is changed to that of the THP (Transparent Huge Page) on the x86_64 architecture (512). This is for THP swap support on x86_64, where one swap cluster will be used to hold the contents of each THP swapped out. And some information

[PATCH -v2 02/10] mm, memcg: Add swap_cgroup_iter iterator

2016-09-01 Thread Huang, Ying
From: Huang Ying The swap cgroup uses a kind of discontinuous array to record the information for the swap entries. lookup_swap_cgroup() provides a good encapsulation to access one element of the discontinuous array. To make it easier to access multiple elements of the discontinuous array, an

Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache

2016-08-31 Thread Huang, Ying
Mel Gorman writes: > On Wed, Aug 31, 2016 at 08:17:24AM -0700, Huang, Ying wrote: >> Mel Gorman writes: >> >> > On Tue, Aug 30, 2016 at 10:28:09AM -0700, Huang, Ying wrote: >> >> From: Huang Ying >> >> >> >> File pages use a set o

Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache

2016-08-31 Thread Huang, Ying
Mel Gorman writes: > On Tue, Aug 30, 2016 at 10:28:09AM -0700, Huang, Ying wrote: >> From: Huang Ying >> >> File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK, >> etc.) to accelerate finding the pages with a specific tag in the radix >> tree

[PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache

2016-08-30 Thread Huang, Ying
From: Huang Ying File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK, etc.) to accelerate finding the pages with a specific tag in the radix tree during inode writeback. But for anonymous pages in the swap cache, there is no inode writeback. So there is no need to find the pages

Re: [LKP] [lkp] [f2fs] ec795418c4: fsmark.files_per_sec -36.3% regression

2016-08-30 Thread Huang, Ying
ject/aimbench/aim-suite7/Initial%20release/s7110.tar.Z > > Thank you for the codes. > > I've run this workload on the latest f2fs and compared performance having > without the reported patch. (1TB nvme SSD, 16 cores, 16GB DRAM) > Interestingly, I could find slight performance improvement rather than > regression. :( > Not sure how to reproduce this. I think the difference lies in the disk used. A ramdisk is used in the original test, but it appears that your memory is too small to set up the RAM disk for the test. So it may be impossible for you to reproduce the test unless you can find more memory :) But we can help you to root cause the issue. What additional data do you want? perf-profile data before and after the patch? Best Regards, Huang, Ying

Re: [PATCH] mm: Don't use radix tree writeback tags for pages in swap cache

2016-08-29 Thread Huang, Ying
Hi, Rik, Thanks for your comments! Rik van Riel writes: > On Thu, 2016-08-25 at 12:27 -0700, Huang, Ying wrote: >> File pages use a set of radix tags (DIRTY, TOWRITE, WRITEBACK, etc.) >> to >> accelerate finding the pages with a specific tag in the radix tree >> durin

[PATCH] mm: Don't use radix tree writeback tags for pages in swap cache

2016-08-25 Thread Huang, Ying
% perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page Cc: Hugh Dickins Cc: Shaohua Li Cc: Minchan Kim Cc: Rik van Riel Cc: Mel Gorman Cc: Tejun Heo Cc: Wu Fengguang Cc: Dave Hansen Signed-off-by: "Huang, Ying" --

Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex

2016-08-25 Thread huang ying
Hi, Peter, Do you have a git tree branch for this patchset? We want to test it in 0day performance test. That will make it a little easier. Best Regards, Huang, Ying

[PATCH] mm, swap: Add swap_cluster_list

2016-08-24 Thread Huang, Ying
From: Huang Ying This is a code cleanup patch without functional changes. The swap_cluster_list data structure and its operations are introduced to provide better encapsulation for the free cluster and discard cluster list operations. This avoids some code duplication, improves the

Re: [LKP] [lkp] [f2fs] ec795418c4: fsmark.files_per_sec -36.3% regression

2016-08-24 Thread huang ying
Hi, Jaegeuk, On Thu, Aug 11, 2016 at 6:22 PM, Jaegeuk Kim wrote: > On Thu, Aug 11, 2016 at 03:49:41PM -0700, Huang, Ying wrote: >> Hi, Kim, >> >> "Huang, Ying" writes: >> >> >> >> [lkp] [f2fs] 3bdad3c7ee: aim7.jobs-per-min -25.3% regression

Re: [RFC] mm: Don't use radix tree writeback tags for pages in swap cache

2016-08-24 Thread Huang, Ying
"Huang, Ying" writes: > Hi, Dave, > > Dave Hansen writes: > >> On 08/09/2016 09:17 AM, Huang, Ying wrote: >>> File pages use a set of radix tags (DIRTY, TOWRITE, WRITEBACK) to >>> accelerate finding the pages with the specific tag in the rad

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-24 Thread Huang, Ying
ramdisk on a Xeon E5 v3 machine, the swap out throughput improved 40.4%, from ~0.97GB/s to ~1.36GB/s. What's your plan for this patch? If it can be merged soon, that will be great! I found some issues when making the original patch work with the swap cache. Below are my fixes to make it work for swap cac

Re: [RFC 00/11] THP swap: Delay splitting THP during swapping out

2016-08-23 Thread Huang, Ying
016 at 02:33:08PM -0700, Huang, Ying wrote: >> Hi, Minchan, >> >> Minchan Kim writes: >> > Anyway, I hope [1/11] should be merged regardless of the patchset because >> > I believe anyone doesn't feel comfortable with cluster_info functions. ;-) >> >

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-22 Thread Huang, Ying
Hi, Christoph, "Huang, Ying" writes: > Christoph Hellwig writes: > >> Snipping the long context: >> >> I think there are three observations here: >> >> (1) removing the mark_page_accessed (which is the only significant >> change in the

Re: [RFC 00/11] THP swap: Delay splitting THP during swapping out

2016-08-22 Thread Huang, Ying
Hi, Minchan, Minchan Kim writes: > Anyway, I hope [1/11] should be merged regardless of the patchset because > I believe anyone doesn't feel comfortable with cluster_info functions. ;-) I want to send out 1/11 separately. Can I add your "Acked-by:" for it? Best Regards, Huang, Ying

Re: [RFC 00/11] THP swap: Delay splitting THP during swapping out

2016-08-19 Thread Huang, Ying
Minchan Kim writes: > On Thu, Aug 18, 2016 at 08:44:13PM -0700, Huang, Ying wrote: >> Minchan Kim writes: >> >> > Hi Huang, >> > >> > On Thu, Aug 18, 2016 at 10:19:32AM -0700, Huang, Ying wrote: >> >> Minchan Kim writes: >> >>

Re: [RFC 00/11] THP swap: Delay splitting THP during swapping out

2016-08-18 Thread Huang, Ying
Minchan Kim writes: > Hi Huang, > > On Thu, Aug 18, 2016 at 10:19:32AM -0700, Huang, Ying wrote: >> Minchan Kim writes: >> >> > Hi Tim, >> > >> > On Wed, Aug 17, 2016 at 10:24:56AM -0700, Tim Chen wrote: >> >> On Wed, 2016-08-17 at

Re: [RFC 00/11] THP swap: Delay splitting THP during swapping out

2016-08-18 Thread Huang, Ying
Minchan Kim writes: > Hi Tim, > > On Wed, Aug 17, 2016 at 10:24:56AM -0700, Tim Chen wrote: >> On Wed, 2016-08-17 at 14:07 +0900, Minchan Kim wrote: >> > On Tue, Aug 16, 2016 at 07:06:00PM -0700, Huang, Ying wrote: >> > > >> > > >> > >

Re: [LKP] [lkp] [x86/hweight] 65ea11ec6a: will-it-scale.per_process_ops 9.3% improvement

2016-08-17 Thread Huang, Ying
Borislav Petkov writes: > On Wed, Aug 17, 2016 at 03:29:04PM -0700, Huang, Ying wrote: >> branch-miss-rate decreased from ~0.30% to ~0.043%. >> >> So I guess there is some code alignment change, which caused the decreased >> branch miss rate. > > Hrrm, I still ca

Re: [LKP] [lkp] [x86/hweight] 65ea11ec6a: will-it-scale.per_process_ops 9.3% improvement

2016-08-17 Thread Huang, Ying
[ 0.039853905034485354, 0.0402472142423231, 0.04380682345704418, 0.04319082390667179 ], branch-miss-rate decreased from ~0.30% to ~0.043%. So I guess there is some code alignment change, which caused the decreased branch miss rate. Best Regards, Huang, Ying

Re: [LKP] [lkp] 8700e3e7c4: BUG: unable to handle kernel NULL pointer dereference at 000001fc

2016-08-17 Thread Huang, Ying
/fs/KVM/initrd-vm-intel12-openwrt-i386-1 > -m 256 -smp 1 -device e1000,netdev=net0 -netdev user,id=net0 -boot > order=nc -no-reboot -watchdog i6300esb -watchdog-action debug -rtc > base=localtime -drive > file=/fs/KVM/disk0-vm-intel12-openwrt-i386-1,media=disk,if=virtio > -drive > file=/fs/KVM/disk1-vm-intel12-openwrt-i386-1,media=disk,if=virtio > -pidfile /dev/shm/kboot/pid-vm-intel12-openwrt-i386-1 -serial > file:/dev/shm/kboot/serial-vm-intel12-openwrt-i386-1 -daemonize > -display none -monitor null >> >> >> >> >> >> Thanks, >> Xiaolong >> > can you please provide more info to help reproduce this crash ? > on which operating system did this happen ? > which HCA device was the rxe device attached to ? mlx4 or mlx5 ? > thanks The test is done in a virtual machine, and it failed during the boot stage, so I think the root file system is not relevant. And there is no rxe device in the virtual machine. So I guess your driver init code may not run properly when compiled builtin and without the real device. Best Regards, Huang, Ying

Re: [RFC 00/11] THP swap: Delay splitting THP during swapping out

2016-08-16 Thread Huang, Ying
Hi, Kim, Minchan Kim writes: > Hello Huang, > > On Tue, Aug 09, 2016 at 09:37:42AM -0700, Huang, Ying wrote: >> From: Huang Ying >> >> This patchset is based on 8/4 head of mmotm/master. >> >> This is the first step for Transparent Huge Page (THP) s

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Huang, Ying
fastpath": 0.79, "perf-profile.func.cycles-pp.__might_sleep": 0.79, "perf-profile.func.cycles-pp.xfs_file_iomap_begin_delay.isra.9": 0.7, "perf-profile.func.cycles-pp.__list_del_entry": 0.7, "perf-profile.func.cycles-pp.vfs_write": 0.69, "perf-profile.func.cycles-pp.drop_buffers": 0.68, "perf-profile.func.cycles-pp.xfs_file_write_iter": 0.67, "perf-profile.func.cycles-pp.rwsem_spin_on_owner": 0.67, Best Regards, Huang, Ying

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Huang, Ying
Hi, Chinner, Dave Chinner writes: > On Wed, Aug 10, 2016 at 06:00:24PM -0700, Linus Torvalds wrote: >> On Wed, Aug 10, 2016 at 5:33 PM, Huang, Ying wrote: >> > >> > Here it is, >> >> Thanks. >> >> Appended is a munged "after"

Re: [LKP] [lkp] [f2fs] ec795418c4: fsmark.files_per_sec -36.3% regression

2016-08-11 Thread Huang, Ying
Hi, Kim, "Huang, Ying" writes: >> >> [lkp] [f2fs] 3bdad3c7ee: aim7.jobs-per-min -25.3% regression >> [lkp] [f2fs] b93f771286: aim7.jobs-per-min -81.2% regression >> >> In terms of the above regression, I could check that _reproduce_ procedure >> incl

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Huang, Ying
ay": 0.93, "perf-profile.func.cycles-pp.iomap_write_actor": 0.9, "perf-profile.func.cycles-pp.pagecache_get_page": 0.89, "perf-profile.func.cycles-pp.xfs_file_write_iter": 0.86, "perf-profile.func.cycles-pp.xfs_file_iomap_begin": 0.81, "perf-profile.func.cycles-pp.iov_iter_copy_from_user_atomic": 0.78, "perf-profile.func.cycles-pp.iomap_apply": 0.77, "perf-profile.func.cycles-pp.generic_write_end": 0.74, "perf-profile.func.cycles-pp.xfs_file_buffered_aio_write": 0.72, "perf-profile.func.cycles-pp.find_get_entry": 0.69, "perf-profile.func.cycles-pp.__vfs_write": 0.67, Best Regards, Huang, Ying

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Huang, Ying
stddev 17.78 ± 20% -20.9% 14.06 ± 13% -25.3% 13.28 ± 13% sched_debug.cpu.cpu_load[4].avg 20.29 ± 55% -44.5% 11.26 ± 39% -47.7% 10.61 ± 27% sched_debug.cpu.cpu_load[4].stddev 4929 ± 18% -24.8% 3704 ± 23% -4.5% 4708 ± 21% sched_debug.cpu.nr_load_updates.avg 276.50 ± 10% -4.4% 264.20 ± 7% -14.3% 237.00 ± 19% sched_debug.cpu.nr_switches.min Best Regards, Huang, Ying

Re: [PATCH] x86/irq: do not substract irq_tlb_count from irq_call_count

2016-08-11 Thread Huang, Ying
ltiple threads could queue > flush tlb to the same CPU but only one IPI will be sent. > > Since the commit entered Linux v3.19, the counting problem only shows up > from v3.19. Considering this is a behaviour change, I'm not sure if I > should add the stable tag here. > > Signed-off-by: Aaron Lu Thanks for the fix. You forgot to add :) Reported-by: "Huang, Ying" Best Regards, Huang, Ying

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-10 Thread Huang, Ying
Linus Torvalds writes: > On Wed, Aug 10, 2016 at 5:11 PM, Huang, Ying wrote: >> >> Here is the comparison result with perf-profile data. > > Heh. The diff is actually harder to read than just showing A/B > state. The fact that the call chain shows up as part of the symbol

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-10 Thread Huang, Ying
"Huang, Ying" writes: > Hi, Linus, > > Linus Torvalds writes: > >> On Wed, Aug 10, 2016 at 4:08 PM, Dave Chinner wrote: >>> >>> That, to me, says there's a change in lock contention behaviour in >>> the workload (which we know

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-10 Thread Huang, Ying
all that easy to make > sense of either. But comparing the before and after state might give > us clues. I have started perf-profile data collection and will send out the comparison result soon. Best Regards, Huang, Ying

Re: [RFC 00/11] THP swap: Delay splitting THP during swapping out

2016-08-09 Thread Huang, Ying
Hi, All, "Huang, Ying" writes: > From: Huang Ying > > This patchset is based on 8/4 head of mmotm/master. > > This is the first step for Transparent Huge Page (THP) swap support. > The plan is to delay splitting THP step by step and avoid splitting > THP fina

Re: [RFC] mm: Don't use radix tree writeback tags for pages in swap cache

2016-08-09 Thread Huang, Ying
Hi, Dave, Dave Hansen writes: > On 08/09/2016 09:17 AM, Huang, Ying wrote: >> File pages use a set of radix tags (DIRTY, TOWRITE, WRITEBACK) to >> accelerate finding the pages with the specific tag in the radix tree >> while writing back an inode. But for anonymou

[RFC 01/11] swap: Add swap_cluster_list

2016-08-09 Thread Huang, Ying
From: Huang Ying This is a code cleanup patch without functional changes. The swap_cluster_list data structure and its operations are introduced to provide better encapsulation for free cluster and discard cluster list operations. This avoids some code duplication, improves the code

[RFC 03/11] mm, memcg: Add swap_cgroup_iter iterator

2016-08-09 Thread Huang, Ying
From: Huang Ying Swap cgroup uses a discontinuous array to store the information for the swap entries. lookup_swap_cgroup() provides good encapsulation for accessing one element of the discontinuous array. To make it easier to access multiple elements of the discontinuous array, an iterator

[RFC 02/11] swap: Change SWAPFILE_CLUSTER to 512

2016-08-09 Thread Huang, Ying
From: Huang Ying In this patch, the size of the swap cluster is changed to that of a THP on x86_64 (512). This is for THP (Transparent Huge Page) swap support on x86_64, where one swap cluster will be used to hold the contents of each THP swapped out. And some information of the swapped out THP

[RFC 10/11] mm, THP, swap: Support to split THP in swap cache

2016-08-09 Thread Huang, Ying
From: Huang Ying This patch enhances split_huge_page_to_list() to work properly for a THP (Transparent Huge Page) in the swap cache during swapping out. This is used for delaying splitting the THP during swapping out, where for a THP to be swapped out, we will allocate a swap cluster, add the THP

[RFC 11/11] mm, THP, swap: Delay splitting THP during swap out

2016-08-09 Thread Huang, Ying
From: Huang Ying In this patch, splitting the huge page is delayed from almost the first step of swapping out to after allocating the swap space for the THP and adding the THP into the swap cache. This will reduce lock acquiring/releasing for the locks used for swap space and swap cache management. This

[RFC 05/11] mm, THP, swap: Add swap cluster allocate/free functions

2016-08-09 Thread Huang, Ying
From: Huang Ying The swap cluster allocation/free functions are added based on the existing swap cluster management mechanism for SSD. These functions don't work for traditional hard disks because the existing swap cluster management mechanism doesn't work for them. The hard disk supp

[RFC 07/11] mm, THP, swap: Support to clear SWAP_HAS_CACHE for huge page

2016-08-09 Thread Huang, Ying
From: Huang Ying __swapcache_free() is added to support clearing SWAP_HAS_CACHE for a huge page. For now, this frees the specified swap cluster, because the function is called only in the error path to free the swap cluster just allocated. So the corresponding swap_map[i

[RFC 04/11] mm, memcg: Support to charge/uncharge multiple swap entries

2016-08-09 Thread Huang, Ying
From: Huang Ying This patch makes it possible to charge or uncharge a set of contiguous swap entries in the swap cgroup. The number of swap entries is specified via an added parameter. This will be used for THP (Transparent Huge Page) swap support, where a whole swap cluster backing a THP may be
