From: Huang Ying
Separate the check for whether we can split a huge page out of
split_huge_page_to_list() into its own function. This makes it possible
to check whether the THP (Transparent Huge Page) can really be split
before actually splitting it.
This will be used to delay splitting the THP during swapping out, where
for a THP, we
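The check/split separation can be sketched in plain C. The refcount accounting below is a simplified illustration of the idea (a THP is splittable only when the caller can account for every reference); the field names and pin accounting are illustrative, not the kernel's exact implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified page model; the fields and pin accounting are illustrative. */
struct page {
    int refcount;       /* references held on the compound page */
    int total_mapcount; /* sum of the mapcounts of all sub-pages */
    int extra_pins;     /* known extra pins, e.g. from the swap cache */
};

/*
 * A THP can be split only when the caller can account for every
 * reference: one for itself, plus mappings, plus known pins.
 */
static bool can_split_huge_page(const struct page *page)
{
    return page->refcount == 1 + page->extra_pins + page->total_mapcount;
}

/* Check splittability before doing any irreversible splitting work. */
static int split_huge_page_to_list(struct page *page)
{
    if (!can_split_huge_page(page))
        return -16; /* -EBUSY: someone holds an unexpected reference */
    /* ... the actual split of the compound page would happen here ... */
    return 0;
}
```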
From: Huang Ying
Johannes suggested that I use two big patches instead of 9 patches, as
he feels that is easier for him to review. I am not sure whether this is
desirable for other reviewers too, so I have sent out both versions for
review. If this version is preferable for more reviewers, I will
From: Huang Ying
The swap cluster allocation/free functions are added based on the
existing swap cluster management mechanism for SSDs. These functions
don't work for rotating hard disks because the existing swap cluster
management mechanism doesn't work for them. The hard disks s
From: Huang Ying
In this patch, the size of the swap cluster is changed to that of the
THP (Transparent Huge Page) on the x86_64 architecture (512). This is
for THP swap support on x86_64, where one swap cluster will be used to
hold the contents of each swapped-out THP, and some information
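The new cluster size follows directly from the THP geometry; a small sketch of the arithmetic on x86_64 (4KB base pages, 2MB PMD-mapped huge pages):

```c
#include <assert.h>

#define PAGE_SHIFT   12  /* 4KB base pages */
#define PMD_SHIFT    21  /* 2MB PMD-mapped huge pages on x86_64 */
#define HPAGE_PMD_NR (1 << (PMD_SHIFT - PAGE_SHIFT))

/* One swap cluster now holds exactly one swapped-out THP. */
#define SWAPFILE_CLUSTER HPAGE_PMD_NR

static int cluster_bytes(void)
{
    return SWAPFILE_CLUSTER * (1 << PAGE_SHIFT);
}
```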
From: Huang Ying
A variation of get_swap_page(), get_huge_swap_page(), is added to
allocate a swap cluster (HPAGE_PMD_NR swap slots) based on the swap
cluster allocation function. A fairly simple algorithm is used: only
the first swap device in the priority list will be tried for allocating
the
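The allocation policy described above (try only the highest-priority device, never fall back) can be sketched in user-space C; the structures and bookkeeping below are illustrative stand-ins, not the kernel's:

```c
#include <assert.h>

#define HPAGE_PMD_NR 512

/* Illustrative stand-in for a swap device's cluster bookkeeping. */
struct swap_info {
    int free_clusters; /* whole free clusters left on this device */
};

/*
 * Try to grab one whole cluster (HPAGE_PMD_NR slots) from the first
 * device in the priority list only; do not fall back to other devices.
 * Fairly simple, at the cost of some avoidable allocation failures.
 */
static int get_huge_swap_page(struct swap_info *priority_list, int n)
{
    if (n > 0 && priority_list[0].free_clusters > 0) {
        priority_list[0].free_clusters--;
        return 0;  /* success: one whole cluster allocated */
    }
    return -1;     /* no huge swap space on the first device */
}
```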
From: Huang Ying
This patch makes it possible to charge or uncharge a set of contiguous
swap entries in the swap cgroup. The number of swap entries is
specified via an added parameter.
This will be used for THP (Transparent Huge Page) swap support, where a
swap cluster backing a THP may be
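A minimal user-space sketch of the added count parameter, using a flat array in place of the real swap cgroup map (names and sizes are illustrative):

```c
#include <assert.h>

/* Flat array standing in for the real swap cgroup map. */
static unsigned short swap_cgroup_map[4096];

/*
 * Record cgroup id for nr contiguous swap entries starting at offset,
 * returning the old id of the first entry; the nr parameter is the
 * batching this patch adds (one call instead of nr calls).
 */
static unsigned short swap_cgroup_record(unsigned long offset,
                                         unsigned short id, int nr)
{
    unsigned short old = swap_cgroup_map[offset];
    int i;

    for (i = 0; i < nr; i++)
        swap_cgroup_map[offset + i] = id;
    return old;
}
```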
From: Huang Ying
This patch enhances split_huge_page_to_list() to work properly for a
THP (Transparent Huge Page) in the swap cache during swapping out.
This is used to delay splitting the THP during swapping out, where for
a THP to be swapped out, we will allocate a swap cluster
From: Huang Ying
With this patch, a THP (Transparent Huge Page) can be added/deleted
to/from the swap cache as a set of (HPAGE_PMD_NR) sub-pages.
This will be used for THP (Transparent Huge Page) swap support, where
one THP may be added/deleted to/from the swap cache. This will
batch the
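The batching idea can be sketched as follows. The toy "swap cache" below only illustrates that one lock round-trip covers all HPAGE_PMD_NR insertions; the real code works on a radix tree under tree_lock, and the structures here are hypothetical:

```c
#include <assert.h>

#define HPAGE_PMD_NR 512

/* Toy swap cache: an array of slots plus a lock-acquisition counter. */
struct swap_cache {
    void *slots[4096];
    int lock_taken;    /* counts tree_lock round-trips */
};

/* Add all sub-pages of a THP under a single lock acquisition. */
static void add_thp_to_swap_cache(struct swap_cache *sc,
                                  unsigned long first, void *thp)
{
    int i;

    sc->lock_taken++;                    /* one lock round-trip ... */
    for (i = 0; i < HPAGE_PMD_NR; i++)   /* ... covers 512 insertions */
        sc->slots[first + i] = thp;
}

/* Delete them the same way: one lock round-trip for the whole THP. */
static void delete_thp_from_swap_cache(struct swap_cache *sc,
                                       unsigned long first)
{
    int i;

    sc->lock_taken++;
    for (i = 0; i < HPAGE_PMD_NR; i++)
        sc->slots[first + i] = 0;
}
```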
From: Huang Ying
This patchset optimizes the performance of Transparent Huge Page
(THP) swap.
Hi, Andrew, could you help me check whether the overall design is
reasonable?
Hi, Hugh, Shaohua, Minchan and Rik, could you help me review the
swap part of the patchset? Especially [01/10
>> +goto new_cluster;
>> +}
>> +ci = lock_cluster(si, tmp);
>> +while (tmp < max) {
>
> Here tmp is checked to be less than the max value.
> Is this semantic change intended?
Oops! tmp should be checked to be more than the min value. Will fix
the issue you uncovered. But
> in the end will probably be stuck with a slight regression in this
> artificial workload.
I see. Thanks for the update. Please keep me posted.
Best Regards,
Huang, Ying
Jaegeuk Kim writes:
> On Mon, Sep 26, 2016 at 02:26:06PM +0800, Huang, Ying wrote:
>> Hi, Jaegeuk,
>>
>> "Huang, Ying" writes:
>>
>> > Jaegeuk Kim writes:
>> >
>> >> Hello,
>> >>
Hi, Jaegeuk,
"Huang, Ying" writes:
> Jaegeuk Kim writes:
>
>> Hello,
>>
>> On Sat, Aug 27, 2016 at 10:13:34AM +0800, Fengguang Wu wrote:
>>> Hi Jaegeuk,
>>>
>>> > > >> > - [lkp] [f2fs] b93f771286: aim7.jobs-per-min
Hi, Christoph,
"Huang, Ying" writes:
> Hi, Christoph,
>
> "Huang, Ying" writes:
>
>> Christoph Hellwig writes:
>>
>>> Snipping the long contest:
>>>
>>> I think there are three observations here:
>>>
Shaohua Li writes:
> On Fri, Sep 23, 2016 at 10:32:39AM +0800, Huang, Ying wrote:
>> Rik van Riel writes:
>>
>> > On Thu, 2016-09-22 at 15:56 -0700, Shaohua Li wrote:
>> >> On Wed, Sep 07, 2016 at 09:45:59AM -0700, Huang, Ying wrote:
Rik van Riel writes:
> On Thu, 2016-09-22 at 15:56 -0700, Shaohua Li wrote:
>> On Wed, Sep 07, 2016 at 09:45:59AM -0700, Huang, Ying wrote:
>> >
>> > - It will help the memory fragmentation, especially when the THP is
>> > heavily used by the applica
Hi, Shaohua,
Thanks for comments!
Shaohua Li writes:
> On Wed, Sep 07, 2016 at 09:45:59AM -0700, Huang, Ying wrote:
>>
>> The advantages of the THP swap support include:
Sorry for the confusion. These are the advantages of the final goal,
that is, avoiding splitting/collapsing the
Minchan Kim writes:
> Hi Huang,
>
> On Tue, Sep 20, 2016 at 10:54:35AM +0800, Huang, Ying wrote:
>> Hi, Minchan,
>>
>> Minchan Kim writes:
>> > Hi Huang,
>> >
>> > On Sun, Sep 18, 2016 at 09:53:39AM +0800, Huang, Ying wrote:
Hi, Minchan,
Minchan Kim writes:
> Hi Huang,
>
> On Sun, Sep 18, 2016 at 09:53:39AM +0800, Huang, Ying wrote:
>> Minchan Kim writes:
>>
>> > On Tue, Sep 13, 2016 at 04:53:49PM +0800, Huang, Ying wrote:
>> >> Minchan Kim writes:
Hi, Johannes,
Johannes Weiner writes:
> On Thu, Sep 08, 2016 at 11:15:52AM +0530, Anshuman Khandual wrote:
>> On 09/07/2016 10:16 PM, Huang, Ying wrote:
>> > From: Huang Ying
>> >
>> > In this patch, the size of the swap cluster is changed to that of the
Minchan Kim writes:
> On Tue, Sep 13, 2016 at 04:53:49PM +0800, Huang, Ying wrote:
>> Minchan Kim writes:
>> > On Tue, Sep 13, 2016 at 02:40:00PM +0800, Huang, Ying wrote:
>> >> Minchan Kim writes:
>> >>
>> >> > Hi Huang,
>> >
Minchan Kim writes:
> On Tue, Sep 13, 2016 at 02:40:00PM +0800, Huang, Ying wrote:
>> Minchan Kim writes:
>>
>> > Hi Huang,
>> >
>> > On Fri, Sep 09, 2016 at 01:35:12PM -0700, Huang, Ying wrote:
>> >
>> > < snip >
>> >
Minchan Kim writes:
> Hi Huang,
>
> On Fri, Sep 09, 2016 at 01:35:12PM -0700, Huang, Ying wrote:
>
> < snip >
>
>> >> Recently, the performance of storage devices has improved so fast that
>> >> we cannot saturate the disk bandwidth when we do page
Hi, Minchan,
Minchan Kim writes:
> Hi Huang,
>
> On Wed, Sep 07, 2016 at 09:45:59AM -0700, Huang, Ying wrote:
>> From: Huang Ying
>>
>> This patchset is to optimize the performance of Transparent Huge Page
>> (THP) swap.
>>
>> Hi, Andrew, could yo
the
future if we optimize its performance to catch up with the performance
of the storage.
>> So I think this series is one of those "we need to find that it makes
>> a big positive impact" to make sense.
>>
>
> Agreed. I don't mind leaving it on the back burner unless Dave reports
> it really helps or a new bug report about realistic tree_lock contention
> shows up.
Best Regards,
Huang, Ying
Anshuman Khandual writes:
> On 09/07/2016 10:16 PM, Huang, Ying wrote:
>> From: Huang Ying
>>
>> This patch make it possible to charge or uncharge a set of continuous
>> swap entries in the swap cgroup. The number of swap entries is
>> specified via an add
Anshuman Khandual writes:
> On 09/07/2016 10:16 PM, Huang, Ying wrote:
>> From: Huang Ying
>>
>> The swap cluster allocation/free functions are added based on the
>> existing swap cluster management mechanism for SSD. These functions
>> don't work f
Hi, Anshuman,
Thanks for comments!
Anshuman Khandual writes:
> On 09/07/2016 10:16 PM, Huang, Ying wrote:
>> From: Huang Ying
>>
>> With this patch, a THP (Transparent Huge Page) can be added/deleted
>> to/from the swap cache as a set of sub-pages (512 on x86_64
Anshuman Khandual writes:
> On 09/07/2016 10:16 PM, Huang, Ying wrote:
>> From: Huang Ying
>>
>> In this patch, the size of the swap cluster is changed to that of the
>> THP (Transparent Huge Page) on x86_64 architecture (512). This is for
>> the THP swap s
"Kirill A. Shutemov" writes:
> On Wed, Sep 07, 2016 at 09:46:00AM -0700, Huang, Ying wrote:
>> From: Huang Ying
>>
>> In this patch, the size of the swap cluster is changed to that of the
>> THP (Transparent Huge Page) on x86_64 architecture (512). T
"Kirill A. Shutemov" writes:
> On Wed, Sep 07, 2016 at 09:46:00AM -0700, Huang, Ying wrote:
>> From: Huang Ying
>>
>> In this patch, the size of the swap cluster is changed to that of the
>> THP (Transparent Huge Page) on x86_64 architecture (512). T
"Kirill A. Shutemov" writes:
> On Wed, Sep 07, 2016 at 09:46:04AM -0700, Huang, Ying wrote:
>> From: Huang Ying
>>
>> A variation of get_swap_page(), get_huge_swap_page(), is added to
>> allocate a swap cluster (512 swap slots) based on the swap cluster
Hi, Kirill,
Thanks for your comments!
"Kirill A. Shutemov" writes:
> On Wed, Sep 07, 2016 at 09:46:07AM -0700, Huang, Ying wrote:
>> From: Huang Ying
>>
>> Separates checking whether we can split the huge page from
>> split_huge_page_to_list() into a fu
From: Huang Ying
After using the offset of the swap entry as the key of the swap cache,
page_index() becomes exactly the same as page_file_index(). So
page_file_index() is removed and the callers are changed to use
page_index() instead.
Cc: Trond Myklebust
Cc: Anna Schumaker
Cc: "K
From: Huang Ying
This patch improves the performance of swap cache operations when
the type of the swap device is not 0. Originally, the whole swap entry
value is used as the key of the swap cache, even though there is one
radix tree for each swap device. If the type of the swap device is
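A sketch of the key change, assuming the usual type/offset packing of swap entries (the shift value below is illustrative; the real one depends on the configuration):

```c
#include <assert.h>

typedef struct { unsigned long val; } swp_entry_t;

#define SWP_TYPE_SHIFT 58 /* illustrative; the real shift is config dependent */

static swp_entry_t swp_entry(unsigned long type, unsigned long offset)
{
    return (swp_entry_t){ (type << SWP_TYPE_SHIFT) | offset };
}

static unsigned long swp_offset(swp_entry_t entry)
{
    return entry.val & ((1UL << SWP_TYPE_SHIFT) - 1);
}

/*
 * Before: key = entry.val, so every key on a type != 0 device carries
 * the type in its high bits, making the per-device radix tree tall.
 * After: key = swp_offset(entry), small and dense for every device.
 */
static unsigned long swap_cache_key(swp_entry_t entry)
{
    return swp_offset(entry);
}
```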
From: Huang Ying
In this patch, the size of the swap cluster is changed to that of the
THP (Transparent Huge Page) on the x86_64 architecture (512). This is
for THP swap support on x86_64, where one swap cluster will be used to
hold the contents of each swapped-out THP, and some information
From: Huang Ying
The swap cgroup uses a kind of discontinuous array to record the
information for the swap entries. lookup_swap_cgroup() provides a good
encapsulation to access one element of the discontinuous array. To make
it easier to access multiple elements of the discontinuous array, an
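A user-space sketch of such an iterator over a discontinuous (page-by-page) array; names and sizes are illustrative:

```c
#include <assert.h>

#define ENTRIES_PER_PAGE 2048 /* e.g. PAGE_SIZE / sizeof(unsigned short) */

/* Iterator over a discontinuous array: an array of per-page arrays. */
struct swap_cgroup_iter {
    unsigned short **map;  /* map[i] is one page's worth of entries */
    unsigned long offset;  /* current swap offset */
};

static unsigned short *swap_cgroup_iter_entry(struct swap_cgroup_iter *iter)
{
    return &iter->map[iter->offset / ENTRIES_PER_PAGE]
                     [iter->offset % ENTRIES_PER_PAGE];
}

/* Charge nr contiguous entries without re-deriving the page per entry
 * at the call site, even when the run crosses a page boundary. */
static void swap_cgroup_record_nr(struct swap_cgroup_iter *iter,
                                  unsigned short id, int nr)
{
    int i;

    for (i = 0; i < nr; i++, iter->offset++)
        *swap_cgroup_iter_entry(iter) = id;
}
```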
From: Huang Ying
A variation of get_swap_page(), get_huge_swap_page(), is added to
allocate a swap cluster (512 swap slots) based on the swap cluster
allocation function. A fairly simple algorithm is used: only the
first swap device in the priority list will be tried for allocating the swap
From: Huang Ying
The swap cluster allocation/free functions are added based on the
existing swap cluster management mechanism for SSDs. These functions
don't work for rotating hard disks because the existing swap cluster
management mechanism doesn't work for them. The hard disks s
From: Huang Ying
With this patch, a THP (Transparent Huge Page) can be added/deleted
to/from the swap cache as a set of sub-pages (512 on x86_64).
This will be used for THP (Transparent Huge Page) swap support, where
one THP may be added/deleted to/from the swap cache. This will
batch the
From: Huang Ying
This patch enhances split_huge_page_to_list() to work properly for a
THP (Transparent Huge Page) in the swap cache during swapping out.
This is used to delay splitting the THP during swapping out, where for
a THP to be swapped out, we will allocate a swap cluster
From: Huang Ying
This patch makes it possible to charge or uncharge a set of contiguous
swap entries in the swap cgroup. The number of swap entries is
specified via an added parameter.
This will be used for THP (Transparent Huge Page) swap support, where a
swap cluster backing a THP may be
From: Huang Ying
In this patch, splitting the huge page is delayed from almost the first
step of swapping out to after allocating the swap space for the
THP (Transparent Huge Page) and adding the THP into the swap cache.
This will reduce lock acquiring/releasing for the locks used for the
swap cache
From: Huang Ying
Separate the check for whether we can split the huge page out of
split_huge_page_to_list() into its own function. This makes it possible
to check whether the THP (Transparent Huge Page) can really be split
before actually splitting it.
This will be used to delay splitting the THP during swapping out, where
for a THP, we
From: Huang Ying
This patchset optimizes the performance of Transparent Huge Page
(THP) swap.
Hi, Andrew, could you help me check whether the overall design is
reasonable?
Hi, Hugh, Shaohua, Minchan and Rik, could you help me review the
swap part of the patchset? Especially [01/10
From: Huang Ying
__swapcache_free() is added to support clearing the SWAP_HAS_CACHE flag
for a huge page. For now, this will free the specified swap cluster,
because currently this function will be called only in the error path,
to free the swap cluster just allocated. So the corresponding swap_map[i
k considers
> itself to be rotational storage. It takes the paths that are optimised to
> minimise seeks but it's quite slow. When tree_lock contention is reduced,
> workload is dominated by scan_swap_map. It's a one-line fix and I have
> a patch for it but it only really matters if ramdisk is being used as a
> simulator for swapping to fast storage.
We (LKP people) use drivers/nvdimm/pmem.c instead of drivers/block/brd.c
as the ramdisk, which considers itself to be non-rotational storage.
And we have a series to optimize other locks in the swap path too, for
example batching swap space allocation and freeing, etc. If your
solution for batching the removal of pages from the swap cache can be
merged, that will help us a lot!
Best Regards,
Huang, Ying
Andrew Morton writes:
> On Thu, 01 Sep 2016 16:04:57 -0700 "Huang, Ying"
> wrote:
>
>> >> }
>> >>
>> >> -#define SWAPFILE_CLUSTER 256
>> >> +#define SWAPFILE_CLUSTER 512
>> >> #define LATENCY_LIMIT
Andrew Morton writes:
> On Thu, 1 Sep 2016 08:16:54 -0700 "Huang, Ying" wrote:
>
>> From: Huang Ying
>>
>> In this patch, the size of the swap cluster is changed to that of the
>> THP (Transparent Huge Page) on x86_64 architecture (512). This is for
&
From: Huang Ying
In this patch, splitting the huge page is delayed from almost the first
step of swapping out to after allocating the swap space for the
THP (Transparent Huge Page) and adding the THP into the swap cache.
This will reduce lock acquiring/releasing for the locks used for the
swap cache
From: Huang Ying
This patchset optimizes the performance of Transparent Huge Page
(THP) swap.
Hi, Andrew, could you help me check whether the overall design is
reasonable?
Hi, Hugh, Shaohua, Minchan and Rik, could you help me review the
swap part of the patchset? Especially [01/10
From: Huang Ying
This patch makes it possible to charge or uncharge a set of contiguous
swap entries in the swap cgroup. The number of swap entries is
specified via an added parameter.
This will be used for THP (Transparent Huge Page) swap support, where a
swap cluster backing a THP may be
From: Huang Ying
A variation of get_swap_page(), get_huge_swap_page(), is added to
allocate a swap cluster (512 swap slots) based on the swap cluster
allocation function. A fairly simple algorithm is used: only the
first swap device in the priority list will be tried for allocating the swap
From: Huang Ying
__swapcache_free() is added to support clearing the SWAP_HAS_CACHE flag
for a huge page. For now, this will free the specified swap cluster,
because currently this function will be called only in the error path,
to free the swap cluster just allocated. So the corresponding swap_map[i
From: Huang Ying
With this patch, a THP (Transparent Huge Page) can be added/deleted
to/from the swap cache as a set of sub-pages (512 on x86_64).
This will be used for THP (Transparent Huge Page) swap support, where
one THP may be added/deleted to/from the swap cache. This will
batch the
From: Huang Ying
Separate the check for whether we can split the huge page out of
split_huge_page_to_list() into its own function. This makes it possible
to check whether the THP (Transparent Huge Page) can really be split
before actually splitting it.
This will be used to delay splitting the THP during swapping out, where
for a THP, we
From: Huang Ying
The swap cluster allocation/free functions are added based on the
existing swap cluster management mechanism for SSDs. These functions
don't work for rotating hard disks because the existing swap cluster
management mechanism doesn't work for them. The hard disks s
From: Huang Ying
This patch enhances split_huge_page_to_list() to work properly for a
THP (Transparent Huge Page) in the swap cache during swapping out.
This is used to delay splitting the THP during swapping out, where for
a THP to be swapped out, we will allocate a swap cluster
From: Huang Ying
In this patch, the size of the swap cluster is changed to that of the
THP (Transparent Huge Page) on the x86_64 architecture (512). This is
for THP swap support on x86_64, where one swap cluster will be used to
hold the contents of each swapped-out THP, and some information
From: Huang Ying
The swap cgroup uses a kind of discontinuous array to record the
information for the swap entries. lookup_swap_cgroup() provides a good
encapsulation to access one element of the discontinuous array. To make
it easier to access multiple elements of the discontinuous array, an
Mel Gorman writes:
> On Wed, Aug 31, 2016 at 08:17:24AM -0700, Huang, Ying wrote:
>> Mel Gorman writes:
>>
>> > On Tue, Aug 30, 2016 at 10:28:09AM -0700, Huang, Ying wrote:
>> >> From: Huang Ying
>> >>
>> >> File pages use a set o
Mel Gorman writes:
> On Tue, Aug 30, 2016 at 10:28:09AM -0700, Huang, Ying wrote:
>> From: Huang Ying
>>
>> File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK,
>> etc.) to accelerate finding the pages with a specific tag in the radix
>> tree
From: Huang Ying
File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK,
etc.) to accelerate finding the pages with a specific tag in the radix
tree during inode writeback. But for anonymous pages in the swap
cache, there is no inode writeback. So there is no need to find the
pages
ject/aimbench/aim-suite7/Initial%20release/s7110.tar.Z
>
> Thank you for the codes.
>
> I've run this workload on the latest f2fs and compared performance with
> and without the reported patch. (1TB NVMe SSD, 16 cores, 16GB DRAM)
> Interestingly, I found a slight performance improvement rather than a
> regression. :(
> Not sure how to reproduce this.
I think the difference lies in the disk used. A ramdisk was used in the
original test, but it appears that your memory is too small to set up a
RAM disk for the test. So it may be impossible for you to reproduce the
test unless you can find more memory :)
But we can help you root-cause the issue. What additional data do
you want? perf-profile data before and after the patch?
Best Regards,
Huang, Ying
Hi, Rik,
Thanks for comments!
Rik van Riel writes:
> On Thu, 2016-08-25 at 12:27 -0700, Huang, Ying wrote:
>> File pages use a set of radix tags (DIRTY, TOWRITE, WRITEBACK, etc.)
>> to
>> accelerate finding the pages with a specific tag in the radix tree
>> durin
Cc: Hugh Dickins
Cc: Shaohua Li
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Tejun Heo
Cc: Wu Fengguang
Cc: Dave Hansen
Signed-off-by: "Huang, Ying"
--
Hi, Peter,
Do you have a git tree branch for this patchset? We want to test it in
0day performance test. That will make it a little easier.
Best Regards,
Huang, Ying
From: Huang Ying
This is a code clean-up patch without functionality changes. The
swap_cluster_list data structure and its operations are introduced to
provide better encapsulation for the free cluster and discard
cluster list operations. This avoids some code duplication and improves
the
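The encapsulation can be sketched as index-linked lists with shared helpers; this mirrors the shape of the swap cluster code (a cluster's link is an index into the cluster array) but is a simplified user-space illustration:

```c
#include <assert.h>
#include <stdbool.h>

#define CLUSTER_NULL 0xffffffffu

/* Index-linked cluster: next is an index into the cluster array. */
struct swap_cluster_info {
    unsigned int next;
};

/* Shared head/tail structure for both free and discard cluster lists. */
struct swap_cluster_list {
    unsigned int head, tail;
};

static bool cluster_list_empty(const struct swap_cluster_list *list)
{
    return list->head == CLUSTER_NULL;
}

static void cluster_list_add_tail(struct swap_cluster_list *list,
                                  struct swap_cluster_info *ci,
                                  unsigned int idx)
{
    ci[idx].next = CLUSTER_NULL;
    if (cluster_list_empty(list)) {
        list->head = list->tail = idx;
    } else {
        ci[list->tail].next = idx;  /* append after the current tail */
        list->tail = idx;
    }
}

static unsigned int cluster_list_del_first(struct swap_cluster_list *list,
                                           struct swap_cluster_info *ci)
{
    unsigned int idx = list->head;

    list->head = ci[idx].next;
    if (list->head == CLUSTER_NULL)
        list->tail = CLUSTER_NULL;  /* list became empty */
    return idx;
}
```

With one set of helpers, the free and discard lists no longer need their own open-coded head/tail manipulation.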
Hi, Jaegeuk,
On Thu, Aug 11, 2016 at 6:22 PM, Jaegeuk Kim wrote:
> On Thu, Aug 11, 2016 at 03:49:41PM -0700, Huang, Ying wrote:
>> Hi, Kim,
>>
>> "Huang, Ying" writes:
>> >>
>> >> [lkp] [f2fs] 3bdad3c7ee: aim7.jobs-per-min -25.3% regression
"Huang, Ying" writes:
> Hi, Dave,
>
> Dave Hansen writes:
>
>> On 08/09/2016 09:17 AM, Huang, Ying wrote:
>>> File pages use a set of radix tags (DIRTY, TOWRITE, WRITEBACK) to
>>> accelerate finding the pages with the specific tag in the rad
ramdisk on a Xeon E5 v3 machine,
the swap out throughput improved 40.4%, from ~0.97GB/s to ~1.36GB/s.
What's your plan for this patch? If it can be merged soon, that will be
great!
I found some issues in the original patch to work with the swap cache. Below
are my fixes to make it work for swap cac
016 at 02:33:08PM -0700, Huang, Ying wrote:
>> Hi, Minchan,
>>
>> Minchan Kim writes:
>> > Anyway, I hope [1/11] should be merged regardless of the patchset because
>> > I believe no one feels comfortable with the cluster_info functions. ;-)
>>
>
Hi, Christoph,
"Huang, Ying" writes:
> Christoph Hellwig writes:
>
>> Snipping the long contest:
>>
>> I think there are three observations here:
>>
>> (1) removing the mark_page_accessed (which is the only significant
>> change in the
Hi, Minchan,
Minchan Kim writes:
> Anyway, I hope [1/11] should be merged regardless of the patchset because
> I believe no one feels comfortable with the cluster_info functions. ;-)
I want to send out 1/11 separately. Can I add your "Acked-by:" for it?
Best Regards,
Huang, Ying
Minchan Kim writes:
> On Thu, Aug 18, 2016 at 08:44:13PM -0700, Huang, Ying wrote:
>> Minchan Kim writes:
>>
>> > Hi Huang,
>> >
>> > On Thu, Aug 18, 2016 at 10:19:32AM -0700, Huang, Ying wrote:
>> >> Minchan Kim writes:
>> >&
Minchan Kim writes:
> Hi Huang,
>
> On Thu, Aug 18, 2016 at 10:19:32AM -0700, Huang, Ying wrote:
>> Minchan Kim writes:
>>
>> > Hi Tim,
>> >
>> > On Wed, Aug 17, 2016 at 10:24:56AM -0700, Tim Chen wrote:
>> >> On Wed, 2016-08-17 at
Minchan Kim writes:
> Hi Tim,
>
> On Wed, Aug 17, 2016 at 10:24:56AM -0700, Tim Chen wrote:
>> On Wed, 2016-08-17 at 14:07 +0900, Minchan Kim wrote:
>> > On Tue, Aug 16, 2016 at 07:06:00PM -0700, Huang, Ying wrote:
>> > >
>> > >
>> > &g
Borislav Petkov writes:
> On Wed, Aug 17, 2016 at 03:29:04PM -0700, Huang, Ying wrote:
>> branch-miss-rate decreased from ~0.30% to ~0.043%.
>>
>> So I guess there is some code alignment change, which caused the decreased
>> branch-miss rate.
>
> Hrrm, I still ca
branch-miss-rate decreased from ~0.30% to ~0.043%.
So I guess there is some code alignment change, which caused the
decreased branch-miss rate.
Best Regards,
Huang, Ying
/fs/KVM/initrd-vm-intel12-openwrt-i386-1
> -m 256 -smp 1 -device e1000,netdev=net0 -netdev user,id=net0 -boot
> order=nc -no-reboot -watchdog i6300esb -watchdog-action debug -rtc
> base=localtime -drive
> file=/fs/KVM/disk0-vm-intel12-openwrt-i386-1,media=disk,if=virtio
> -drive
> file=/fs/KVM/disk1-vm-intel12-openwrt-i386-1,media=disk,if=virtio
> -pidfile /dev/shm/kboot/pid-vm-intel12-openwrt-i386-1 -serial
> file:/dev/shm/kboot/serial-vm-intel12-openwrt-i386-1 -daemonize
> -display none -monitor null
>>
>>
>>
>>
>>
>> Thanks,
>> Xiaolong
>>
> Can you please provide more info to help reproduce this crash?
> On which operating system did this happen?
> Which HCA device was the rxe device attached to? mlx4 or mlx5?
> Thanks
The test is done in a virtual machine, and it failed during the boot
stage, so I think the root file system is not relevant. And there is no
rxe device in the virtual machine. So I guess your driver init code may
not run properly when built in and without a real device.
Best Regards,
Huang, Ying
Hi, Kim,
Minchan Kim writes:
> Hello Huang,
>
> On Tue, Aug 09, 2016 at 09:37:42AM -0700, Huang, Ying wrote:
>> From: Huang Ying
>>
>> This patchset is based on 8/4 head of mmotm/master.
>>
>> This is the first step for Transparent Huge Page (THP) s
fastpath": 0.79,
"perf-profile.func.cycles-pp.__might_sleep": 0.79,
"perf-profile.func.cycles-pp.xfs_file_iomap_begin_delay.isra.9": 0.7,
"perf-profile.func.cycles-pp.__list_del_entry": 0.7,
"perf-profile.func.cycles-pp.vfs_write": 0.69,
"perf-profile.func.cycles-pp.drop_buffers": 0.68,
"perf-profile.func.cycles-pp.xfs_file_write_iter": 0.67,
"perf-profile.func.cycles-pp.rwsem_spin_on_owner": 0.67,
Best Regards,
Huang, Ying
Hi, Chinner,
Dave Chinner writes:
> On Wed, Aug 10, 2016 at 06:00:24PM -0700, Linus Torvalds wrote:
>> On Wed, Aug 10, 2016 at 5:33 PM, Huang, Ying wrote:
>> >
>> > Here it is,
>>
>> Thanks.
>>
>> Appended is a munged "after"
Hi, Kim,
"Huang, Ying" writes:
>>
>> [lkp] [f2fs] 3bdad3c7ee: aim7.jobs-per-min -25.3% regression
>> [lkp] [f2fs] b93f771286: aim7.jobs-per-min -81.2% regression
>>
>> In terms of the above regression, I could check that _reproduce_ procedure
>> incl
ay": 0.93,
"perf-profile.func.cycles-pp.iomap_write_actor": 0.9,
"perf-profile.func.cycles-pp.pagecache_get_page": 0.89,
"perf-profile.func.cycles-pp.xfs_file_write_iter": 0.86,
"perf-profile.func.cycles-pp.xfs_file_iomap_begin": 0.81,
"perf-profile.func.cycles-pp.iov_iter_copy_from_user_atomic": 0.78,
"perf-profile.func.cycles-pp.iomap_apply": 0.77,
"perf-profile.func.cycles-pp.generic_write_end": 0.74,
"perf-profile.func.cycles-pp.xfs_file_buffered_aio_write": 0.72,
"perf-profile.func.cycles-pp.find_get_entry": 0.69,
"perf-profile.func.cycles-pp.__vfs_write": 0.67,
Best Regards,
Huang, Ying
17.78 ± 20%  -20.9%  14.06 ± 13%  -25.3%  13.28 ± 13%  sched_debug.cpu.cpu_load[4].avg
20.29 ± 55%  -44.5%  11.26 ± 39%  -47.7%  10.61 ± 27%  sched_debug.cpu.cpu_load[4].stddev
4929 ± 18%   -24.8%  3704 ± 23%   -4.5%   4708 ± 21%   sched_debug.cpu.nr_load_updates.avg
276.50 ± 10%  -4.4%  264.20 ± 7%  -14.3%  237.00 ± 19%  sched_debug.cpu.nr_switches.min
Best Regards,
Huang, Ying
ltiple threads could queue
> flush tlb to the same CPU but only one IPI will be sent.
>
> Since the commit enter Linux v3.19, the counting problem only shows up
> from v3.19. Considering this is a behaviour change, I'm not sure if I
> should add the stable tag here.
>
> Signed-off-by: Aaron Lu
Thanks for the fix. You forgot to add :)
Reported-by: "Huang, Ying"
Best Regards,
Huang, Ying
Linus Torvalds writes:
> On Wed, Aug 10, 2016 at 5:11 PM, Huang, Ying wrote:
>>
>> Here is the comparison result with perf-profile data.
>
> Heh. The diff is actually harder to read than just showing the A/B
> state. The fact that the call chain shows up as part of the symbol
"Huang, Ying" writes:
> Hi, Linus,
>
> Linus Torvalds writes:
>
>> On Wed, Aug 10, 2016 at 4:08 PM, Dave Chinner wrote:
>>>
>>> That, to me, says there's a change in lock contention behaviour in
>>> the workload (which we know
all that easy to make
> sense of either. But comparing the before and after state might give
> us clues.
I have started the perf-profile data collection and will send out the
comparison result soon.
Best Regards,
Huang, Ying
Hi, All,
"Huang, Ying" writes:
> From: Huang Ying
>
> This patchset is based on 8/4 head of mmotm/master.
>
> This is the first step for Transparent Huge Page (THP) swap support.
> The plan is to delaying splitting THP step by step and avoid splitting
> THP fina
Hi, Dave,
Dave Hansen writes:
> On 08/09/2016 09:17 AM, Huang, Ying wrote:
>> File pages use a set of radix tags (DIRTY, TOWRITE, WRITEBACK) to
>> accelerate finding the pages with the specific tag in the radix tree
>> during writing back an inode. But for anonymou
From: Huang Ying
This is a code clean-up patch without functionality changes. The
swap_cluster_list data structure and its operations are introduced to
provide better encapsulation for the free cluster and discard cluster
list operations. This avoids some code duplication and improves the code
From: Huang Ying
The swap cgroup uses a discontinuous array to store the information for
the swap entries. lookup_swap_cgroup() provides a good encapsulation to
access one element of the discontinuous array. To make it easier to
access multiple elements of the discontinuous array, an iterator
From: Huang Ying
In this patch, the size of the swap cluster is changed to that of the
THP on x86_64 (512). This is for THP (Transparent Huge Page) swap
support on x86_64, where one swap cluster will be used to hold the
contents of each swapped-out THP, and some information of the swapped-out THP
From: Huang Ying
This patch enhances split_huge_page_to_list() to work properly for a
THP (Transparent Huge Page) in the swap cache during swapping out.
This is used to delay splitting the THP during swapping out, where for
a THP to be swapped out, we will allocate a swap cluster, add the THP
From: Huang Ying
In this patch, splitting the huge page is delayed from almost the first
step of swapping out to after allocating the swap space for the THP and
adding the THP into the swap cache. This will reduce lock
acquiring/releasing for the locks used for swap space and swap cache
management.
This
From: Huang Ying
The swap cluster allocation/free functions are added based on the
existing swap cluster management mechanism for SSDs. These functions
don't work for traditional hard disks because the existing swap cluster
management mechanism doesn't work for them. The hard disk supp
From: Huang Ying
__swapcache_free() is added to support clearing SWAP_HAS_CACHE for a
huge page. For now, this will free the specified swap cluster, because
currently this function will be called only in the error path, to free
the swap cluster just allocated. So the corresponding swap_map[i
From: Huang Ying
This patch makes it possible to charge or uncharge a set of contiguous
swap entries in the swap cgroup. The number of swap entries is
specified via an added parameter.
This will be used for THP (Transparent Huge Page) swap support, where a
whole swap cluster backing a THP may be