Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-09-26 Thread Huang, Ying
Christoph Hellwig writes: > Hi Ying, > >> Any update to this regression? > > Not really. We've optimized everything we could in XFS without > dropping the architecture that we really want to move to. Now we're > waiting for some MM behavior to be fixed that this uncovered. But > in the end wi

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-09-26 Thread Christoph Hellwig
Hi Ying, > Any update to this regression? Not really. We've optimized everything we could in XFS without dropping the architecture that we really want to move to. Now we're waiting for some MM behavior to be fixed that this uncovered. But in the end will probably be stuck with a slight regressi

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-09-25 Thread Huang, Ying
Hi, Christoph, "Huang, Ying" writes: > Hi, Christoph, > > "Huang, Ying" writes: > >> Christoph Hellwig writes: >> >>> Snipping the long context: >>> >>> I think there are three observations here: >>> >>> (1) removing the mark_page_accessed (which is the only significant >>> change in the

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-09-06 Thread Huang, Ying
Mel Gorman writes: > On Fri, Sep 02, 2016 at 09:32:58AM +1000, Dave Chinner wrote: >> On Fri, Aug 19, 2016 at 04:08:34PM +0100, Mel Gorman wrote: >> > On Thu, Aug 18, 2016 at 05:11:11PM +1000, Dave Chinner wrote: >> > > On Thu, Aug 18, 2016 at 01:45:17AM +0100, Mel Gorman wrote: >> > > > On Wed,

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-09-06 Thread Mel Gorman
On Fri, Sep 02, 2016 at 09:32:58AM +1000, Dave Chinner wrote: > On Fri, Aug 19, 2016 at 04:08:34PM +0100, Mel Gorman wrote: > > On Thu, Aug 18, 2016 at 05:11:11PM +1000, Dave Chinner wrote: > > > On Thu, Aug 18, 2016 at 01:45:17AM +0100, Mel Gorman wrote: > > > > On Wed, Aug 17, 2016 at 04:49:07PM

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-09-01 Thread Dave Chinner
On Fri, Aug 19, 2016 at 04:08:34PM +0100, Mel Gorman wrote: > On Thu, Aug 18, 2016 at 05:11:11PM +1000, Dave Chinner wrote: > > On Thu, Aug 18, 2016 at 01:45:17AM +0100, Mel Gorman wrote: > > > On Wed, Aug 17, 2016 at 04:49:07PM +0100, Mel Gorman wrote: > > > > > Yes, we could try to batch the lock

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-25 Thread Mel Gorman
On Wed, Aug 24, 2016 at 08:40:37AM -0700, Huang, Ying wrote: > Mel Gorman writes: > > > On Wed, Aug 17, 2016 at 04:49:07PM +0100, Mel Gorman wrote: > >> > Yes, we could try to batch the locking like DaveC already suggested > >> > (ie we could move the locking to the caller, and then make > >> > s

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-24 Thread Huang, Ying
Hi, Mel, Mel Gorman writes: > On Wed, Aug 17, 2016 at 04:49:07PM +0100, Mel Gorman wrote: >> > Yes, we could try to batch the locking like DaveC already suggested >> > (ie we could move the locking to the caller, and then make >> > shrink_page_list() just try to keep the lock held for a few page

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-22 Thread Huang, Ying
Hi, Christoph, "Huang, Ying" writes: > Christoph Hellwig writes: > >> Snipping the long context: >> >> I think there are three observations here: >> >> (1) removing the mark_page_accessed (which is the only significant >> change in the parent commit) hurts the >> aim7/1BRD_48G-xfs-d

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-20 Thread Mel Gorman
On Sat, Aug 20, 2016 at 09:48:39AM +1000, Dave Chinner wrote: > On Fri, Aug 19, 2016 at 11:49:46AM +0100, Mel Gorman wrote: > > On Thu, Aug 18, 2016 at 03:25:40PM -0700, Linus Torvalds wrote: > > > It *could* be as simple/stupid as just saying "let's allocate the page > > > cache for new pages from

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-19 Thread Linus Torvalds
On Fri, Aug 19, 2016 at 4:48 PM, Dave Chinner wrote: > > Well, it depends on the speed of the storage. The higher the speed > of the storage, the less we care about stalling on dirty pages > during reclaim Actually, that's largely true independently of the speed of the storage, I feel. On really

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-19 Thread Dave Chinner
On Fri, Aug 19, 2016 at 11:49:46AM +0100, Mel Gorman wrote: > On Thu, Aug 18, 2016 at 03:25:40PM -0700, Linus Torvalds wrote: > > It *could* be as simple/stupid as just saying "let's allocate the page > > cache for new pages from the current node" - and if the process that > > dirties pages just st

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-19 Thread Mel Gorman
On Thu, Aug 18, 2016 at 05:11:11PM +1000, Dave Chinner wrote: > On Thu, Aug 18, 2016 at 01:45:17AM +0100, Mel Gorman wrote: > > On Wed, Aug 17, 2016 at 04:49:07PM +0100, Mel Gorman wrote: > > > > Yes, we could try to batch the locking like DaveC already suggested > > > > (ie we could move the locki

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-19 Thread Mel Gorman
On Thu, Aug 18, 2016 at 03:25:40PM -0700, Linus Torvalds wrote: > >> In fact, looking at the __page_cache_alloc(), we already have that > >> "spread pages out" logic. I'm assuming Dave doesn't actually have that > >> bit set (I don't think it's the default), but I'm also envisioning > >> that maybe

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-19 Thread Michal Hocko
On Thu 18-08-16 15:25:40, Linus Torvalds wrote: [...] > So just for testing purposes, you could try changing that > > return alloc_pages(gfp, 0); > > in __page_cache_alloc() into something like > > return alloc_pages_node(cpu_to_node(raw_smp_processor_id()), gfp, 0); That would

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-18 Thread Dave Chinner
On Thu, Aug 18, 2016 at 10:55:01AM -0700, Linus Torvalds wrote: > On Thu, Aug 18, 2016 at 6:24 AM, Mel Gorman > wrote: > > On Thu, Aug 18, 2016 at 05:11:11PM +1000, Dave Chinner wrote: > >> FWIW, I just remembered about /proc/sys/vm/zone_reclaim_mode. > >> > > > > That is a terrifying "fix" for t

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-18 Thread Linus Torvalds
On Thu, Aug 18, 2016 at 2:19 PM, Dave Chinner wrote: > > For streaming or use-once IO it makes a lot of sense to restrict the > locality of the page cache. The faster the IO device, the less dirty > page buffering we need to maintain full device bandwidth. And the > larger the machine the greater

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-18 Thread Linus Torvalds
On Thu, Aug 18, 2016 at 6:24 AM, Mel Gorman wrote: > On Thu, Aug 18, 2016 at 05:11:11PM +1000, Dave Chinner wrote: >> FWIW, I just remembered about /proc/sys/vm/zone_reclaim_mode. >> > > That is a terrifying "fix" for this problem. It just happens to work > because there is no spillover to other n
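For readers who want to check how the /proc/sys/vm/zone_reclaim_mode knob mentioned above is set on a test machine, here is a minimal sketch. It assumes the bit meanings given in the kernel's sysctl documentation (1, 2, 4); the helper names `decode_zone_reclaim_mode` and `read_zone_reclaim_mode` are made up for illustration and are not part of any tool discussed in the thread.

```python
from pathlib import Path

# Path named in the thread; only present on kernels built with NUMA support.
ZONE_RECLAIM_PATH = Path("/proc/sys/vm/zone_reclaim_mode")

# Bit meanings as described in the kernel sysctl documentation (assumption:
# unchanged on the kernel under test).
ZONE_RECLAIM_BITS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def decode_zone_reclaim_mode(value):
    """Return the descriptions of all bits set in a zone_reclaim_mode value."""
    return [desc for bit, desc in sorted(ZONE_RECLAIM_BITS.items()) if value & bit]


def read_zone_reclaim_mode(path=ZONE_RECLAIM_PATH):
    """Read the current setting, or return None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    mode = read_zone_reclaim_mode()
    if mode is None:
        print("zone_reclaim_mode not available on this system")
    else:
        print(mode, decode_zone_reclaim_mode(mode) or ["zone reclaim off"])
```

The sketch only reads the setting; changing it requires root and, as the message above notes, is not being proposed as a real fix.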

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-18 Thread Mel Gorman
On Thu, Aug 18, 2016 at 05:11:11PM +1000, Dave Chinner wrote: > On Thu, Aug 18, 2016 at 01:45:17AM +0100, Mel Gorman wrote: > > On Wed, Aug 17, 2016 at 04:49:07PM +0100, Mel Gorman wrote: > > > > Yes, we could try to batch the locking like DaveC already suggested > > > > (ie we could move the locki

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-18 Thread Dave Chinner
On Thu, Aug 18, 2016 at 01:45:17AM +0100, Mel Gorman wrote: > On Wed, Aug 17, 2016 at 04:49:07PM +0100, Mel Gorman wrote: > > > Yes, we could try to batch the locking like DaveC already suggested > > > (ie we could move the locking to the caller, and then make > > > shrink_page_list() just try to k

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-17 Thread Dave Chinner
On Wed, Aug 17, 2016 at 04:49:07PM +0100, Mel Gorman wrote: > On Tue, Aug 16, 2016 at 10:47:36AM -0700, Linus Torvalds wrote: > > I've always preferred to see direct reclaim as the primary model for > > reclaim, partly in order to throttle the actual "bad" process, but > > also because "kswapd uses

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-17 Thread Mel Gorman
On Wed, Aug 17, 2016 at 04:49:07PM +0100, Mel Gorman wrote: > > Yes, we could try to batch the locking like DaveC already suggested > > (ie we could move the locking to the caller, and then make > > shrink_page_list() just try to keep the lock held for a few pages if > > the mapping doesn't change)
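The batching idea quoted above (take the lock in the caller and keep it held across consecutive pages while the mapping does not change) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `Mapping`, `Page`, `remove_pages_batched` and the `max_batch` limit are hypothetical names, a Python `threading.Lock` stands in for the mapping's tree_lock, and nothing here is the kernel's shrink_page_list() code.

```python
import threading
from collections import namedtuple

# Hypothetical stand-ins for kernel objects; illustration only.
Page = namedtuple("Page", ["mapping", "index"])


class Mapping:
    def __init__(self, name):
        self.name = name
        self.tree_lock = threading.Lock()  # models the per-mapping tree lock
        self.lock_acquisitions = 0         # how often the lock was taken
        self.removed = []                  # page indexes removed under the lock


def remove_pages_batched(pages, max_batch=32):
    """Remove pages, keeping a mapping's lock held across consecutive pages.

    The lock is dropped and retaken when the mapping changes or after
    max_batch pages, so hold times stay bounded.
    """
    held = None
    held_count = 0
    try:
        for page in pages:
            mapping = page.mapping
            if held is not mapping or held_count >= max_batch:
                if held is not None:
                    held.tree_lock.release()
                    held = None
                mapping.tree_lock.acquire()
                mapping.lock_acquisitions += 1
                held = mapping
                held_count = 0
            mapping.removed.append(page.index)  # work done under the lock
            held_count += 1
    finally:
        if held is not None:
            held.tree_lock.release()


if __name__ == "__main__":
    a, b = Mapping("a"), Mapping("b")
    pages = [Page(a, i) for i in range(100)] + [Page(b, i) for i in range(10)]
    remove_pages_batched(pages)
    print(a.lock_acquisitions, b.lock_acquisitions)
```

With one acquisition per page the model would take the lock 110 times for this input; batching trades that for longer individual hold times, which is why the sketch caps the batch length.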

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-17 Thread Michal Hocko
On Wed 17-08-16 17:48:25, Michal Hocko wrote: [...] > I will try to catch up with the rest of the email thread but from a > quick glance it just feels like we are doing more work under the > lock. Hmm, so it doesn't seem to be more work in __remove_mapping as pointed out in http://lkml.kernel

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-17 Thread Peter Zijlstra
On Mon, Aug 15, 2016 at 07:03:00AM +0200, Ingo Molnar wrote: > > * Linus Torvalds wrote: > > > Make sure you actually use "perf record -e cycles:pp" or something > > that uses PEBS to get real profiles using CPU performance counters. > > Btw., 'perf record -e cycles:pp' is the default now for m

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-17 Thread Mel Gorman
On Tue, Aug 16, 2016 at 10:47:36AM -0700, Linus Torvalds wrote: > I've always preferred to see direct reclaim as the primary model for > reclaim, partly in order to throttle the actual "bad" process, but > also because "kswapd uses lots of CPU time" is such a nasty thing to > even begin guessing ab

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-17 Thread Michal Hocko
On Tue 16-08-16 10:47:36, Linus Torvalds wrote: > Mel, > thanks for taking a look. Your theory sounds more complete than mine, > and since Dave is able to see the problem with 4.7, it would be nice > to hear about the 4.6 behavior and commit ede37713737 in particular. > > That one seems more like

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-16 Thread Linus Torvalds
On Tue, Aug 16, 2016 at 3:02 PM, Dave Chinner wrote: >> >> What does your profile show for when you actually dig into >> __remove_mapping() itself? Looking at your flat profile, I'm assuming >> you get > > - 22.26% 0.93% [kernel] [k] __remove_mapping >- 3.86% __remove_mapping

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-16 Thread Dave Chinner
On Mon, Aug 15, 2016 at 06:51:42PM -0700, Linus Torvalds wrote: > Anyway, including the direct reclaim call paths gets > __remove_mapping() a bit higher, and _raw_spin_lock_irqsave climbs to > 0.26%. But perhaps more importantly, looking at what __remove_mapping > actually *does* (apart from the s

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-16 Thread Linus Torvalds
Mel, thanks for taking a look. Your theory sounds more complete than mine, and since Dave is able to see the problem with 4.7, it would be nice to hear about the 4.6 behavior and commit ede37713737 in particular. That one seems more likely to affect contention than the zone/node one I found durin

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-16 Thread Mel Gorman
On Mon, Aug 15, 2016 at 04:48:36PM -0700, Linus Torvalds wrote: > On Mon, Aug 15, 2016 at 4:20 PM, Linus Torvalds > wrote: > > > > None of this code is all that new, which is annoying. This must have > > gone on forever, > > ... ooh. > > Wait, I take that back. > > We actually have some very re

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-16 Thread Fengguang Wu
On Sun, Aug 14, 2016 at 06:17:24PM +0200, Christoph Hellwig wrote: Snipping the long context: I think there are three observations here: (1) removing the mark_page_accessed (which is the only significant change in the parent commit) hurts the aim7/1BRD_48G-xfs-disk_rr-3000-performance/

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-16 Thread Fengguang Wu
On Tue, Aug 16, 2016 at 07:22:40AM +1000, Dave Chinner wrote: On Mon, Aug 15, 2016 at 10:14:55PM +0800, Fengguang Wu wrote: Hi Christoph, On Sun, Aug 14, 2016 at 06:17:24PM +0200, Christoph Hellwig wrote: >Snipping the long context: > >I think there are three observations here: > >(1) removing

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Linus Torvalds
On Mon, Aug 15, 2016 at 5:19 PM, Dave Chinner wrote: > >> None of this code is all that new, which is annoying. This must have >> gone on forever, > > Yes, it has been. Just worse than I've noticed before, probably > because of all the stuff put under the tree lock in the past couple > of years. S

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Linus Torvalds
On Mon, Aug 15, 2016 at 5:38 PM, Dave Chinner wrote: > > Same in 4.7 (flat profile numbers climbed higher after this > snapshot was taken, as can be seen by the callgraph numbers): Ok, so it's not the zone-vs-node thing. It's just that nobody has looked at that load in recent times. Where "rece

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Linus Torvalds
On Mon, Aug 15, 2016 at 5:17 PM, Dave Chinner wrote: > > Read the code, Linus? I am. It's how I came up with my current pet theory. But I don't actually have enough sane numbers to make it much more than a cute pet theory. It *might* explain why you see tons of kswap time and bad lock contention

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Dave Chinner
On Mon, Aug 15, 2016 at 04:48:36PM -0700, Linus Torvalds wrote: > On Mon, Aug 15, 2016 at 4:20 PM, Linus Torvalds > wrote: > > > > None of this code is all that new, which is annoying. This must have > > gone on forever, > > ... ooh. > > Wait, I take that back. > > We actually have some very re

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Dave Chinner
On Mon, Aug 15, 2016 at 04:20:55PM -0700, Linus Torvalds wrote: > On Mon, Aug 15, 2016 at 3:42 PM, Dave Chinner wrote: > > > > 31.18% [kernel] [k] __pv_queued_spin_lock_slowpath > >9.90% [kernel] [k] copy_user_generic_string > >3.65% [kernel] [k] __raw_callee_save___pv_queued_spin_

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Dave Chinner
On Mon, Aug 15, 2016 at 05:15:47PM -0700, Linus Torvalds wrote: > DaveC - does the spinlock contention go away if you just go back to > 4.7? If so, I think it's the new zone thing. But it would be good to > verify - maybe it's something entirely different and it goes back much > further. Same in 4

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Dave Chinner
On Mon, Aug 15, 2016 at 04:01:00PM -0700, Linus Torvalds wrote: > On Mon, Aug 15, 2016 at 3:22 PM, Dave Chinner wrote: > > > > Right, but that does not make the profile data useless, > > Yes it does. Because it basically hides everything that happens inside > the lock, which is what causes the co

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Dave Chinner
On Mon, Aug 15, 2016 at 10:22:43AM -0700, Huang, Ying wrote: > Hi, Chinner, > > Dave Chinner writes: > > > On Wed, Aug 10, 2016 at 06:00:24PM -0700, Linus Torvalds wrote: > >> On Wed, Aug 10, 2016 at 5:33 PM, Huang, Ying wrote: > >> > > >> > Here it is, > >> > >> Thanks. > >> > >> Appended is

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Linus Torvalds
On Mon, Aug 15, 2016 at 4:20 PM, Linus Torvalds wrote: > > But I'll try to see what happens > on my profile, even if I can't recreate the contention itself, just > trying to see what happens inside of that region. Yeah, since I run my machines on encrypted disks, my profile shows 60% kthread, but

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Linus Torvalds
On Mon, Aug 15, 2016 at 4:20 PM, Linus Torvalds wrote: > > None of this code is all that new, which is annoying. This must have > gone on forever, ... ooh. Wait, I take that back. We actually have some very recent changes that I didn't even think about that went into this very merge window. In

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Linus Torvalds
On Mon, Aug 15, 2016 at 3:42 PM, Dave Chinner wrote: > > 31.18% [kernel] [k] __pv_queued_spin_lock_slowpath >9.90% [kernel] [k] copy_user_generic_string >3.65% [kernel] [k] __raw_callee_save___pv_queued_spin_unlock >2.62% [kernel] [k] __block_commit_write.isra.29 >2.26%

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Linus Torvalds
On Mon, Aug 15, 2016 at 3:22 PM, Dave Chinner wrote: > > Right, but that does not make the profile data useless, Yes it does. Because it basically hides everything that happens inside the lock, which is what causes the contention in the first place. So stop making inane and stupid arguments, Dav

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Dave Chinner
On Tue, Aug 16, 2016 at 08:22:11AM +1000, Dave Chinner wrote: > On Sun, Aug 14, 2016 at 10:12:20PM -0700, Linus Torvalds wrote: > > On Aug 14, 2016 10:00 PM, "Dave Chinner" wrote: > > > > > > > What does it say if you annotate that _raw_spin_unlock_irqrestore() > > function? > > > > > >

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Dave Chinner
On Sun, Aug 14, 2016 at 10:12:20PM -0700, Linus Torvalds wrote: > On Aug 14, 2016 10:00 PM, "Dave Chinner" wrote: > > > > > What does it say if you annotate that _raw_spin_unlock_irqrestore() > function? > > > >│ > >│Disassembly of section load0: > >│ > >│

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Dave Chinner
On Mon, Aug 15, 2016 at 10:14:55PM +0800, Fengguang Wu wrote: > Hi Christoph, > > On Sun, Aug 14, 2016 at 06:17:24PM +0200, Christoph Hellwig wrote: > >Snipping the long context: > > > >I think there are three observations here: > > > >(1) removing the mark_page_accessed (which is the only signifi

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Huang, Ying
Christoph Hellwig writes: > Snipping the long context: > > I think there are three observations here: > > (1) removing the mark_page_accessed (which is the only significant > change in the parent commit) hurts the > aim7/1BRD_48G-xfs-disk_rr-3000-performance/ivb44 test. > I'd sti

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Huang, Ying
Hi, Chinner, Dave Chinner writes: > On Wed, Aug 10, 2016 at 06:00:24PM -0700, Linus Torvalds wrote: >> On Wed, Aug 10, 2016 at 5:33 PM, Huang, Ying wrote: >> > >> > Here it is, >> >> Thanks. >> >> Appended is a munged "after" list, with the "before" values in >> parenthesis. It actually looks

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Fengguang Wu
Hi Christoph, On Sun, Aug 14, 2016 at 06:17:24PM +0200, Christoph Hellwig wrote: Snipping the long context: I think there are three observations here: (1) removing the mark_page_accessed (which is the only significant change in the parent commit) hurts the aim7/1BRD_48G-xfs-disk_rr-30

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-15 Thread Fengguang Wu
On Fri, Aug 12, 2016 at 11:03:33AM -0700, Linus Torvalds wrote: On Thu, Aug 11, 2016 at 8:56 PM, Dave Chinner wrote: On Thu, Aug 11, 2016 at 07:27:52PM -0700, Linus Torvalds wrote: I don't recall having ever seen the mapping tree_lock as a contention point before, but it's not like I've tried

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Ingo Molnar
* Linus Torvalds wrote: > Make sure you actually use "perf record -e cycles:pp" or something > that uses PEBS to get real profiles using CPU performance counters. Btw., 'perf record -e cycles:pp' is the default now for modern versions of perf tooling (on most x86 systems) - if you do 'perf reco

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Dave Chinner
On Sun, Aug 14, 2016 at 07:53:40PM -0700, Linus Torvalds wrote: > On Sun, Aug 14, 2016 at 7:28 PM, Dave Chinner wrote: > >> > >> Maybe your symbol table came from an old kernel, and functions moved > >> around enough that the profile attributions ended up bogus. > > > > No, I don't think so. I don'

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Linus Torvalds
On Sun, Aug 14, 2016 at 7:28 PM, Dave Chinner wrote: >> >> Maybe your symbol table came from an old kernel, and functions moved >> around enough that the profile attributions ended up bogus. > > No, I don't think so. I don't install symbol tables on my test VMs, > I let /proc/kallsyms do that work

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Dave Chinner
On Sun, Aug 14, 2016 at 06:37:33PM -0700, Linus Torvalds wrote: > On Sun, Aug 14, 2016 at 5:48 PM, Dave Chinner wrote: > >> > >> Does this attached patch help your contention numbers? > > > > No. If anything, it makes it worse. Without the patch, I was > > measuring 36-37% in _raw_spin_unlock_irqr

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Linus Torvalds
On Sun, Aug 14, 2016 at 5:48 PM, Dave Chinner wrote: >> >> Does this attached patch help your contention numbers? > > No. If anything, it makes it worse. Without the patch, I was > measuring 36-37% in _raw_spin_unlock_irqrestore. With the patch, it > is 42-43%. Write throughput is the same at ~50

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Dave Chinner
On Fri, Aug 12, 2016 at 11:03:33AM -0700, Linus Torvalds wrote: > On Thu, Aug 11, 2016 at 8:56 PM, Dave Chinner wrote: > > On Thu, Aug 11, 2016 at 07:27:52PM -0700, Linus Torvalds wrote: > >> > >> I don't recall having ever seen the mapping tree_lock as a contention > >> point before, but it's not

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Fengguang Wu
Hi Christoph, On Sun, Aug 14, 2016 at 06:17:24PM +0200, Christoph Hellwig wrote: Snipping the long context: I think there are three observations here: (1) removing the mark_page_accessed (which is the only significant change in the parent commit) hurts the aim7/1BRD_48G-xfs-disk_rr-30

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Dave Chinner
On Sun, Aug 14, 2016 at 06:17:24PM +0200, Christoph Hellwig wrote: > Snipping the long context: > > I think there are three observations here: > > (1) removing the mark_page_accessed (which is the only significant > change in the parent commit) hurts the > aim7/1BRD_48G-xfs-disk_rr-30

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Christoph Hellwig
Snipping the long context: I think there are three observations here: (1) removing the mark_page_accessed (which is the only significant change in the parent commit) hurts the aim7/1BRD_48G-xfs-disk_rr-3000-performance/ivb44 test. I'd still rather stick to the filemap version and

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Fengguang Wu
Hi Christoph, On Sun, Aug 14, 2016 at 06:51:28AM +0800, Fengguang Wu wrote: Hi Christoph, On Sun, Aug 14, 2016 at 12:15:08AM +0200, Christoph Hellwig wrote: Hi Fengguang, feel free to try this git tree: git://git.infradead.org/users/hch/vfs.git iomap-fixes I just queued some test jobs fo

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Dave Chinner
On Sat, Aug 13, 2016 at 02:30:54AM +0200, Christoph Hellwig wrote: > On Fri, Aug 12, 2016 at 08:02:08PM +1000, Dave Chinner wrote: > > Which says "no change". Oh well, back to the drawing board... > > I don't see how it would change things much - for all relevant calculations > we convert to block

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Fengguang Wu
Hi Christoph, On Sun, Aug 14, 2016 at 12:15:08AM +0200, Christoph Hellwig wrote: Hi Fengguang, feel free to try this git tree: git://git.infradead.org/users/hch/vfs.git iomap-fixes I just queued some test jobs for it. % queue -q vip -t ivb44 -b hch-vfs/iomap-fixes aim7-fs-1brd.yaml fs=xfs

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Christoph Hellwig
Hi Fengguang, feel free to try this git tree: git://git.infradead.org/users/hch/vfs.git iomap-fixes

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Fengguang Wu
Hi Linus, On Fri, Aug 12, 2016 at 11:03:33AM -0700, Linus Torvalds wrote: On Thu, Aug 11, 2016 at 8:56 PM, Dave Chinner wrote: On Thu, Aug 11, 2016 at 07:27:52PM -0700, Linus Torvalds wrote: I don't recall having ever seen the mapping tree_lock as a contention point before, but it's not like

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Fengguang Wu
Hi Christoph, On Sat, Aug 13, 2016 at 11:48:25PM +0200, Christoph Hellwig wrote: On Sat, Aug 13, 2016 at 02:30:54AM +0200, Christoph Hellwig wrote: Below is a patch I hacked up this morning to do just that. It passes xfstests, but I've not done any real benchmarking with it. If the reduced lo

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Christoph Hellwig
On Sat, Aug 13, 2016 at 02:30:54AM +0200, Christoph Hellwig wrote: > Below is a patch I hacked up this morning to do just that. It passes > xfstests, but I've not done any real benchmarking with it. If the > reduced lookup overhead in it doesn't help enough we'll need to some > sort of look aside

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-12 Thread Christoph Hellwig
On Fri, Aug 12, 2016 at 08:02:08PM +1000, Dave Chinner wrote: > Which says "no change". Oh well, back to the drawing board... I don't see how it would change things much - for all relevant calculations we convert to block units first anyway. But the whole xfs_iomap_write_delay is a giant mess anyw

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-12 Thread Linus Torvalds
On Thu, Aug 11, 2016 at 8:56 PM, Dave Chinner wrote: > On Thu, Aug 11, 2016 at 07:27:52PM -0700, Linus Torvalds wrote: >> >> I don't recall having ever seen the mapping tree_lock as a contention >> point before, but it's not like I've tried that load either. So it >> might be a regression (going b

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-12 Thread Dave Chinner
On Fri, Aug 12, 2016 at 04:51:24PM +0800, Ye Xiaolong wrote: > On 08/12, Ye Xiaolong wrote: > >On 08/12, Dave Chinner wrote: > > [snip] > > >>lkp-folk: the patch I've just tested is attached below - can you > >>feed that through your test and see if it fixes the regression? > >> > > > >Hi, Dave >

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-12 Thread Ye Xiaolong
On 08/12, Ye Xiaolong wrote: >On 08/12, Dave Chinner wrote: [snip] >>lkp-folk: the patch I've just tested is attached below - can you >>feed that through your test and see if it fixes the regression? >> > >Hi, Dave > >I am verifying your fix patch in lkp environment now, will send the >result onc

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Ye Xiaolong
On 08/12, Dave Chinner wrote: >On Thu, Aug 11, 2016 at 10:02:39PM -0700, Linus Torvalds wrote: >> On Thu, Aug 11, 2016 at 9:16 PM, Dave Chinner wrote: >> > >> > That's why running aim7 as your "does the filesystem scale" >> > benchmark is somewhat irrelevant to scaling applications on high >> > pe

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Dave Chinner
On Thu, Aug 11, 2016 at 10:02:39PM -0700, Linus Torvalds wrote: > On Thu, Aug 11, 2016 at 9:16 PM, Dave Chinner wrote: > > > > That's why running aim7 as your "does the filesystem scale" > > benchmark is somewhat irrelevant to scaling applications on high > > performance systems these days > > Ye

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Linus Torvalds
On Thu, Aug 11, 2016 at 9:16 PM, Dave Chinner wrote: > > That's why running aim7 as your "does the filesystem scale" > benchmark is somewhat irrelevant to scaling applications on high > performance systems these days Yes, don't get me wrong - I'm not at all trying to say that AIM7 is a good bench

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Dave Chinner
On Thu, Aug 11, 2016 at 08:20:53PM -0700, Linus Torvalds wrote: > On Thu, Aug 11, 2016 at 7:52 PM, Christoph Hellwig wrote: > > > > I can look at that, but indeed optimizing this patch seems a bit > > stupid. > > The "write less than a full block to the end of the file" is actually > a reasonably

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Dave Chinner
On Thu, Aug 11, 2016 at 07:27:52PM -0700, Linus Torvalds wrote: > On Thu, Aug 11, 2016 at 5:54 PM, Dave Chinner wrote: > > > > So, removing mark_page_accessed() made the spinlock contention > > *worse*. > > > > 36.51% [kernel] [k] _raw_spin_unlock_irqrestore > >6.27% [kernel] [k] copy_us

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Linus Torvalds
On Thu, Aug 11, 2016 at 7:52 PM, Christoph Hellwig wrote: > > I can look at that, but indeed optimizing this patch seems a bit > stupid. The "write less than a full block to the end of the file" is actually a reasonably common case. It may not make for a great filesystem benchmark, but it also i

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Christoph Hellwig
On Fri, Aug 12, 2016 at 12:23:29PM +1000, Dave Chinner wrote: > Christoph, maybe there's something we can do to only trigger > speculative prealloc growth checks if the new file size crosses the end of > the currently allocated block at the EOF. That would chop out a fair > chunk of the xfs_bmapi_r
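The quoted suggestion (only run the speculative preallocation growth check when the new file size crosses the end of the block currently allocated at EOF) comes down to a boundary test. The following is a rough sketch of that arithmetic only. It assumes, as a simplification, that the allocated end is the old file size rounded up to the block size; real XFS extent state can differ (for example when preallocation already extends past EOF). The function names and the 4096-byte default are hypothetical.

```python
def round_up(value, multiple):
    """Round value up to the next multiple (value itself if already aligned)."""
    return -(-value // multiple) * multiple


def crosses_eof_block(old_size, new_size, block_size=4096):
    """Return True if a write extending the file from old_size to new_size
    goes past the end of the block that currently holds EOF.

    Simplifying assumption: the allocated end is old_size rounded up to
    block_size. This is not how XFS tracks extents; it only models the
    boundary test described in the message above.
    """
    allocated_end = round_up(old_size, block_size)
    return new_size > allocated_end


if __name__ == "__main__":
    # Count how many of 8192 one-byte appends would need the expensive check.
    checks = sum(crosses_eof_block(size, size + 1) for size in range(8192))
    print(checks)
```

Under this model, a stream of small appends triggers the check once per block rather than once per write, which is the saving the message is after.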

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Linus Torvalds
On Thu, Aug 11, 2016 at 7:23 PM, Dave Chinner wrote: > > And, as usual, that's the answer. Here's the reproducer: > > # sudo mkfs.xfs -f -m crc=0 /dev/pmem1 > # sudo mount -o noatime /dev/pmem1 /mnt/scratch > # sudo xfs_io -f -c "pwrite 0 512m -b 1" /mnt/scratch/fooey Heh. Ok, so 1 byte or 1kB at
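The reproducer quoted above drives a long run of tiny sequential writes through xfs_io. For experimenting without pmem hardware or xfs_io, the same access pattern can be approximated with plain write(2) calls. This is a sketch under assumptions: the function name `tiny_sequential_writes`, the temporary file location, and the small default total are illustrative choices, and running it on an arbitrary filesystem will not necessarily show the behavior discussed in the thread.

```python
import os
import tempfile
import time


def tiny_sequential_writes(path, total_bytes, write_size=1):
    """Append total_bytes to a fresh file using write_size-byte write() calls.

    Returns the number of write() calls issued.
    """
    buf = b"\0" * write_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    writes = 0
    written = 0
    try:
        while written < total_bytes:
            # Trim the final chunk so the file ends at exactly total_bytes.
            chunk = buf[: total_bytes - written]
            written += os.write(fd, chunk)
            writes += 1
    finally:
        os.close(fd)
    return writes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, "fooey")
        start = time.perf_counter()
        count = tiny_sequential_writes(target, 1 << 20, write_size=1)
        elapsed = time.perf_counter() - start
        print(f"{count} writes in {elapsed:.2f}s")
```

To get closer to the original setup, point `path` at a file on the XFS mount under test and raise `total_bytes`; the per-call overhead of Python is much higher than that of xfs_io, so absolute timings are not comparable.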

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Linus Torvalds
On Thu, Aug 11, 2016 at 5:54 PM, Dave Chinner wrote: > > So, removing mark_page_accessed() made the spinlock contention > *worse*. > > 36.51% [kernel] [k] _raw_spin_unlock_irqrestore >6.27% [kernel] [k] copy_user_generic_string >3.73% [kernel] [k] _raw_spin_unlock_irq >3.55% [

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Dave Chinner
On Fri, Aug 12, 2016 at 10:54:42AM +1000, Dave Chinner wrote: > I'm now going to test Christoph's theory that this is an "overwrite > doing lots of block mapping" issue. More on that to follow. Ok, so going back to the profiles, I can say it's not an overwrite issue, because there is delayed alloc

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Dave Chinner
On Thu, Aug 11, 2016 at 11:16:12AM +1000, Dave Chinner wrote: > On Wed, Aug 10, 2016 at 05:33:20PM -0700, Huang, Ying wrote: > We need to know what is happening that is different - there's a good > chance the mapping trace events will tell us. Huang, can you get > a raw event trace from the test? >

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Dave Chinner
On Thu, Aug 11, 2016 at 09:55:33AM -0700, Linus Torvalds wrote: > On Thu, Aug 11, 2016 at 8:57 AM, Christoph Hellwig wrote: > > > > The one liner below (not tested yet) to simply remove it should fix that > > up. I also noticed we have a spurious pagefault_disable/enable, I > > need to dig into t

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Linus Torvalds
On Thu, Aug 11, 2016 at 3:16 PM, Al Viro wrote: > > Huh? The very first thing it does is > char *kaddr = kmap_atomic(page), *p = kaddr + offset; > > If _that_ does not disable pagefaults, we are very deep in shit. Right you are - it does, even with highmem disabled. Never mind, those pag

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Al Viro
On Thu, Aug 11, 2016 at 01:35:00PM -0700, Linus Torvalds wrote: > The thing is, iov_iter_copy_from_user_atomic() doesn't itself enforce > non-blocking user accesses, it depends on the caller blocking page > faults. Huh? The very first thing it does is char *kaddr = kmap_atomic(page), *p

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Christoph Hellwig
I'll need to dig into what AIM7 actually does in this benchmark, which isn't too easy as I'm on a business trip currently, but from the list below it looks like it keeps overwriting and overwriting a file that's already been allocated. This is a pretty stupid workload, but fortunately it should al

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Linus Torvalds
On Thu, Aug 11, 2016 at 2:16 PM, Huang, Ying wrote: > > Test result is as follow, Thanks. No change. > raw perf data: I redid my munging, with the old (good) percentages in parenthesis: intel_idle: 17.66 (16.88) copy_user_enhanced_fast_string:

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Huang, Ying
Christoph Hellwig writes: > On Thu, Aug 11, 2016 at 12:51:31PM -0700, Linus Torvalds wrote: >> Ok. It does seem to also reset the active file page counts back, so >> that part did seem to be related, but yeah, from a performance >> standpoint that was clearly not a major issue. >> >> Let's hope

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Linus Torvalds
On Thu, Aug 11, 2016 at 1:00 PM, Christoph Hellwig wrote: > > I can't really think of any reason why the pagefault_disable() would > significantly change performance. No, you're right, we prefault the page anyway. And quite frankly, looking at it, I think the pagefault_disable/enable is actually

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Christoph Hellwig
On Thu, Aug 11, 2016 at 12:51:31PM -0700, Linus Torvalds wrote: > Ok. It does seem to also reset the active file page counts back, so > that part did seem to be related, but yeah, from a performance > standpoint that was clearly not a major issue. > > Let's hope Dave can figure out something based

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Linus Torvalds
On Thu, Aug 11, 2016 at 10:51 AM, Huang, Ying wrote: >> > > Here is the test result for the debug patch. It appears that the aim7 > score is a little better, but the regression is not recovered. Ok. It does seem to also reset the active file page counts back, so that part did seem to be related,

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Huang, Ying
Linus Torvalds writes: > On Thu, Aug 11, 2016 at 8:57 AM, Christoph Hellwig wrote: >> >> The one liner below (not tested yet) to simply remove it should fix that >> up. I also noticed we have a spurious pagefault_disable/enable, I >> need to dig into the history of that first, though. > > Hopef

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Linus Torvalds
On Thu, Aug 11, 2016 at 8:57 AM, Christoph Hellwig wrote: > > The one liner below (not tested yet) to simply remove it should fix that > up. I also noticed we have a spurious pagefault_disable/enable, I > need to dig into the history of that first, though. Hopefully the pagefault_disable/enable

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Christoph Hellwig
On Wed, Aug 10, 2016 at 06:00:24PM -0700, Linus Torvalds wrote: > The biggest difference is that we have "mark_page_accessed()" show up > after, and not before. There was also a lot of LRU noise in the > non-profile data. I wonder if that is the reason here: the old model > of using generic_perform

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-10 Thread Dave Chinner
On Wed, Aug 10, 2016 at 06:00:24PM -0700, Linus Torvalds wrote: > On Wed, Aug 10, 2016 at 5:33 PM, Huang, Ying wrote: > > > > Here it is, > > Thanks. > > Appended is a munged "after" list, with the "before" values in > parenthesis. It actually looks fairly similar. > > The biggest difference is

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-10 Thread Dave Chinner
On Thu, Aug 11, 2016 at 10:36:59AM +0800, Ye Xiaolong wrote: > On 08/11, Dave Chinner wrote: > >On Thu, Aug 11, 2016 at 11:16:12AM +1000, Dave Chinner wrote: > >> I need to see these events: > >> > >>xfs_file* > >>xfs_iomap* > >>xfs_get_block* > >> > >> For both kernels. An example tr

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-10 Thread Ye Xiaolong
On 08/11, Dave Chinner wrote: >On Thu, Aug 11, 2016 at 11:16:12AM +1000, Dave Chinner wrote: >> I need to see these events: >> >> xfs_file* >> xfs_iomap* >> xfs_get_block* >> >> For both kernels. An example trace from 4.8-rc1 running the command >> `xfs_io -f -c 'pwrite 0 512k -b 1

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-10 Thread Dave Chinner
On Thu, Aug 11, 2016 at 11:16:12AM +1000, Dave Chinner wrote: > I need to see these events: > > xfs_file* > xfs_iomap* > xfs_get_block* > > For both kernels. An example trace from 4.8-rc1 running the command > `xfs_io -f -c 'pwrite 0 512k -b 128k' /mnt/scratch/fooey doing an > o

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-10 Thread Dave Chinner
On Wed, Aug 10, 2016 at 05:33:20PM -0700, Huang, Ying wrote: > Linus Torvalds writes: > > > On Wed, Aug 10, 2016 at 5:11 PM, Huang, Ying wrote: > >> > >> Here is the comparison result with perf-profile data. > > > > Heh. The diff is actually harder to read than just showing A/B > > state. The fac

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-10 Thread Linus Torvalds
On Wed, Aug 10, 2016 at 5:33 PM, Huang, Ying wrote: > > Here it is, Thanks. Appended is a munged "after" list, with the "before" values in parenthesis. It actually looks fairly similar. The biggest difference is that we have "mark_page_accessed()" show up after, and not before. There was also a
