Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-25 Thread Mel Gorman
On Wed, Apr 24, 2013 at 03:09:13PM -0400, Jeff Moyer wrote: > Mel Gorman writes: > > >> I'll also note that even though your I/O is going all over the place > >> (D2C is pretty bad, 14ms), most of the time is spent waiting for a > >> struct request allocation or between Queue and Merge: > >> >

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-24 Thread Jeff Moyer
Mel Gorman writes: >> I'll also note that even though your I/O is going all over the place >> (D2C is pretty bad, 14ms), most of the time is spent waiting for a >> struct request allocation or between Queue and Merge: >> >> All Devices >> >>

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-23 Thread Mel Gorman
On Tue, Apr 23, 2013 at 11:50:19AM -0400, Theodore Ts'o wrote: > On Tue, Apr 23, 2013 at 04:33:05PM +0100, Mel Gorman wrote: > > That's a pretty big drop but it gets bad again for the second worst stall -- > > wait_on_page_bit as a result of generic_file_buffered_write. > > > > Vanilla kernel

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-23 Thread Theodore Ts'o
On Tue, Apr 23, 2013 at 04:33:05PM +0100, Mel Gorman wrote: > That's a pretty big drop but it gets bad again for the second worst stall -- > wait_on_page_bit as a result of generic_file_buffered_write. > > Vanilla kernel 1336064 ms stalled with 109 events > Patched kernel 2338781 ms stalled

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-23 Thread Mel Gorman
On Sat, Apr 20, 2013 at 08:05:22PM -0400, Theodore Ts'o wrote: > An alternate solution which I've been playing around adds buffer_head > flags so we can indicate that a buffer contains metadata and/or should > have I/O submitted with the REQ_PRIO flag set. > I beefed up the reporting slightly

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-23 Thread Mel Gorman
On Mon, Apr 22, 2013 at 06:42:23PM -0400, Jeff Moyer wrote: > > 3. The blktrace indicates that reads can starve writes from flusher > > > >While there are people that can look at a blktrace and find problems > >like they are rain man, I'm more like an ADHD squirrel when looking at > >a

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-23 Thread Jan Kara
On Mon 22-04-13 18:42:23, Jeff Moyer wrote: > Jan, if I were to come up with a way of promoting a particular async > queue to the front of the line, where would I put such a call in the > ext4/jbd2 code to be effective? As Ted wrote the simplest might be to put this directly in __lock_buffer().

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-22 Thread Theodore Ts'o
On Mon, Apr 22, 2013 at 06:42:23PM -0400, Jeff Moyer wrote: > > Jan, if I were to come up with a way of promoting a particular async > queue to the front of the line, where would I put such a call in the > ext4/jbd2 code to be effective? Well, I thought we had discussed trying to bump a pending

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-22 Thread Jeff Moyer
Mel Gorman writes: > (Adding Jeff Moyer to the cc as I'm told he is interested in the blktrace) Thanks. I've got a few comments and corrections for you below. > TLDR: Flusher writes pages very quickly after processes dirty a buffer. Reads > starve flusher writes. [snip] > 3. The blktrace

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-22 Thread Mel Gorman
(Adding Jeff Moyer to the cc as I'm told he is interested in the blktrace) On Fri, Apr 12, 2013 at 11:19:52AM -0400, Theodore Ts'o wrote: > On Fri, Apr 12, 2013 at 02:50:42PM +1000, Dave Chinner wrote: > > > If that is the case, one possible solution that comes to mind would be > > > to mark

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-20 Thread Theodore Ts'o
As an update to this thread, we brought up this issue at LSF/MM, and there is a thought that we should be able to solve this problem by having lock_buffer() check to see if the buffer is locked due to a write being queued, to have the priority of the write bumped up in the write queues to resolve

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-12 Thread Dave Chinner
On Fri, Apr 12, 2013 at 11:19:52AM -0400, Theodore Ts'o wrote: > On Fri, Apr 12, 2013 at 02:50:42PM +1000, Dave Chinner wrote: > > > If that is the case, one possible solution that comes to mind would be > > > to mark buffer_heads that contain metadata with a flag, so that the > > > flusher thread

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-12 Thread Theodore Ts'o
On Fri, Apr 12, 2013 at 02:50:42PM +1000, Dave Chinner wrote: > > If that is the case, one possible solution that comes to mind would be > > to mark buffer_heads that contain metadata with a flag, so that the > > flusher thread can write them back at the same priority as reads. > > Ext4 is

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-12 Thread Tvrtko Ursulin
Hi all, On Thursday 11 April 2013 22:57:08 Theodore Ts'o wrote: > That's an interesting theory. If the workload is one which is very > heavy on reads and writes, that could explain the high latency. That > would explain why those of us who are using primarily SSD's are seeing > the problems,

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-12 Thread Mel Gorman
On Thu, Apr 11, 2013 at 10:57:08PM -0400, Theodore Ts'o wrote: > On Thu, Apr 11, 2013 at 11:33:35PM +0200, Jan Kara wrote: > > I think it might be more enlightening if Mel traced which process in > > which function is holding the buffer lock. I suspect we'll find out that > > the flusher thread

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-12 Thread Mel Gorman
On Thu, Apr 11, 2013 at 02:35:12PM -0400, Theodore Ts'o wrote: > On Thu, Apr 11, 2013 at 06:04:02PM +0100, Mel Gorman wrote: > > > If we're stalling on lock_buffer(), that implies that buffer was being > > > written, and for some reason it was taking a very long time to > > > complete. > > > > >

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-11 Thread Dave Chinner
On Thu, Apr 11, 2013 at 10:57:08PM -0400, Theodore Ts'o wrote: > On Thu, Apr 11, 2013 at 11:33:35PM +0200, Jan Kara wrote: > > I think it might be more enlightening if Mel traced which process in > > which function is holding the buffer lock. I suspect we'll find out that > > the flusher thread

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-11 Thread Theodore Ts'o
On Thu, Apr 11, 2013 at 11:33:35PM +0200, Jan Kara wrote: > I think it might be more enlightening if Mel traced which process in > which function is holding the buffer lock. I suspect we'll find out that > the flusher thread has submitted the buffer for IO as an async write and > thus it takes a

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-11 Thread Jan Kara
On Thu 11-04-13 14:35:12, Ted Tso wrote: > On Thu, Apr 11, 2013 at 06:04:02PM +0100, Mel Gorman wrote: > > > If we're stalling on lock_buffer(), that implies that buffer was being > > > written, and for some reason it was taking a very long time to > > > complete. > > > > > > > Yes. > > > > >

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-11 Thread Theodore Ts'o
On Thu, Apr 11, 2013 at 06:04:02PM +0100, Mel Gorman wrote: > > If we're stalling on lock_buffer(), that implies that buffer was being > > written, and for some reason it was taking a very long time to > > complete. > > > > Yes. > > > It might be worthwhile to put a timestamp in struct

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-11 Thread Mel Gorman
On Wed, Apr 10, 2013 at 09:12:45AM -0400, Theodore Ts'o wrote: > On Wed, Apr 10, 2013 at 11:56:08AM +0100, Mel Gorman wrote: > > During major activity there is likely to be "good" behaviour > > with stalls roughly every 30 seconds roughly corresponding to > > dirty_expire_centiseconds. As you'd

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-10 Thread Theodore Ts'o
On Wed, Apr 10, 2013 at 11:56:08AM +0100, Mel Gorman wrote: > During major activity there is likely to be "good" behaviour > with stalls roughly every 30 seconds roughly corresponding to > dirty_expire_centiseconds. As you'd expect, the flusher thread is stuck > when this happens. > > 237 ?

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-10 Thread Mel Gorman
On Tue, Apr 02, 2013 at 11:06:51AM -0400, Theodore Ts'o wrote: > On Tue, Apr 02, 2013 at 03:27:17PM +0100, Mel Gorman wrote: > > I'm testing a page-reclaim-related series on my laptop that is partially > > aimed at fixing long stalls when doing metadata-intensive operations on > > low memory such

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-08 Thread Theodore Ts'o
On Sun, Apr 07, 2013 at 05:59:06PM -0400, Frank Ch. Eigler wrote: > > semantic error: while resolving probe point: identifier 'kprobe' at > > /tmp/stapdjN4_l:18:7 > > source: probe kprobe.function("get_request_wait") > > ^ > > Pass 2: analysis failed. [man

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-08 Thread Frank Ch. Eigler
Hi, Mel - > > [...] git kernel developers > > should use git systemtap, as has always been the case. [...] > > At one point in the past this used to be the case but then systemtap had to > be compiled as part of automated tests across different kernel versions. It > could have been worked

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-08 Thread Mel Gorman
On Sun, Apr 07, 2013 at 05:59:06PM -0400, Frank Ch. Eigler wrote: > > Hi - > > > tytso wrote: > > > So I tried to reproduce the problem, and so I installed systemtap > > (bleeding edge, since otherwise it won't work with development > > kernel), and then rebuilt a kernel with all of the

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-07 Thread Frank Ch. Eigler
Hi - tytso wrote: > So I tried to reproduce the problem, and so I installed systemtap > (bleeding edge, since otherwise it won't work with development > kernel), and then rebuilt a kernel with all of the necessary CONFIG > options enabled: > > CONFIG_DEBUG_INFO, CONFIG_KPROBES,

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-06 Thread Theodore Ts'o
On Sat, Apr 06, 2013 at 09:29:48AM +0200, Jiri Slaby wrote: > > I'm not sure, as I am using -next like for ever. But sure, there was a > kernel which didn't have this problem. Any chance you could try rolling back to 3.2 or 3.5 to see if you can get a starting point? Even a high-level bisection

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-06 Thread Jiri Slaby
On 04/06/2013 09:37 AM, Jiri Slaby wrote: > On 04/06/2013 09:29 AM, Jiri Slaby wrote: >> On 04/06/2013 01:16 AM, Theodore Ts'o wrote: >>> On Sat, Apr 06, 2013 at 12:18:11AM +0200, Jiri Slaby wrote: Ok, so now I'm running 3.9.0-rc5-next-20130404, it's not that bad, but it still sucks.

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-06 Thread Jiri Slaby
On 04/06/2013 09:29 AM, Jiri Slaby wrote: > On 04/06/2013 01:16 AM, Theodore Ts'o wrote: >> On Sat, Apr 06, 2013 at 12:18:11AM +0200, Jiri Slaby wrote: >>> Ok, so now I'm running 3.9.0-rc5-next-20130404, it's not that bad, but >>> it still sucks. Updating a kernel in a VM still results in "Your

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-06 Thread Jiri Slaby
On 04/06/2013 01:16 AM, Theodore Ts'o wrote: > On Sat, Apr 06, 2013 at 12:18:11AM +0200, Jiri Slaby wrote: >> Ok, so now I'm running 3.9.0-rc5-next-20130404, it's not that bad, but >> it still sucks. Updating a kernel in a VM still results in "Your system >> is too SLOW to play this!" by mplayer

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-05 Thread Theodore Ts'o
On Sat, Apr 06, 2013 at 12:18:11AM +0200, Jiri Slaby wrote: > Ok, so now I'm running 3.9.0-rc5-next-20130404, it's not that bad, but > it still sucks. Updating a kernel in a VM still results in "Your system > is too SLOW to play this!" by mplayer and frame dropping. What was the first kernel

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-05 Thread Jiri Slaby
On 04/03/2013 12:19 PM, Mel Gorman wrote: > On Tue, Apr 02, 2013 at 11:14:36AM -0400, Theodore Ts'o wrote: >> On Tue, Apr 02, 2013 at 11:06:51AM -0400, Theodore Ts'o wrote: >>> >>> Can you try 3.9-rc4 or later and see if the problem still persists? >>> There were a number of ext4 issues especially

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-03 Thread Mel Gorman
On Tue, Apr 02, 2013 at 07:16:13PM -0400, Theodore Ts'o wrote: > I've tried doing some quick timing, and if it is a performance > regression, it's not a recent one --- or I haven't been able to > reproduce what Mel is seeing. I tried the following commands while > booted into 3.2, 3.8, and

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-03 Thread Mel Gorman
On Wed, Apr 03, 2013 at 08:05:30AM -0400, Theodore Ts'o wrote: > On Wed, Apr 03, 2013 at 11:19:25AM +0100, Mel Gorman wrote: > > > > I'm running with -rc5 now. I have not noticed much interactivity problems > > as such but the stall detection script reported that mutt stalled for > > 20 seconds

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-03 Thread Theodore Ts'o
On Wed, Apr 03, 2013 at 11:19:25AM +0100, Mel Gorman wrote: > > I'm running with -rc5 now. I have not noticed much interactivity problems > as such but the stall detection script reported that mutt stalled for > 20 seconds opening an inbox and imapd blocked for 59 seconds doing path > lookups,

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-03 Thread Mel Gorman
On Tue, Apr 02, 2013 at 11:14:36AM -0400, Theodore Ts'o wrote: > On Tue, Apr 02, 2013 at 11:06:51AM -0400, Theodore Ts'o wrote: > > > > Can you try 3.9-rc4 or later and see if the problem still persists? > > There were a number of ext4 issues especially around low memory > > performance which

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-02 Thread Theodore Ts'o
I've tried doing some quick timing, and if it is a performance regression, it's not a recent one --- or I haven't been able to reproduce what Mel is seeing. I tried the following commands while booted into 3.2, 3.8, and 3.9-rc3 kernels: time git clone ... rm .git/index ; time git reset I did

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-02 Thread Theodore Ts'o
So I tried to reproduce the problem, and so I installed systemtap (bleeding edge, since otherwise it won't work with development kernel), and then rebuilt a kernel with all of the necessary CONFIG options enabled: CONFIG_DEBUG_INFO, CONFIG_KPROBES, CONFIG_RELAY, CONFIG_DEBUG_FS,

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-02 Thread Mel Gorman
On Tue, Apr 02, 2013 at 11:03:36PM +0800, Zheng Liu wrote: > Hi Mel, > > Thanks for reporting it. > > On 04/02/2013 10:27 PM, Mel Gorman wrote: > > I'm testing a page-reclaim-related series on my laptop that is partially > > aimed at fixing long stalls when doing metadata-intensive operations on

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-02 Thread Theodore Ts'o
On Tue, Apr 02, 2013 at 11:06:51AM -0400, Theodore Ts'o wrote: > > Can you try 3.9-rc4 or later and see if the problem still persists? > There were a number of ext4 issues especially around low memory > performance which weren't resolved until -rc4. Actually, sorry, I took a closer look and I'm

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-02 Thread Theodore Ts'o
On Tue, Apr 02, 2013 at 03:27:17PM +0100, Mel Gorman wrote: > I'm testing a page-reclaim-related series on my laptop that is partially > aimed at fixing long stalls when doing metadata-intensive operations on > low memory such as a git checkout. I've been running 3.9-rc2 with the > series applied

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-02 Thread Zheng Liu
Hi Mel, Thanks for reporting it. On 04/02/2013 10:27 PM, Mel Gorman wrote: > I'm testing a page-reclaim-related series on my laptop that is partially > aimed at fixing long stalls when doing metadata-intensive operations on > low memory such as a git checkout. I've been running 3.9-rc2 with the

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-02 Thread Jiri Slaby
On 04/02/2013 04:27 PM, Mel Gorman wrote: > I'm testing a page-reclaim-related series on my laptop that is partially > aimed at fixing long stalls when doing metadata-intensive operations on > low memory such as a git checkout. I've been running 3.9-rc2 with the > series applied but found that the
