Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus

2013-11-01 Thread Peter Zijlstra
On Fri, Nov 01, 2013 at 01:44:24PM +, Mel Gorman wrote:
> Ok, I see your point now but still wonder if this is too specialised
> for what we are trying to do. Could it have been done with a read-write
> semaphore with the global stop_cpus taking it for write and stop_two_cpus
> taking it for

Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus

2013-11-01 Thread Rik van Riel
On 11/01/2013 10:24 AM, Peter Zijlstra wrote:
> On Fri, Nov 01, 2013 at 01:44:24PM +, Mel Gorman wrote:
>> Ok, I see your point now but still wonder if this is too specialised
>> for what we are trying to do. Could it have been done with a read-write
>> semaphore with the global stop_cpus

Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus

2013-11-01 Thread Mel Gorman
On Fri, Nov 01, 2013 at 07:36:36AM -0400, Rik van Riel wrote:
> On 11/01/2013 07:08 AM, Mel Gorman wrote:
> > On Thu, Oct 31, 2013 at 04:31:44PM -0400, Rik van Riel wrote:
> >> There is a race between stop_two_cpus, and the global stop_cpus.
> >>
> >
> > What was the trigger for this? I want to

Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus

2013-11-01 Thread Prarit Bhargava
On 11/01/2013 07:36 AM, Rik van Riel wrote:
> On 11/01/2013 07:08 AM, Mel Gorman wrote:
>> On Thu, Oct 31, 2013 at 04:31:44PM -0400, Rik van Riel wrote:
>>> There is a race between stop_two_cpus, and the global stop_cpus.
>>>
>>
>> What was the trigger for this? I want to see what was missing

Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus

2013-11-01 Thread Prarit Bhargava
On 11/01/2013 07:08 AM, Mel Gorman wrote:
> On Thu, Oct 31, 2013 at 04:31:44PM -0400, Rik van Riel wrote:
>> There is a race between stop_two_cpus, and the global stop_cpus.
>>
>
> What was the trigger for this? I want to see what was missing from my own
> testing. I'm going to go out on a limb

Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus

2013-11-01 Thread Rik van Riel
On 11/01/2013 07:08 AM, Mel Gorman wrote:
> On Thu, Oct 31, 2013 at 04:31:44PM -0400, Rik van Riel wrote:
>> There is a race between stop_two_cpus, and the global stop_cpus.
>>
>
> What was the trigger for this? I want to see what was missing from my own
> testing. I'm going to go out on a limb

Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus

2013-11-01 Thread Mel Gorman
On Thu, Oct 31, 2013 at 04:31:44PM -0400, Rik van Riel wrote:
> There is a race between stop_two_cpus, and the global stop_cpus.
>
What was the trigger for this? I want to see what was missing from my own testing. I'm going to go out on a limb and guess that CPU hotplug was also running in the

[PATCH -tip] fix race between stop_two_cpus and stop_cpus

2013-10-31 Thread Rik van Riel
There is a race between stop_two_cpus, and the global stop_cpus. It is possible for two CPUs to get their stopper functions queued "backwards" from one another, resulting in the stopper threads getting stuck, and the system hanging. This can happen because queuing up stoppers is not synchronized.
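The "queued backwards" condition described above can be illustrated with a small user-space sketch. This is not the kernel code and not the patch itself: `struct stopper_queue`, `replay()`, `queued_backwards()` and the request IDs are hypothetical names invented for this illustration, and the interleavings are fixed by hand rather than produced by real concurrency. It only shows how unsynchronized queuing on two CPUs can leave the two requests in opposite order, and how queuing one request at a time avoids that.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS   2
#define QUEUE_LEN 4

/* Hypothetical stand-in for a per-CPU list of pending stopper work. */
struct stopper_queue {
    int    work[QUEUE_LEN];   /* request IDs in arrival order */
    size_t nr;
};

/* One enqueue step: request `id` is queued on CPU `cpu`. */
struct step {
    int id;
    int cpu;
};

/* Replay a fixed interleaving of enqueue steps onto the per-CPU queues. */
void replay(struct stopper_queue q[NR_CPUS], const struct step *steps, size_t n)
{
    for (size_t i = 0; i < NR_CPUS; i++)
        q[i].nr = 0;
    for (size_t i = 0; i < n; i++) {
        struct stopper_queue *dst = &q[steps[i].cpu];
        if (dst->nr < QUEUE_LEN)
            dst->work[dst->nr++] = steps[i].id;
    }
}

/*
 * True if the two CPUs hold the same two requests in opposite order.
 * Each stopper thread would then start a different request and wait
 * for a peer that is busy with the other one.
 */
bool queued_backwards(const struct stopper_queue q[NR_CPUS])
{
    return q[0].nr >= 2 && q[1].nr >= 2 &&
           q[0].work[0] != q[0].work[1] &&
           q[0].work[0] == q[1].work[1] &&
           q[0].work[1] == q[1].work[0];
}

/* Unsynchronized queuing: requests 1 and 2 interleave between the CPUs. */
bool racy_interleaving_is_backwards(void)
{
    const struct step steps[] = { {1, 0}, {2, 1}, {2, 0}, {1, 1} };
    struct stopper_queue q[NR_CPUS];

    replay(q, steps, sizeof(steps) / sizeof(steps[0]));
    return queued_backwards(q);
}

/* Serialized queuing: request 1 is queued on both CPUs before request 2. */
bool serialized_is_backwards(void)
{
    const struct step steps[] = { {1, 0}, {1, 1}, {2, 0}, {2, 1} };
    struct stopper_queue q[NR_CPUS];

    replay(q, steps, sizeof(steps) / sizeof(steps[0]));
    return queued_backwards(q);
}
```

In this sketch `racy_interleaving_is_backwards()` reports the opposite ordering, while `serialized_is_backwards()` does not. How the actual patch serializes the queuing is not shown in this listing.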