Re: Xterm Hangs - Possible scheduler defect?

2005-02-26 Thread Helge Hafting
On Fri, Feb 25, 2005 at 04:02:26PM -0500, Chad N. Tindel wrote: > > What's so special about a 64-way box? > > They're expensive and customers don't expect a single userspace thread to > tie up the other 63 CPUs no matter how buggy it is. It is intuitively obvious > that a buggy kernel can bring

Re: Xterm Hangs - Possible scheduler defect?

2005-02-25 Thread Lee Revell
On Fri, 2005-02-25 at 16:02 -0500, Chad N. Tindel wrote: > They're expensive and customers don't expect a single userspace thread to > tie up the other 63 CPUs no matter how buggy it is. It is intuitively obvious > that a buggy kernel can bring a system to its knees, but it is not intuitively >

Re: Xterm Hangs - Possible scheduler defect?

2005-02-25 Thread Chad N. Tindel
> What's so special about a 64-way box? They're expensive and customers don't expect a single userspace thread to tie up the other 63 CPUs no matter how buggy it is. It is intuitively obvious that a buggy kernel can bring a system to its knees, but it is not intuitively obvious that a buggy

Re: Xterm Hangs - Possible scheduler defect?

2005-02-25 Thread Helge Hafting
On Thu, Feb 24, 2005 at 02:22:37PM -0500, Chad N. Tindel wrote: > > If you keep a learning attitude, there is a chance for this discussion > > to go on. However, if you keep the "Come now, don't bullshit me, this is > > a broken architecture and you're just trying to cover up" attitude, > >

Re: Xterm Hangs - Possible scheduler defect?

2005-02-25 Thread Chris Friesen
Lee Revell wrote: The solution to your problem (which is as old as the hills) involves priority inheriting mutexes which are available in the RT preempt patch (if you build with CONFIG_PREEMPT_RT). This should be usable for hard realtime applications. Yup. I was just pointing out that userspace

Re: Xterm Hangs - Possible scheduler defect?

2005-02-25 Thread Lee Revell
On Fri, 2005-02-25 at 15:53 +, Paulo Marques wrote: > Ingo Oeser wrote: > > Chris Friesen wrote: > > > >>Ingo Oeser wrote: > >>[...] > > You would need to change the priority of task 1 until it releases the > > mutex. Ideally the owner gets the maximum priority of > > his and all the waiters

Re: Xterm Hangs - Possible scheduler defect?

2005-02-25 Thread Paulo Marques
Ingo Oeser wrote: Chris Friesen wrote: Ingo Oeser wrote: [...] You would need to change the priority of task 1 until it releases the mutex. Ideally the owner gets the maximum priority of his and all the waiters on it, until it releases his mutex, where he regains its old priority after release of

Re: Xterm Hangs - Possible scheduler defect?

2005-02-25 Thread Ingo Oeser
Chris Friesen wrote: > Ingo Oeser wrote: > > Stupid applications can starve other applications for a while, but not > > forever, because the kernel is still running and deciding. > > Not so. > > > > task 1: sched_rr, priority 1, takes mutex > task 2: sched_rr, priority 2, cpu hog, infinite loop >

Re: Xterm Hangs - Possible scheduler defect?

2005-02-25 Thread Chris Friesen
Ingo Oeser wrote: Stupid applications can starve other applications for a while, but not forever, because the kernel is still running and deciding. Not so. task 1: sched_rr, priority 1, takes mutex task 2: sched_rr, priority 2, cpu hog, infinite loop task 3: sched_rr, priority 99, tries to get
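
The three-task scenario above is a case of priority inversion. As a rough illustration only, the sketch below replays it on one simulated CPU with a strict-priority scheduler, once without and once with the priority inheritance discussed elsewhere in this thread. The `Task` class, the `simulate` function, the tick counts and the amounts of work are all assumptions invented for the example; none of it is kernel code, and a real SMP scheduler is considerably more involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    priority: int                   # higher number wins, as with SCHED_RR
    work_left: int                  # CPU ticks still needed; -1 means loop forever
    boosted: Optional[int] = None   # inherited priority while holding the mutex

    def effective_priority(self) -> int:
        return self.boosted if self.boosted is not None else self.priority


def simulate(inherit: bool, ticks: int = 20) -> List[str]:
    """Return the name of the task that ran on each tick (hypothetical model)."""
    low = Task("task1", 1, work_left=3)     # already holds the mutex
    hog = Task("task2", 2, work_left=-1)    # CPU hog, infinite loop
    high = Task("task3", 99, work_left=1)   # wants the mutex held by task1
    owner: Optional[Task] = low
    trace: List[str] = []
    for _ in range(ticks):
        # task3 cannot run while another task owns the mutex.
        blocked = owner is not None
        if blocked and inherit:
            # Priority inheritance: the owner runs at the waiter's priority.
            owner.boosted = max(owner.effective_priority(), high.priority)
        runnable = [t for t in (low, hog) if t.work_left != 0]
        if not blocked and high.work_left != 0:
            runnable.append(high)
        current = max(runnable, key=Task.effective_priority)
        trace.append(current.name)
        if current.work_left > 0:
            current.work_left -= 1
        if current is owner and current.work_left == 0:
            # Owner finished its critical section: drop the boost, free the mutex.
            current.boosted = None
            owner = None
    return trace


if __name__ == "__main__":
    print("no inheritance:", simulate(inherit=False))
    print("inheritance:   ", simulate(inherit=True))
```

In this model, with `inherit=False` only task2 ever runs, so task1 never releases the mutex and task3 waits forever. With `inherit=True`, task1 is boosted, finishes its three ticks, and task3 runs on the fourth.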

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Mike Galbraith
At 12:53 PM 2/24/2005 -0500, Chad N. Tindel wrote: > > Hmmm... Are you suggesting it is OK for a kernel to get nearly completely > > hosed and for not fully utilize all the processors in the system because > > of one SCHED_FIFO thread? > > Sure. You specifically directed the scheduler to run your

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Ingo Oeser
Chad N. Tindel wrote: > I think what we have are the need for two levels of applications: > > 1. That which wishes to be the highest priority userspace application, and > wishes to preempt all other userspace applications. Such an application is > OK being preempted by the kernel when the kernel

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Kyle Moffett
On Feb 24, 2005, at 18:00, Andrew Morton wrote: Here's a quicky which will convert all your kernel threads to SCHED_RR, priority 99. Please test. We have a bunch of workstations here where we run a similar thing during boot, as well as starting a SCHED_RR @ 99 sulogin-type process on tty12. It

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Andrew Morton
Chris Friesen <[EMAIL PROTECTED]> wrote: > > Andrew Morton wrote: > > >chrt -r 99 -9 $i Make that chrt -r 99 -p $i

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Chris Friesen
Andrew Morton wrote:

    #!/bin/sh
    PIDS=$(ps axo pid,command | grep ' \[.*\]$' | sed -e 's/ \[.*\]$//')
    for i in $PIDS
    do
        chrt -r 99 -9 $i
    done

For the unaware, "chrt" is part of the schedutils package. (I didn't know about it till just now...figured I'd save others the trouble of searching.)
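
For readers who want to see what the script's selection step does, here is a minimal sketch in Python, assuming `ps axo pid,command` prints kernel threads as a PID followed by a bracketed name. The function names `kernel_thread_pids` and `chrt_commands` and the sample listing are hypothetical. The regular expression is deliberately stricter than the `grep` in the script: it requires the bracket directly after the PID. The sketch only builds argument lists and does not run them, and it uses the `-p` form from Andrew Morton's follow-up correction rather than `-9`.

```python
import re
from typing import List

# A kernel thread line is assumed to look like "    2 [migration/0]".
KTHREAD_LINE = re.compile(r"^\s*(\d+)\s+\[.*\]$")


def kernel_thread_pids(ps_output: str) -> List[int]:
    """Pick out the PIDs whose command column is a bracketed name."""
    pids = []
    for line in ps_output.splitlines():
        match = KTHREAD_LINE.match(line.rstrip())
        if match:
            pids.append(int(match.group(1)))
    return pids


def chrt_commands(pids: List[int]) -> List[List[str]]:
    """Build (but do not execute) one chrt invocation per PID."""
    return [["chrt", "-r", "99", "-p", str(pid)] for pid in pids]


if __name__ == "__main__":
    # Made-up listing for illustration; real output will differ.
    sample = (
        "  PID COMMAND\n"
        "    1 init [3]\n"
        "    2 [migration/0]\n"
        "    3 [ksoftirqd/0]\n"
        "  812 /usr/bin/xterm\n"
    )
    for cmd in chrt_commands(kernel_thread_pids(sample)):
        print(" ".join(cmd))
```

Keeping the command construction separate from execution makes it easy to inspect what would be changed before running anything as root.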

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Andrew Morton
"Chad N. Tindel" <[EMAIL PROTECTED]> wrote: > > I would make the following assertion for any kernel: > > No single userspace thread of execution running on an SMP system should be > able to hose a box by going CPU-bound, bug in the software or no bug. But if we were to enforce that policy,

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Chad N. Tindel
> In many Unices, crucial kernel threads run at realtime priority with a > static priority higher than is accessible to user code. Yep. > That being said, however, you've got to be a privileged user to set > real time very high priority on a thread, and if you do, you'd better > know what

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Peter Chubb
> "Chad" == Chad N Tindel <[EMAIL PROTECTED]> writes: Chad> I would make the following assertion for any kernel: Chad> No single userspace thread of execution running on an SMP system Chad> should be able to hose a box by going CPU-bound, bug in the Chad> software or no bug. Any kernel

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Chris Friesen
Chad N. Tindel wrote: OK. Would you say it would be a reasonable default to have SCHED_FIFO userspace threads running at a lower priority than essential kernel threads (say, the load balancer and the events thread), and give root the ability to explicitly have userspace threads preempt the

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Chad N. Tindel
> >I'm all for allowing people to shoot themselves in the foot. That doesn't > >mean that it is OK for a single userspace thread to mess up a 64-way box. > > If root has explicitly stated that the thread in question is the highest > priority thing on the entire machine, why would it not be

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Barry K. Nathan
> > This is much, much better than the "users are stupid, we must protect > > them from themselves" kind of way that other OS'es use. > > Isn't this what the kernel attempts to do in many other places? What else > could possibly be the point of sending SIGSEGV and causing applications > to dump

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Chris Friesen
Chad N. Tindel wrote: I personally like the linux way: "root has the ability to shoot himself in the foot if he wants to". This is my computer, damn it, I am the one who tells it what to do. I'm all for allowing people to shoot themselves in the foot. That doesn't mean that it is OK for a

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Chad N. Tindel
> If you keep a learning attitude, there is a chance for this discussion > to go on. However, if you keep the "Come now, don't bullshit me, this is > a broken architecture and you're just trying to cover up" attitude, > you're just going to get discarded as a troll. I'm not trying to troll

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Paulo Marques
Chad N. Tindel wrote: Low-latency userspace apps. The audio guys, for instance, are trying to get latencies down to the 100us range. If random kernel threads can preempt userspace at any time, they wreak havoc with latency as seen by userspace. Come now. There is no such thing as a random

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Chad N. Tindel
> Low-latency userspace apps. The audio guys, for instance, are trying to > get latencies down to the 100us range. > > If random kernel threads can preempt userspace at any time, they wreak > havoc with latency as seen by userspace. Come now. There is no such thing as a random kernel thread.

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Chris Friesen
Chad N. Tindel wrote: 1. Kernel preempts all. There may be some hierarchy of kernel priorities too, but it isn't important here. 2. SCHED_FIFO processes preempt all userspace applications. 3. SCHED_RR. 4. SCHED_OTHER. Under no circumstances should any single CPU-bound userspace thread

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Chad N. Tindel
> > Hmmm... Are you suggesting it is OK for a kernel to get nearly completely > > hosed and for not fully utilize all the processors in the system because > > of one SCHED_FIFO thread? > > Sure. You specifically directed the scheduler to run your thread at a > higher priority than anything else.

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Chad N. Tindel
> Why would anyone write a thread that uses exactly 100% of one cpu? > It seems wrong to me. It is unsafe if they really need that much > processing power, what if an interrupt happens? Then they get slightly less. > If they got a 10% faster cpu, would this thread suddenly drop to 90% > usage

Re: Xterm Hangs - Possible scheduler defect?

2005-02-24 Thread Helge Hafting
Chad N. Tindel wrote: But the other side of the coin is that a SCHED_FIFO userspace task presumably has extreme latency requirements, so it doesn't *want* to be preempted by some routine kernel operation. People would get irritated if we were to do that. Just to follow up a bit. People

Re: Xterm Hangs - Possible scheduler defect?

2005-02-23 Thread Andrew Morton
"Chad N. Tindel" <[EMAIL PROTECTED]> wrote: > > > `xterm' is waiting for the other CPU to schedule a kernel thread (which is > > bound to that CPU). Once that kernel thread has done a little bit of work, > > `xterm' can terminate. > > > > But kernel threads don't run with realtime policy, so

Re: Xterm Hangs - Possible scheduler defect?

2005-02-23 Thread Chad N. Tindel
> But the other side of the coin is that a SCHED_FIFO userspace task > presumably has extreme latency requirements, so it doesn't *want* to be > preempted by some routine kernel operation. People would get irritated if > we were to do that. Just to follow up a bit. People writing apps that run

Re: Xterm Hangs - Possible scheduler defect?

2005-02-23 Thread Chad N. Tindel
> `xterm' is waiting for the other CPU to schedule a kernel thread (which is > bound to that CPU). Once that kernel thread has done a little bit of work, > `xterm' can terminate. > > But kernel threads don't run with realtime policy, so your userspace app > has permanently starved that kernel

Re: Xterm Hangs - Possible scheduler defect?

2005-02-23 Thread Andrew Morton
"Chad N. Tindel" <[EMAIL PROTECTED]> wrote: > > We have hit a defect where an exiting xterm process will hang. This is > running > on a 2-cpu IA-64 box. We have a multithreaded application, where one thread > is SCHED_FIFO and is running with priority 98, and the other thread is just > a

Xterm Hangs - Possible scheduler defect?

2005-02-23 Thread Chad N. Tindel
Hello- We have hit a defect where an exiting xterm process will hang. This is running on a 2-cpu IA-64 box. We have a multithreaded application, where one thread is SCHED_FIFO and is running with priority 98, and the other thread is just a normal SCHED_OTHER thread. The SCHED_FIFO thread is in
