Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Christoph Lameter
On Tue, 20 Feb 2007, Max Krasnyansky wrote: > Ok. Sounds like disabling cache_reaper is a better option for now. Like you > said > it's unlikely that slabs will grow much if that cpu is not heavily used by the > kernel. Running for prolonged times without cache_reaper is no good. What we are

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Max Krasnyansky
Christoph Lameter wrote: On Tue, 20 Feb 2007, Max Krasnyansky wrote: I guess I kind of hijacked the thread. The second part of my first email was dropped. Basically I was saying that I'm working on CPU isolation extensions. Where an isolated CPU is not supposed to do much kernel work. In which

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Christoph Lameter
On Tue, 20 Feb 2007, Max Krasnyansky wrote: > I guess I kind of hijacked the thread. The second part of my first email was > dropped. Basically I was saying that I'm working on CPU isolation extensions. > Where an isolated CPU is not supposed to do much kernel work. In which case > you'd want to

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Christoph Lameter
On Tue, 20 Feb 2007, Max Krasnyansky wrote: > I agree that running the reaper on the wrong CPU is not the best way to go > about it. > But it seems like disabling it is even worse, unless I missing something. ie > wasting > memory. Disabling during shutdown is no problem because the per cpu

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Max Krasnyansky
Oleg Nesterov wrote: On 02/20, Christoph Lameter wrote: On Tue, 20 Feb 2007, Max Krasnyansky wrote: Well seems that we have a set of unresolved issues with workqueues and cpu hotplug. How about storing 'cpu' explicitly in the work queue instead of relying on the smp_processor_id() and

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Max Krasnyansky
Christoph Lameter wrote: On Tue, 20 Feb 2007, Max Krasnyansky wrote: Well seems that we have a set of unresolved issues with workqueues and cpu hotplug. How about storing 'cpu' explicitly in the work queue instead of relying on the smp_processor_id() and friends ? That way there is no

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Oleg Nesterov
On 02/20, Christoph Lameter wrote: > > On Tue, 20 Feb 2007, Max Krasnyansky wrote: > > > > > Well seems that we have a set of unresolved issues with workqueues and > > > > cpu > > > > hotplug. > > > > How about storing 'cpu' explicitly in the work queue instead of relying on > > the > >

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Max Krasnyansky
Folks, Oleg Nesterov wrote: Even if smp_processor_id() was stable during the execution of cache_reap(), this work_struct can be moved to another CPU if CPU_DEAD happens. We can't avoid this, and this is correct. Uhh This may not be correct in terms of how the slab operates. But this is

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Christoph Lameter
On Tue, 20 Feb 2007, Max Krasnyansky wrote: > > > Well seems that we have a set of unresolved issues with workqueues and cpu > > > hotplug. > > How about storing 'cpu' explicitly in the work queue instead of relying on the > smp_processor_id() and friends ? That way there is no ambiguity when >

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Christoph Lameter
On Tue, 20 Feb 2007, Max Krasnyansky wrote: Well seems that we have a set of unresolved issues with workqueues and cpu hotplug. How about storing 'cpu' explicitly in the work queue instead of relying on the smp_processor_id() and friends ? That way there is no ambiguity when

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Max Krasnyansky
Folks, Oleg Nesterov wrote: Even if smp_processor_id() was stable during the execution of cache_reap(), this work_struct can be moved to another CPU if CPU_DEAD happens. We can't avoid this, and this is correct. Uhh This may not be correct in terms of how the slab operates. But this is

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Oleg Nesterov
On 02/20, Christoph Lameter wrote: On Tue, 20 Feb 2007, Max Krasnyansky wrote: Well seems that we have a set of unresolved issues with workqueues and cpu hotplug. How about storing 'cpu' explicitly in the work queue instead of relying on the smp_processor_id() and

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Max Krasnyansky
Christoph Lameter wrote: On Tue, 20 Feb 2007, Max Krasnyansky wrote: Well seems that we have a set of unresolved issues with workqueues and cpu hotplug. How about storing 'cpu' explicitly in the work queue instead of relying on the smp_processor_id() and friends ? That way there is no

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Max Krasnyansky
Oleg Nesterov wrote: On 02/20, Christoph Lameter wrote: On Tue, 20 Feb 2007, Max Krasnyansky wrote: Well seems that we have a set of unresolved issues with workqueues and cpu hotplug. How about storing 'cpu' explicitly in the work queue instead of relying on the smp_processor_id() and

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Christoph Lameter
On Tue, 20 Feb 2007, Max Krasnyansky wrote: I agree that running the reaper on the wrong CPU is not the best way to go about it. But it seems like disabling it is even worse, unless I missing something. ie wasting memory. Disabling during shutdown is no problem because the per cpu caches

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Christoph Lameter
On Tue, 20 Feb 2007, Max Krasnyansky wrote: I guess I kind of hijacked the thread. The second part of my first email was dropped. Basically I was saying that I'm working on CPU isolation extensions. Where an isolated CPU is not supposed to do much kernel work. In which case you'd want to run

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Max Krasnyansky
Christoph Lameter wrote: On Tue, 20 Feb 2007, Max Krasnyansky wrote: I guess I kind of hijacked the thread. The second part of my first email was dropped. Basically I was saying that I'm working on CPU isolation extensions. Where an isolated CPU is not supposed to do much kernel work. In which

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-02-20 Thread Christoph Lameter
On Tue, 20 Feb 2007, Max Krasnyansky wrote: Ok. Sounds like disabling cache_reaper is a better option for now. Like you said it's unlikely that slabs will grow much if that cpu is not heavily used by the kernel. Running for prolonged times without cache_reaper is no good. What we are

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote: > > On Tue, 30 Jan 2007, Oleg Nesterov wrote: > > > Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE, > > so cpuup_callback() can use them to lock/unlock cache_chain_mutex, > > but this is not related. > > Then we wont need to do the

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Tue, 30 Jan 2007, Oleg Nesterov wrote: > Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE, > so cpuup_callback() can use them to lock/unlock cache_chain_mutex, > but this is not related. Then we wont need to do the mutex_lock/unlock in CPU_DOWN_XX anymore, right? Which

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote: > > Here is the patch against 2.6.20-rc6-mm2. CPU_DOWN_PREPARE and > CPU_DOWN_FAILED somehow vanished in mm? No, no, there are still in place, so I believe your patch is good. Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE, so

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote: > > But we could delay CPU_DOWN in the handler for the slab until we know that > > the cache_reaper is no longer running? > > Hmm... I don't undestand this. We can delay CPU_DOWN if we cancel cache_reaper > like you did in the previous patch. Did you

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote: > > On Mon, 29 Jan 2007, Oleg Nesterov wrote: > > > > > Even if smp_processor_id() was stable during the execution of > > > > cache_reap(), > > > > this work_struct can be moved to another CPU if CPU_DEAD happens. We > > > > can't > > > > avoid this, and this

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote: > > On Mon, 29 Jan 2007, Oleg Nesterov wrote: > > > > The slab would need a notification > > > that the workqueue for a processor was shutdown in order to set work.func > > > = NULL. > > > > The slab has a notification:

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote: > > > Even if smp_processor_id() was stable during the execution of > > > cache_reap(), > > > this work_struct can be moved to another CPU if CPU_DEAD happens. We can't > > > avoid this, and this is correct. > > > > Uhh This may not be correct in

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote: > > The slab would need a notification > > that the workqueue for a processor was shutdown in order to set work.func > > = NULL. > > The slab has a notification: CPU_XXX events. It should cancel a pending per > cpu

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote: > > On Mon, 29 Jan 2007, Oleg Nesterov wrote: > > > > > This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The > > > > last > > > > CPU_UP will not restart a per-cpu "cache_reap timer". > > > > > > Why? > > > > Because the last CPU_UP calls

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote: > > > This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last > > > CPU_UP will not restart a per-cpu "cache_reap timer". > > > > Why? > > Because the last CPU_UP calls start_cpu_timer(), but since ->work.func != NULL > we don't do

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote: > > On Mon, 29 Jan 2007, Oleg Nesterov wrote: > > > Now, > > static void __devinit start_cpu_timer(int cpu) > > { > > struct delayed_work *reap_work = _cpu(reap_work, cpu); > > > > if (keventd_up() && reap_work->work.func ==

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote: > Now, > static void __devinit start_cpu_timer(int cpu) > { > struct delayed_work *reap_work = _cpu(reap_work, cpu); > > if (keventd_up() && reap_work->work.func == NULL) { >

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote: Now, static void __devinit start_cpu_timer(int cpu) { struct delayed_work *reap_work = per_cpu(reap_work, cpu); if (keventd_up() reap_work-work.func == NULL) { init_reap_node(cpu);

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote: On Mon, 29 Jan 2007, Oleg Nesterov wrote: Now, static void __devinit start_cpu_timer(int cpu) { struct delayed_work *reap_work = per_cpu(reap_work, cpu); if (keventd_up() reap_work-work.func == NULL) {

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote: This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last CPU_UP will not restart a per-cpu cache_reap timer. Why? Because the last CPU_UP calls start_cpu_timer(), but since -work.func != NULL we don't do

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote: On Mon, 29 Jan 2007, Oleg Nesterov wrote: This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last CPU_UP will not restart a per-cpu cache_reap timer. Why? Because the last CPU_UP calls start_cpu_timer(), but since

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote: The slab would need a notification that the workqueue for a processor was shutdown in order to set work.func = NULL. The slab has a notification: CPU_XXX events. It should cancel a pending per cpu reap_work

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote: Even if smp_processor_id() was stable during the execution of cache_reap(), this work_struct can be moved to another CPU if CPU_DEAD happens. We can't avoid this, and this is correct. Uhh This may not be correct in terms of how the

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote: On Mon, 29 Jan 2007, Oleg Nesterov wrote: The slab would need a notification that the workqueue for a processor was shutdown in order to set work.func = NULL. The slab has a notification: CPU_XXX events. It

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote: On Mon, 29 Jan 2007, Oleg Nesterov wrote: Even if smp_processor_id() was stable during the execution of cache_reap(), this work_struct can be moved to another CPU if CPU_DEAD happens. We can't avoid this, and this is correct.

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Mon, 29 Jan 2007, Oleg Nesterov wrote: But we could delay CPU_DOWN in the handler for the slab until we know that the cache_reaper is no longer running? Hmm... I don't undestand this. We can delay CPU_DOWN if we cancel cache_reaper like you did in the previous patch. Did you mean

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote: Here is the patch against 2.6.20-rc6-mm2. CPU_DOWN_PREPARE and CPU_DOWN_FAILED somehow vanished in mm? No, no, there are still in place, so I believe your patch is good. Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE, so cpuup_callback()

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Christoph Lameter
On Tue, 30 Jan 2007, Oleg Nesterov wrote: Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE, so cpuup_callback() can use them to lock/unlock cache_chain_mutex, but this is not related. Then we wont need to do the mutex_lock/unlock in CPU_DOWN_XX anymore, right? Which brings

Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-29 Thread Oleg Nesterov
On 01/29, Christoph Lameter wrote: On Tue, 30 Jan 2007, Oleg Nesterov wrote: Now we have 2 additional events, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE, so cpuup_callback() can use them to lock/unlock cache_chain_mutex, but this is not related. Then we wont need to do the mutex_lock/unlock in

slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-28 Thread Oleg Nesterov
For the beginning, about another (but related) minor problem, debug_smp_processor_id: /* * Kernel threads bound to a single CPU can safely use * smp_processor_id(): */ This is only true without CONFIG_HOTPLUG_CPU.

slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

2007-01-28 Thread Oleg Nesterov
For the beginning, about another (but related) minor problem, debug_smp_processor_id: /* * Kernel threads bound to a single CPU can safely use * smp_processor_id(): */ This is only true without CONFIG_HOTPLUG_CPU.