Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-15 Thread Bruce Evans
On Sat, 15 Oct 2005, Robert Watson wrote: On Sat, 15 Oct 2005, Bruce Evans wrote: ... However, for netisrs I think it is common to process only 1 packet per context switch, at least in the loopback case. The Mach scheduler allows deferred wakeups to be issued -- wake up a thread in the

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-15 Thread Garrett Wollman
On Sun, 16 Oct 2005 14:06:32 +1000 (EST), Bruce Evans [EMAIL PROTECTED] said: Probably the problem is largest for latency, especially in benchmarks. Latency benchmarks probably have to start cold, so they have no chance of queue lengths > 1, so there must be a context switch per packet and

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-14 Thread Poul-Henning Kamp
In message [EMAIL PROTECTED], Andrew Gallatin writes: Linux already takes care of syncing the TSC between SMP cpus, so we know it is possible. This seems like a much more doable optimization. And it is likely to have other benefits. Validating that the TSC is reliable is a nontrivial task

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-14 Thread Bruce Evans
On Fri, 14 Oct 2005, Poul-Henning Kamp wrote: In message [EMAIL PROTECTED], Andrew Gallatin writes: Linux already takes care of syncing the TSC between SMP cpus, so we know it is possible. This seems like a much more doable optimization. And it is likely to have other benefits. The

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-14 Thread Poul-Henning Kamp
In message [EMAIL PROTECTED], Bruce Evans writes: On Fri, 14 Oct 2005, Poul-Henning Kamp wrote: In message [EMAIL PROTECTED], Andrew Gallatin writes: Linux already takes care of syncing the TSC between SMP cpus, so we know it is possible. This seems like a much more doable optimization.

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-14 Thread Andrew Gallatin
Poul-Henning Kamp writes: The best compromise solution therefore is to change the scheduler to make decisions based on the TSC ticks (or equivalent on other archs) and at regular intervals figure out how fast the CPU ran in the last period and convert the TSC ticks accumulated to a time
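The idea quoted above (charge cycle-counter ticks at each switch, then convert them to time only at regular intervals) might look roughly like the following. This is a minimal sketch, not FreeBSD code: the class `CpuAccount`, its fields, and the `measured_hz` parameter are hypothetical names chosen for illustration, and the per-period frequency is simply assumed to be known.

```python
from dataclasses import dataclass


@dataclass
class CpuAccount:
    """Toy accounting state; all names here are hypothetical."""
    pending_ticks: int = 0  # cycle-counter ticks not yet converted to time
    runtime_ns: int = 0     # runtime converted so far, in nanoseconds

    def charge(self, tsc_start: int, tsc_end: int) -> None:
        # Context-switch path: one subtraction of two readings that are
        # assumed to come from the same CPU's counter.
        self.pending_ticks += tsc_end - tsc_start

    def convert(self, measured_hz: int) -> None:
        # Periodic path: measured_hz is the frequency the CPU is assumed
        # to have averaged over the period that just ended.
        self.runtime_ns += self.pending_ticks * 1_000_000_000 // measured_hz
        self.pending_ticks = 0


acct = CpuAccount()
acct.charge(1_000, 3_001_000)            # 3,000,000 ticks in the first period
acct.convert(measured_hz=1_000_000_000)  # CPU assumed to have run at 1 GHz
acct.charge(0, 1_000_000)                # 1,000,000 ticks in the second period
acct.convert(measured_hz=500_000_000)    # CPU assumed throttled to 500 MHz
print(acct.runtime_ns)
```

The switch path never touches a timecounter; only the periodic conversion needs to know how fast the CPU actually ran.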

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-14 Thread Poul-Henning Kamp
In message [EMAIL PROTECTED], Andrew Gallatin writes: Poul-Henning Kamp writes: The best compromise solution therefore is to change the scheduler to make decisions based on the TSC ticks (or equivalent on other archs) and at regular intervals figure out how fast the CPU ran in the last

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-14 Thread Andrew Gallatin
Poul-Henning Kamp writes: In message [EMAIL PROTECTED], Andrew Gallatin writes: Poul-Henning Kamp writes: The best compromise solution therefore is to change the scheduler to make decisions based on the TSC ticks (or equivalent on other archs) and at regular intervals figure

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-14 Thread Matthew Reimer
Poul-Henning Kamp wrote: In message [EMAIL PROTECTED], Andrew Gallatin writes: What if somebody were to port the linux TSC syncing code, and use it to decide whether or not set kern.timecounter.smp_tsc=1? Would you object to that? Yes, I would object to that. Even to this

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-14 Thread Andrew Gallatin
Poul-Henning Kamp writes: The solution is not faster but less reliable timekeeping, the solution is to move the scheduler(s) away from using time as an approximation of cpu cycles. So you mean rather than use binuptime() in mi_switch(), use some per-cpu cycle counter (like rdtsc)? Heck,

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-14 Thread Poul-Henning Kamp
In message [EMAIL PROTECTED], Andrew Gallatin writes: Poul-Henning Kamp writes: The solution is not faster but less reliable timekeeping, the solution is to move the scheduler(s) away from using time as an approximation of cpu cycles. So you mean rather than use binuptime() in

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-14 Thread Bruce Evans
On Fri, 14 Oct 2005, Poul-Henning Kamp wrote: In message [EMAIL PROTECTED], Bruce Evans writes: The timestamps in mi_switch() are taken on the same CPU and only their differences are used, so they don't even need to be synced. If they use the TSC, then the TSCs just need to have the same

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-14 Thread Bruce Evans
On Fri, 14 Oct 2005, Poul-Henning Kamp wrote: In message [EMAIL PROTECTED], Andrew Gallatin writes: What if somebody were to port the linux TSC syncing code, and use it to decide whether or not set kern.timecounter.smp_tsc=1? Would you object to that? Yes, I would object to that. Even to

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-14 Thread Bruce Evans
On Fri, 14 Oct 2005, Andrew Gallatin wrote: Bear in mind that I have no clue about timekeeping. I got into this just because I noticed using a TSC timecounter reduces context switch latency by 40% or more on all the SMP platforms I have access to: 1.0GHz dual PIII : 50% reduction vs i8254

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-14 Thread Bruce Evans
On Fri, 14 Oct 2005, Poul-Henning Kamp wrote: In message [EMAIL PROTECTED], Andrew Gallatin writes: Poul-Henning Kamp writes: The solution is not faster but less reliable timekeeping, the solution is to move the scheduler(s) away from using time as an approximation of cpu cycles. So you

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-14 Thread Andrew Gallatin
Bruce Evans writes: On Fri, 14 Oct 2005, Andrew Gallatin wrote: Bear in mind that I have no clue about timekeeping. I got into this just because I noticed using a TSC timecounter reduces context switch latency by 40% or more on all the SMP platforms I have access to: 1.0GHz

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-14 Thread Poul-Henning Kamp
In message [EMAIL PROTECTED], Bruce Evans writes: On Fri, 14 Oct 2005, Poul-Henning Kamp wrote: Even to this day new CPU chips come out where TSC has flaws that prevent it from being used as timecounter, and we do not have (NDA) access to the data that would allow us to build a list of safe

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-13 Thread Garrett Wollman
On Wed, 12 Oct 2005 17:17:12 -0400 (EDT), Andrew Gallatin [EMAIL PROTECTED] said: Right now, at least, it seems to work OK. I haven't tried witness, but a non-debug kernel shows a big speedup from enabling it. Do you think there is a chance that it could be made to work in FreeBSD? I did

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-13 Thread Andrew Gallatin
Garrett Wollman writes: On Wed, 12 Oct 2005 17:17:12 -0400 (EDT), Andrew Gallatin [EMAIL PROTECTED] said: Right now, at least, it seems to work OK. I haven't tried witness, but a non-debug kernel shows a big speedup from enabling it. Do you think there is a chance that it could

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-12 Thread Robert Watson
On Wed, 12 Oct 2005, Andrew Gallatin wrote: Speaking of net.isr, is there any reason why if_simloop() calls netisr_queue() rather than netisr_dispatch()? Yes -- it's basically to prevent recursion for loopback traffic, which can result in both lock orders and general concerns regarding
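The recursion concern can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `ToyNetisr` and its methods are hypothetical and merely mimic the queue-versus-dispatch distinction; they are not the kernel's netisr_queue() or netisr_dispatch(). The handler plays the role of loopback traffic that generates another packet while a packet is being processed.

```python
from collections import deque


class ToyNetisr:
    """Toy model of direct dispatch versus queued processing."""

    def __init__(self, handler):
        self.handler = handler
        self.queue = deque()
        self.depth = 0      # current handler nesting depth
        self.max_depth = 0  # deepest nesting observed

    def dispatch(self, pkt):
        # Direct dispatch: run the handler in the caller's context.
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        try:
            self.handler(self, pkt)
        finally:
            self.depth -= 1

    def enqueue(self, pkt):
        # Queued path: defer the packet; nothing runs in the caller's context.
        self.queue.append(pkt)

    def run_queue(self):
        # Drain deferred packets one at a time from a single top-level loop.
        while self.queue:
            self.dispatch(self.queue.popleft())


def loopback_direct(isr, ttl):
    if ttl > 0:
        isr.dispatch(ttl - 1)  # re-enters the handler: nesting grows


def loopback_queued(isr, ttl):
    if ttl > 0:
        isr.enqueue(ttl - 1)   # deferred: the handler returns first


direct = ToyNetisr(loopback_direct)
direct.dispatch(5)

queued = ToyNetisr(loopback_queued)
queued.enqueue(5)
queued.run_queue()

print(direct.max_depth, queued.max_depth)
```

In the direct variant each looped-back packet is handled inside the previous handler call, so nesting grows with the chain length; in the queued variant every packet is handled from the drain loop at a constant depth.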

Re: Call for performance evaluation: net.isr.direct (fwd)

2005-10-12 Thread Andrew Gallatin
Robert Watson writes: On Wed, 12 Oct 2005, Andrew Gallatin wrote: Speaking of net.isr, is there any reason why if_simloop() calls netisr_queue() rather than netisr_dispatch()? Yes -- it's basically to prevent recursion for loopback traffic, which can result in both lock