Re: [OMPI users] Performance tuning: focus on latency

2007-07-25 Thread Peter Kjellstrom
On Wednesday 25 July 2007, Jeff Squyres wrote:
> On Jul 25, 2007, at 7:45 AM, Biagio Cosenza wrote:
> > Jeff, I did what you suggested
> >
> > However no noticeable changes seem to happen. Same peaks and same
> > latency times.
>
> Ok. This suggests that Nagle may not be the issue here. My guess

Re: [OMPI users] Performance tuning: focus on latency

2007-07-25 Thread Jeff Squyres
On Jul 23, 2007, at 8:53 PM, Jeff Squyres wrote: It looks like we enable Nagle right when TCP BTL connections are made. Surprisingly, it looks like we don't have a run-time option to turn it off for power-users like you who want to really tweak around. I should note that I got the logic backw

Re: [OMPI users] Performance tuning: focus on latency

2007-07-25 Thread Jeff Squyres
On Jul 25, 2007, at 7:45 AM, Biagio Cosenza wrote: Jeff, I did what you suggested. However, no noticeable changes seem to happen. Same peaks and same latency times. Ok. This suggests that Nagle may not be the issue here. Is the code tightly coupled? If so, this could be normal operating

Re: [OMPI users] Performance tuning: focus on latency

2007-07-25 Thread Biagio Cosenza
Jeff, I did what you suggested. However, no noticeable changes seem to happen. Same peaks and same latency times. Are you sure that disabling Nagle's algorithm only requires changing optval to 0? I saw that, in btl_tcp_endpoint.c, the optval assignment is inside a #if defined(TCP_NODELAY
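
For reference on the optval question: in the standard sockets API, TCP_NODELAY set to 1 disables Nagle's algorithm, and 0 leaves Nagle enabled. Below is a minimal sketch of setting the option and reading it back. The helper names set_tcp_nodelay and get_tcp_nodelay are hypothetical and are not taken from btl_tcp_endpoint.c; this only shows the plain setsockopt/getsockopt calls, not what Open MPI itself does.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <sys/socket.h>   /* socket, setsockopt, getsockopt */
#include <unistd.h>       /* close */

/* Hypothetical helper (not Open MPI code).
 * nodelay = 1 sets TCP_NODELAY, which disables Nagle's algorithm.
 * nodelay = 0 clears TCP_NODELAY, which leaves Nagle enabled.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
static int set_tcp_nodelay(int fd, int nodelay)
{
    int optval = nodelay ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval));
}

/* Hypothetical helper: reads the option back from the socket.
 * Returns 1 if TCP_NODELAY is set, 0 if it is not, -1 on error. */
static int get_tcp_nodelay(int fd)
{
    int optval = 0;
    socklen_t len = sizeof(optval);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &optval, &len) != 0)
        return -1;
    return optval != 0;
}
```

Calling get_tcp_nodelay on a connected socket after the option is set is one way to check which state a given optval actually produced, independent of any #if guards around the assignment.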

Re: [OMPI users] Performance tuning: focus on latency

2007-07-23 Thread Jeff Squyres
On Jul 23, 2007, at 6:43 AM, Biagio Cosenza wrote: I'm working on a parallel real-time renderer: an embarrassingly parallel problem where latency is the limiting factor for high performance. Two observations: 1) I did a simple "ping-pong" test (the master does a Bcast + an IRecv for each node + a Wai

[OMPI users] Performance tuning: focus on latency

2007-07-23 Thread Biagio Cosenza
Hello, I'm working on a parallel real-time renderer: an embarrassingly parallel problem where latency is the limiting factor for high performance. Two observations: 1) I did a simple "ping-pong" test (the master does a Bcast + an IRecv for each node + a Waitall) similar to the actual renderer workload. Usin