[OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-03 Thread Brice Goglin
For some reason, the MX btl sets btl_bandwidth in megabits/s instead of megabytes/s. So we get crazy btl_weights in case of heterogeneous multirail. And --mca btl_mx_bandwidth cannot work around the problem (it probably doesn't help because it's overridden by the runtime link width detection anyway
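To illustrate why a unit mismatch between rails matters, here is a minimal sketch. It assumes each rail's weight is simply its share of the summed reported bandwidth, which is a simplification and not something this thread confirms about the actual weighting code. The function name `rail_weights` and the rail names and numbers are hypothetical.

```python
def rail_weights(bandwidths):
    """Return each rail's share of the total reported bandwidth.

    Assumption: weight = bandwidth / sum of all bandwidths. This is an
    illustrative model, not the actual Open MPI computation.
    """
    total = sum(bandwidths.values())
    return {name: bw / total for name, bw in bandwidths.items()}


# Hypothetical node with two rails of identical real capacity
# (10 Gb/s, i.e. 1250 MB/s each).
consistent = rail_weights({"mx": 1250, "other": 1250})  # both in MB/s
mixed = rail_weights({"mx": 10000, "other": 1250})      # mx in Mb/s, other in MB/s

print(consistent)  # equal shares
print(mixed)       # mx is weighted 8x heavier than the other rail
```

Under this model the absolute unit does not matter as long as every rail uses the same one; the skew only appears when rails disagree on the unit.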

Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-03 Thread Jeff Squyres
Thanks; committed in r23712. Can you file CMRs for 1.4 and 1.5? Thanks. On Sep 3, 2010, at 3:53 AM, Brice Goglin wrote: > For some reason, the MX btl sets btl_bandwidth in megabits/s instead of > megabytes/s. So we get crazy btl_weights in case of heterogeneous > multirail. And --mca btl_mx_ba

Re: [OMPI devel] OMPI 1.5 twitter notification plugin probably broken by switch to OAUTH

2010-09-03 Thread Jeff Squyres
On Sep 1, 2010, at 7:15 AM, Chris Samuel wrote: > Looking at the code for the Twitter notifier in OMPI 1.5 > and seeing its use of HTTP basic authentication I would > suggest that it may be non-functional due to Twitter's > switch to purely OAUTH based authentication for their API. Oy; I got that

Re: [OMPI devel] openib btl - fatal errors don't abort the job

2010-09-03 Thread Jeff Squyres
On Sep 1, 2010, at 4:47 PM, Steve Wise wrote: > I was wondering what the logic is behind allowing an MPI job to continue in > the presence of a fatal qp error? It's a feature...? > Note the "will try to continue" sentence: > > ---

Re: [OMPI devel] 1/4/3rc1 over MX

2010-09-03 Thread Jeff Squyres
On Sep 1, 2010, at 9:10 AM, Scott Atchley wrote: > I posted a patch for this on the ticket. Will someone be committing this to SVN? I re-opened the ticket because just posting a patch to the ticket doesn't actually fix anything. :-)

Re: [OMPI devel] 1.5rc5 over MX

2010-09-03 Thread Jeff Squyres
Ditto for the v1.5 patch -- it wasn't committed anywhere and no CMR was filed, so I re-opened the ticket. Plus you mentioned a 2us (!) latency increase. Doesn't that need attention, too? On Sep 1, 2010, at 9:09 AM, Scott Atchley wrote: > Jeff, > > I posted a patch on the ticket. > > Scott

Re: [OMPI devel] 1/4/3rc1 over MX

2010-09-03 Thread Scott Atchley
On Sep 3, 2010, at 8:19 AM, Jeff Squyres wrote: > On Sep 1, 2010, at 9:10 AM, Scott Atchley wrote: > >> I posted a patch for this on the ticket. > > Will someone be committing this to SVN? > > I re-opened the ticket because just posting a patch to the ticket doesn't > actually fix anything. :

Re: [OMPI devel] 1.5rc5 over MX

2010-09-03 Thread Scott Atchley
Shouldn't the regression be a separate ticket since it is unrelated? Scott On Sep 3, 2010, at 8:20 AM, Jeff Squyres wrote: > Ditto for the v1.5 patch -- it wasn't committed anywhere and no CMR was > filed, so I re-opened the ticket. > > Plus you mentioned a 2us (!) latency increase. Doesn't t

Re: [OMPI devel] 1/4/3rc1 over MX

2010-09-03 Thread Abhishek Kulkarni
On Fri, 3 Sep 2010, Jeff Squyres wrote: > On Sep 1, 2010, at 9:10 AM, Scott Atchley wrote: > >> I posted a patch for this on the ticket. > > Will someone be committing this to SVN? Done. Filed the CMRs to get this moved to 1.4.3 and 1.5. > I re-opened the ticket because just posting a patch to the

Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-03 Thread George Bosilca
Jeff, I think you will have to revert this patch as the btl_bandwidth __IS__ supposed to be in Mbs and not MBs. We usually talk about networks in Mbs (there is a pattern in Ethernet 1G/10G, Myricom 10G). In addition the original design of the multi-rail was based on this assumption, and the mul

Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-03 Thread Jeff Squyres
On Sep 3, 2010, at 9:38 AM, George Bosilca wrote: > I think you will have to revert this patch as the btl_bandwidth __IS__ > supposed to be in Mbs and not MBs. We usually talk about networks in Mbs > (there is a pattern in Ethernet 1G/10G, Myricom 10G). This is why I shouldn't commit patches fo

Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-03 Thread Brice Goglin
Le 03/09/2010 15:38, George Bosilca a écrit : > Jeff, > > I think you will have to revert this patch as the btl_bandwidth __IS__ > supposed to be in Mbs and not MBs. We usually talk about networks in Mbs > (there is a pattern in Ethernet 1G/10G, Myricom 10G). In addition the > original design of

Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-03 Thread George Bosilca
On Sep 3, 2010, at 09:50 , Brice Goglin wrote: > Le 03/09/2010 15:38, George Bosilca a écrit : >> Jeff, >> >> I think you will have to revert this patch as the btl_bandwidth __IS__ >> supposed to be in Mbs and not MBs. We usually talk about networks in Mbs >> (there is a pattern in Ethernet 1G

Re: [OMPI devel] 1.5rc5 over MX

2010-09-03 Thread Jeff Squyres
Yes, probably so. On Sep 3, 2010, at 8:53 AM, Scott Atchley wrote: > Shouldn't the regression be a separate ticket since it is unrelated? > > Scott > > On Sep 3, 2010, at 8:20 AM, Jeff Squyres wrote: > >> Ditto for the v1.5 patch -- it wasn't committed anywhere and no CMR was >> filed, so I r

Re: [OMPI devel] 1.5rc5 has been posted

2010-09-03 Thread Larry Baker
Using MPI-2 (Gropp et al.) says MPI_SIZEOF() only supports numeric intrinsic data types. So, I patched OpenMPI 1.4.2 to remove the declarations of the undefined Logical and Character specific procedures in ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh: output_197 MPI_Sizeof ${rank} CH "c

Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-03 Thread Bogdan Costescu
On Fri, Sep 3, 2010 at 3:47 PM, Jeff Squyres wrote: > It might be worth having even a Linux-specific way to auto-detect, just for > this use case (which is becoming more common -- 1GB LOM and a 10GB non-iWARP > NIC). The file: /sys/class/net/ethX/speed should contain the current speed and is
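A minimal sketch of the Linux-specific detection suggested here, reading the sysfs speed file for an interface. The kernel reports this value in Mbit/s. The function name `link_speed_mbps` and the `sysfs_root` parameter are hypothetical, introduced only so the lookup can be exercised against a test directory; error handling is an assumption about what a caller would want.

```python
from pathlib import Path


def link_speed_mbps(iface, sysfs_root="/sys/class/net"):
    """Return the link speed of `iface` in Mbit/s, or None if unknown.

    Reads <sysfs_root>/<iface>/speed. Returns None when the file is
    missing or unreadable (e.g. interface absent or down), when its
    content is not an integer, or when the driver reports a
    non-positive value such as -1 for "unknown".
    """
    speed_file = Path(sysfs_root) / iface / "speed"
    try:
        value = int(speed_file.read_text().strip())
    except (OSError, ValueError):
        return None
    return value if value > 0 else None


if __name__ == "__main__":
    # "eth0" is only an example interface name.
    print(link_speed_mbps("eth0"))
```

The result is left in Mbit/s as the kernel reports it; dividing by 8 gives MB/s. Which unit the consumer expects is exactly the question debated in this thread, so the conversion is left to the caller.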