[OMPI devel] limit tcp fragment size?

2008-03-31 Thread Muhammad Atif
G'day. Just a quick basic question: in the case of the TCP BTL, how do I limit the fragment size? I do not want MPI to send a fragment larger than, let's say, 16K. If I am not mistaken, shouldn't btl_tcp_min_send_size do the trick? If it is supposed to do it, why do I see packets of

[OMPI devel] segfault on host not found error.

2008-03-31 Thread Lenny Verkhovsky
I accidentally ran a job with a hostfile in which one of the hosts was not properly mounted. As a result I got an error and a segfault. /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun -np 29 -hostfile hostfile ./mpi_p01 -t lt bash: /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/orted: No such file or directory -

Re: [OMPI devel] RMAPS rank_file component patch and modifications for review

2008-03-31 Thread Jeff Squyres
On Mar 27, 2008, at 8:02 AM, Lenny Verkhovsky wrote: - I don't think we can delete the MCA param ompi_paffinity_alone; it exists in the v1.2 series and has historical precedent. It will not be deleted; it will just use the same infrastructure (slot_list parameter and opal_base functions). It

Re: [OMPI devel] RMAPS rank_file component patch and modifications for review

2008-03-31 Thread Jeff Squyres
Sorry, I missed this mail. IIRC, the verbosity level for stream 0 is 0. It probably would not be good to increase it; many places in the code use output stream 0. Perhaps you could make a new stream with a different verbosity level to do what you want...? See the docs in opal/util/output.

Re: [OMPI devel] RMAPS rank_file component patch and modifications for review

2008-03-31 Thread Terry Dontje
Jeff Squyres wrote: On Mar 27, 2008, at 8:02 AM, Lenny Verkhovsky wrote: - I don't think we can delete the MCA param ompi_paffinity_alone; it exists in the v1.2 series and has historical precedent. It will not be deleted; it will just use the same infrastructure (slot_list parameter

Re: [OMPI devel] RMAPS rank_file component patch and modifications for review

2008-03-31 Thread Lenny Verkhovsky
OK, I am putting it back. > -Original Message- > From: terry.don...@sun.com [mailto:terry.don...@sun.com] > Sent: Monday, March 31, 2008 2:59 PM > To: Open MPI Developers > Cc: Lenny Verkhovsky; Sharon Melamed > Subject: Re: [OMPI devel] RMAPS rank_file component patch and > modificatio

Re: [OMPI devel] Scalability of openib modex

2008-03-31 Thread Ralph H Castain
Thanks Jeff. It appears to me that the first approach to reducing modex data makes the most sense and has the largest impact - I would advocate pursuing it first. We can look at further refinements later. Along that line, one thing we also exchange in the modex (not IB specific) is hostname and ar

Re: [OMPI devel] Scalability of openib modex

2008-03-31 Thread Jeff Squyres
On Mar 31, 2008, at 9:22 AM, Ralph H Castain wrote: Thanks Jeff. It appears to me that the first approach to reducing modex data makes the most sense and has the largest impact - I would advocate pursuing it first. We can look at further refinements later. Along that line, one thing we also

[OMPI devel] Routed 'unity' broken on trunk

2008-03-31 Thread Josh Hursey
Ralph, I've just noticed that the 'unity' routed component seems to be broken when using more than one machine. I'm using Odin and r18028 of the trunk, and have confirmed that this problem occurs with SLURM and rsh. I think this break came in on Friday as that is when some o

Re: [OMPI devel] Routed 'unity' broken on trunk

2008-03-31 Thread Ralph H Castain
I figured out the issue - there is a simple and a hard way to fix this. So before I do, let me see what makes sense. The simple solution involves updating the daemons with contact info for the procs so that they can send their collected modex info to the rank=0 proc. This will measurably slow the

Re: [OMPI devel] Routed 'unity' broken on trunk

2008-03-31 Thread Josh Hursey
At the moment I only use unity with C/R. Mostly because I have not verified that the other components work properly under the C/R conditions. I can verify others, but that doesn't solve the problem with the unity component. :/ It is not critical that these jobs launch quickly, but that they

Re: [OMPI devel] limit tcp fragment size?

2008-03-31 Thread George Bosilca
The btl_tcp_min_send_size is not exactly what you expect it to be. It drives only the send protocol (as implemented in Open MPI), and not the put protocol the TCP BTL is using. You can achieve what you want with 2 parameters: 1. btl_tcp_frag set to 9. This will force the send protocol over TCP

Re: [OMPI devel] Routed 'unity' broken on trunk

2008-03-31 Thread Ralph H Castain
On 3/31/08 9:28 AM, "Josh Hursey" wrote: > At the moment I only use unity with C/R. Mostly because I have not > verified that the other components work properly under the C/R > conditions. I can verify others, but that doesn't solve the problem > with the unity component. :/ > > It is not cri

Re: [OMPI devel] Routed 'unity' broken on trunk

2008-03-31 Thread Josh Hursey
On Mar 31, 2008, at 12:57 PM, Ralph H Castain wrote: On 3/31/08 9:28 AM, "Josh Hursey" wrote: At the moment I only use unity with C/R. Mostly because I have not verified that the other components work properly under the C/R conditions. I can verify others, but that doesn't solve the probl

Re: [OMPI devel] Routed 'unity' broken on trunk

2008-03-31 Thread Ralph H Castain
Okay - fixed with r18040 Thanks Ralph On 3/31/08 11:01 AM, "Josh Hursey" wrote: > > On Mar 31, 2008, at 12:57 PM, Ralph H Castain wrote: > >> >> >> >> On 3/31/08 9:28 AM, "Josh Hursey" wrote: >> >>> At the moment I only use unity with C/R. Mostly because I have not >>> verified that the

Re: [OMPI devel] segfault on host not found error.

2008-03-31 Thread Ralph H Castain
I am unable to replicate the segfault. However, I was able to get the job to hang. I fixed that behavior with r18044. Perhaps you can test this again and let me know what you see. A gdb stack trace would be more helpful. Thanks Ralph On 3/31/08 5:13 AM, "Lenny Verkhovsky" wrote: > > > > I

Re: [OMPI devel] Routed 'unity' broken on trunk

2008-03-31 Thread Josh Hursey
Looks good. Thanks for the fix. Cheers, Josh On Mar 31, 2008, at 1:43 PM, Ralph H Castain wrote: Okay - fixed with r18040 Thanks Ralph On 3/31/08 11:01 AM, "Josh Hursey" wrote: On Mar 31, 2008, at 12:57 PM, Ralph H Castain wrote: On 3/31/08 9:28 AM, "Josh Hursey" wrote: At the mo

[OMPI devel] Session directories in $HOME?

2008-03-31 Thread Josh Hursey
So does anyone know why the session directories are in $HOME instead of /tmp? I'm using r18044 and every time I run, the session directories are created in $HOME. George, does this have anything to do with your commits from earlier? -- Josh

Re: [OMPI devel] Session directories in $HOME?

2008-03-31 Thread George Bosilca
I looked over the code and I don't see any problems with the changes. The only thing I did was replace getenv("HOME") with opal_home_directory ... Here is the logic for selecting the TMP directory: if( NULL == (str = getenv("TMPDIR")) ) if( NULL == (str = getenv("TEMP")) )

Re: [OMPI devel] Session directories in $HOME?

2008-03-31 Thread Shipman, Galen M.
Slightly OT but along the same lines... We currently have an argument to mpirun to set the HNP tmpdir (--tmpdir). Why don't we have an mca param to set the tmpdir for all the orted's and such? - Galen On Mar 31, 2008, at 3:51 PM, George Bosilca wrote: I looked over the code and I don't se

Re: [OMPI devel] Session directories in $HOME?

2008-03-31 Thread Josh Hursey
Nope. None of those environment variables are defined. Should they be? It would seem that the last part of the logic should be (re-) extended to use /tmp if it exists. -- Josh On Mar 31, 2008, at 3:51 PM, George Bosilca wrote: I looked over the code and I don't see any problems with the cha

Re: [OMPI devel] Session directories in $HOME?

2008-03-31 Thread Jeff Squyres
I confirm that this is new behavior. Session directories have just started showing up in my $HOME as well, and TMPDIR, TEMP, TMP have never been set on my cluster (for interactive logins, anyway). On Mar 31, 2008, at 4:01 PM, Josh Hursey wrote: Nope. None of those environment variables ar

Re: [OMPI devel] Session directories in $HOME?

2008-03-31 Thread Josh Hursey
Taking a quick look at the commits, r18037 looks like the most likely cause of this problem. Previously the session directory was forced to "/tmp" if no environment variables were set. This revision removes this logic and uses opal_tmp_directory(). Though I agree with this

Re: [OMPI devel] Session directories in $HOME?

2008-03-31 Thread George Bosilca
TMPDIR and TMP are standard on Unix. If they are not defined ... one cannot guess where the temporary files should be located. Unfortunately, if we start using /tmp directly we might make the wrong guess. What does mktemp return on your system? george. On Mar 31, 2008, at 4:01 PM,

Re: [OMPI devel] Session directories in $HOME?

2008-03-31 Thread Ralph H Castain
Here is the problem - the following code was changed in session_dir.c:
-#ifdef __WINDOWS__
-#define OMPI_DEFAULT_TMPDIR "C:\\TEMP"
-#else
-#define OMPI_DEFAULT_TMPDIR "/tmp"
-#endif
-
 #define OMPI_PRINTF_FIX_STRING(a) ((NULL == a) ? "(null)" : a)
 /
@@ -262,14 +257,8

Re: [OMPI devel] Session directories in $HOME?

2008-03-31 Thread Aurélien Bouteiller
I more than agree with Galen. Aurelien On 31 March 08 at 16:00, Shipman, Galen M. wrote: Slightly OT but along the same lines... We currently have an argument to mpirun to set the HNP tmpdir (--tmpdir). Why don't we have an mca param to set the tmpdir for all the orted's and such? - Galen O

Re: [OMPI devel] Session directories in $HOME?

2008-03-31 Thread George Bosilca
Commit r18046 restores exactly the same logic as it was before r18037. It redirects everything to /tmp if no special environment variable is set. george. On Mar 31, 2008, at 4:09 PM, Josh Hursey wrote: Taking a quick look at the commits it seems that r18037 looks like the most likely cause
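
For readers following the thread, the fallback behavior described here can be sketched roughly as below. This is only an illustration, not the code committed in r18046: the names sketch_tmp_directory and SKETCH_DEFAULT_TMPDIR are hypothetical, and the lookup order TMPDIR, TEMP, TMP is an assumption based on the variables named in the thread (the quoted snippet is cut off after TEMP).

```c
#include <stdlib.h>  /* getenv */

/* Assumed fallback, mirroring the "/tmp" default discussed in the thread. */
#define SKETCH_DEFAULT_TMPDIR "/tmp"

/*
 * Hypothetical helper (not the Open MPI function): return the value of the
 * first of TMPDIR, TEMP, TMP that is set in the environment, or
 * SKETCH_DEFAULT_TMPDIR if none of them is set.
 */
static const char *sketch_tmp_directory(void)
{
    static const char *const vars[] = { "TMPDIR", "TEMP", "TMP" };
    size_t i;

    for (i = 0; i < sizeof(vars) / sizeof(vars[0]); ++i) {
        const char *value = getenv(vars[i]);
        if (NULL != value) {
            return value;  /* first variable that is set wins */
        }
    }
    return SKETCH_DEFAULT_TMPDIR;  /* nothing set: fall back to /tmp */
}
```

With none of the three variables set, as on the clusters mentioned in this thread, the sketch returns "/tmp" rather than a path under $HOME. A variable that is set but empty is returned as-is, since only NULL is checked.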

Re: [OMPI devel] Session directories in $HOME?

2008-03-31 Thread Josh Hursey
Thanks for the fix. Cheers, Josh On Mar 31, 2008, at 4:17 PM, George Bosilca wrote: Commit r18046 restores exactly the same logic as it was before r18037. It redirects everything to /tmp if no special environment variable is set. george. On Mar 31, 2008, at 4:09 PM, Josh Hursey wrote: T

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18046

2008-03-31 Thread Bert Wesarg
On Mon, Mar 31, 2008 at 10:15 PM, wrote: > Author: bosilca > Date: 2008-03-31 16:15:49 EDT (Mon, 31 Mar 2008) > New Revision: 18046 > URL: https://svn.open-mpi.org/trac/ompi/changeset/18046 > > Modified: trunk/opal/util/opal_environ.c > +#ifdef __WINDOWS__ > +#define OMPI_DEFAULT_TMPDIR "C:

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18046

2008-03-31 Thread George Bosilca
You're right ... I'll make the change asap. Thanks, george. On Mar 31, 2008, at 5:39 PM, Bert Wesarg wrote: On Mon, Mar 31, 2008 at 10:15 PM, wrote: Author: bosilca Date: 2008-03-31 16:15:49 EDT (Mon, 31 Mar 2008) New Revision: 18046 URL: https://svn.open-mpi.org/trac/ompi/changeset/18