Re: [OMPI devel] Comm_spawn limits

2008-10-27 Thread Ralph Castain
Well, since I'm the "guy who wrote the code", I'll offer my $0.0001 (my dollars went the way of the market...). Jeff's memory about why we went to 16 bits isn't quite accurate. The fact is that we always had 32-bit jobids, and still do. Up to about a year ago, all of that space was availabl

Re: [OMPI devel] Comm_spawn limits

2008-10-27 Thread Jeff Squyres
On Oct 27, 2008, at 5:52 PM, Andreas Schäfer wrote: I don't know any implementation details, but is turning a 16-bit counter into a 32-bit counter really so much harder than this fancy (over-engineered? ;-) ) table construction? The way I see it, this table, which might become a real mess if there are m

Re: [OMPI devel] Comm_spawn limits

2008-10-27 Thread Andreas Schäfer
I don't know any implementation details, but is turning a 16-bit counter into a 32-bit counter really so much harder than this fancy (over-engineered? ;-) ) table construction? The way I see it, this table, which might become a real mess if there are multiple MPI_Comm_spawn calls issued simultaneously in differe

Re: [OMPI devel] Comm_spawn limits

2008-10-27 Thread Jeff Squyres
How about a variation on that idea: keep a global bitmap or some other kind of "this ID is in use" table. Then, if the launch counter rolls over, you can simply check the table to find a free value. That way, you can be sure to never re-use a value that is still being used. So we'd have

Re: [OMPI devel] Comm_spawn limits

2008-10-22 Thread Ralph Castain
I can't swear to this because I haven't fully grokked it yet, but I believe the answer is: 1. if child jobs have completed, it won't hurt. I think the various subsystems clean up their bookkeeping when a job completes, so we could possibly reuse the number. Might be some race conditions we wo

Re: [OMPI devel] Comm_spawn limits

2008-10-22 Thread George Bosilca
What happens if we roll around with the counter? george. On Oct 22, 2008, at 2:49 PM, Ralph Castain wrote: There recently was activity on the mailing lists where someone was attempting to call comm_spawn 100,000 times. Setting aside the threading issues that were the focus of that exc