Re: [OMPI devel] openib btl_openib_async_thread poll question

2010-12-21 Thread Shamis, Pavel
According to the man pages, only POLLIN or one of the error bits may be returned in this specific case: "The bits returned in revents can include any of those specified in events, or one of the values POLLERR, POLLHUP, or POLLNVAL. (These three bits are meaningless in the events field, and will be set in the reven
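The man-page rule quoted above can be checked directly. The following is a minimal sketch (the helper name `poll_closed_descriptor` is hypothetical, not part of the openib code): it polls a descriptor that has just been closed while asking only for POLLIN, and poll() still reports POLLNVAL in revents. It assumes a single-threaded program, so the freed descriptor number is not reused between close() and poll().

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: poll a descriptor number that has just been
 * closed, asking only for POLLIN. poll() still reports POLLNVAL in
 * revents, even though POLLNVAL was never requested in events.
 * Returns the revents value, or 0 if pipe() or poll() failed.
 * Assumes no other thread can reuse the descriptor number. */
static short poll_closed_descriptor(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return 0;
    close(fds[1]);
    close(fds[0]); /* fds[0] is now an invalid descriptor */

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN, .revents = 0 };
    if (poll(&pfd, 1, 0) < 0)
        return 0;
    return pfd.revents;
}
```

Any code that inspects revents therefore has to be prepared for the three error bits in addition to whatever it put in events.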

Re: [OMPI devel] Datatype question

2010-12-21 Thread George Bosilca
I have a big patch pending that will map all OMPI types, and therefore all ops, directly onto the OPAL DDT. The OMPI DDT part is complete, but I'm having some trouble with the ops. At this point I'm looking into the .m4 files for some help with the mapping between Fortran types directly into the POSIX

[OMPI devel] Datatype question

2010-12-21 Thread Barrett, Brian W
All - I'm trying to follow up on James Dinan's one-sided datatype errors e-mail and running into some datatype issues from when the datatype engine was moved to OPAL (sigh). Accumulate needs to get at the underlying datatypes for a user-created datatype. Before the ddt move, one just walked bd

Re: [OMPI devel] openib btl_openib_async_thread poll question

2010-12-21 Thread Terry Dontje
After further inspection I saw that events is being set to POLLIN only. Is that supposed to mask out any other bits from being set (like POLLRDNORM)? --td On 12/21/2010 10:35 AM, Terry Dontje wrote: We're doing some testing with openib btl on a system with Solaris. It looks like Solaris can r
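Whether requesting only POLLIN keeps POLLRDNORM out of revents is platform behavior, so one way to answer the question is to probe it. The sketch below (the helper names `probe_readable_pipe` and `print_revents` are hypothetical) writes one byte into a pipe, polls the read end with events set to POLLIN only, and prints which bits came back. It only assumes that POLLIN itself will be reported for a readable pipe; whether POLLRDNORM also appears is exactly what the probe is meant to show on a given system.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* Print the revents bits discussed in this thread. */
static void print_revents(short revents)
{
    printf("revents = %#x:%s%s%s%s%s\n", (unsigned)revents,
           (revents & POLLIN)     ? " POLLIN"     : "",
           (revents & POLLRDNORM) ? " POLLRDNORM" : "",
           (revents & POLLERR)    ? " POLLERR"    : "",
           (revents & POLLHUP)    ? " POLLHUP"    : "",
           (revents & POLLNVAL)   ? " POLLNVAL"   : "");
}

/* Hypothetical probe: make a pipe readable, then poll it asking only
 * for POLLIN. Returns the revents value poll() reported, or 0 on
 * failure. */
static short probe_readable_pipe(void)
{
    int fds[2];
    short revents = 0;
    if (pipe(fds) != 0)
        return 0;
    if (write(fds[1], "x", 1) == 1) {
        struct pollfd pfd = { .fd = fds[0], .events = POLLIN, .revents = 0 };
        if (poll(&pfd, 1, 0) > 0)
            revents = pfd.revents;
    }
    close(fds[0]);
    close(fds[1]);
    print_revents(revents);
    return revents;
}
```

Running the same probe on Linux and Solaris would show whether each one reports POLLRDNORM alongside POLLIN when only POLLIN was requested.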

Re: [OMPI devel] Connect/Accept and Disconnect

2010-12-21 Thread Ralph Castain
You could try configuring with --enable-debug and then set -mca dpm_base_verbose 5 on the cmd line of your two jobs that are trying to connect. Will provide some hopefully useful debug info. BTW: how did you configure OMPI? On Dec 21, 2010, at 7:33 AM, Suraj Prabhakaran wrote: > > On 12/21/2

[OMPI devel] openib btl_openib_async_thread poll question

2010-12-21 Thread Terry Dontje
We're doing some testing with openib btl on a system with Solaris. It looks like Solaris can return POLLIN|POLLRDNORM in revents from a poll call. I looked at the man pages for Linux and it reads like Linux could possibly do this too. However the code in btl_openib_async_thread that checks fo
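The snippet is cut off before it shows the actual check in btl_openib_async_thread, so the following is only a sketch under an assumption: that the check compares revents for equality with POLLIN. Under that assumption, POLLIN|POLLRDNORM would fail the test even though the descriptor is readable, whereas testing the POLLIN bit with a mask tolerates extra bits. All three function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stdbool.h>

/* Fragile (assumed form of the check): rejects revents values that
 * carry extra bits, e.g. POLLIN|POLLRDNORM as reported on Solaris. */
static bool readable_strict(short revents)
{
    return revents == POLLIN;
}

/* Robust: test only the bit of interest, ignoring any others. */
static bool readable_masked(short revents)
{
    return (revents & POLLIN) != 0;
}

/* Error conditions poll() may report even when they were not requested
 * in events; these deserve separate handling. */
static bool has_error(short revents)
{
    return (revents & (POLLERR | POLLHUP | POLLNVAL)) != 0;
}
```

With this split, `readable_masked(POLLIN | POLLRDNORM)` is true while `readable_strict` on the same value is false, and error bits are handled explicitly instead of being treated as "not POLLIN".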

Re: [OMPI devel] Connect/Accept and Disconnect

2010-12-21 Thread Suraj Prabhakaran
On 12/21/2010 03:12 PM, Ralph Castain wrote: Are you using ompi-server for pub/sub, or just letting it default to mpirun? You might want to output the return value from lookup_name and publish_name to see if they match. If they are different, then you will definitely hang. I used ompi-serv

Re: [OMPI devel] Connect/Accept and Disconnect

2010-12-21 Thread Ralph Castain
Are you using ompi-server for pub/sub, or just letting it default to mpirun? You might want to output the return value from lookup_name and publish_name to see if they match. If they are different, then you will definitely hang. On Dec 21, 2010, at 6:41 AM, Suraj Prabhakaran wrote: > Hello, >

[OMPI devel] Connect/Accept and Disconnect

2010-12-21 Thread Suraj Prabhakaran
Hello, This is basically a repost of my previous mail regarding problems with connect/accept and disconnect (**this is not related to spawning, parent/child**). I *sometimes* find processes blocking indefinitely at Connect/Accept calls or at Disconnect calls. I have an example below. Process