Very good - that is pretty much all that the signal_job API does.
On Mar 17, 2010, at 4:11 PM, Leonardo Fialho wrote:
Anyway, to signal another job I have sent a RML message with the
ORTE_DAEMON_SIGNAL_LOCAL_PROCS command to the proc's HNP.
Leonardo
On Mar 17, 2010, at 9:59 PM, Ralph Castain wrote:
Sorry, I was out snowshoeing today - and about 3 miles out, I suddenly realized
the problem :-/
Terry is correct - we don't initialize the plm framework in application
processes. However, there is a default proxy module for that framework so that
applications can call comm_spawn. Unfortunately,
Yes, I know the difference :)
I'm trying to call orte_plm.signal_job from a PML component. I thought the PLM stayed
resident after launching, but it does so only for mpirun and orted; you're right.
On Mar 17, 2010, at 3:15 PM, Terry Dontje wrote:
On 03/17/2010 10:10 AM, Leonardo Fialho wrote:
Wow... orte_plm.signal_job points to zero. Is it correct from the PML
point of view?
It might be because plm's are really only used at launch time not in MPI
processes. Note plm != pml.
--td
Leonardo
On Mar 17, 2010, at 2:52 PM, Leonardo Fialh
Wow... orte_plm.signal_job points to zero. Is it correct from the PML point of
view?
Leonardo
On Mar 17, 2010, at 2:52 PM, Leonardo Fialho wrote:
> To clarify a little bit more: I'm calling orte_plm.signal_job from a PML
> component, I know that ORTE is below OMPI, but I think that this funct
Can you print out the value of orte_plm.signal_job? I bet it is
pointing to address 0. So the question is: is orte_plm actually initialized
in an MPI process? My guess would be no, but I am sure Ralph will be
able to answer more definitively.
--td
On 03/17/2010 09:52 AM, Leonardo Fialho wrote:
To clarify a little bit more: I'm calling orte_plm.signal_job from a PML
component. I know that ORTE is below OMPI, but I think this function
may not be available, or something like that. I also can't figure out where
this snprintf is; in my code there is only
opal_output(0, "receive
Thanks for clarifying - guess I won't chew just yet. :-)
I still don't see in your trace where it is failing in signal_job. I didn't see
the message indicating it was sending the signal cmd out in your prior debug
output, and there isn't a printf in that code loop other than the debug output.
C
Ralph, don't swallow your message yet... The two jobs are not running under the same
mpirun. There are two instances of mpirun: one runs with "-report-uri
../contact.txt" and the other receives its contact info using "-ompi-server
file:../contact.txt". And yes, both processes are running with
I'm going to have to eat my last message. It slipped past me that your other
job was started via comm_spawn. Since both "jobs" are running under the same
mpirun, there shouldn't be a problem sending a signal between them.
I don't know why this would be crashing. Are you sure it is crashing in
Well, thank you anyway :)
On Mar 17, 2010, at 1:54 AM, Ralph Castain wrote:
Yeah, that probably won't work. The current code isn't intended to cross jobs
like that - I'm sure nobody ever tested it for that idea, and I'm pretty sure
it won't support it.
I don't currently know any way to do what you are trying to do. We could extend
the signal code to handle it, I would
Yes... but something is going wrong... maybe the problem is that the jobid
is different from the process's own jobid, I don't know.
I'm trying to send a signal to another process running under another job. The
other process jumps into an accept_connect on the MPI comm. So I wrote code like
this (I
Sure! So long as you add the include, you are okay as the ORTE layer is "below"
the OMPI one.
On Mar 16, 2010, at 6:29 PM, Leonardo Fialho wrote:
Thanks Ralph, one last question... is orte_plm.signal_job exposed/available to
be called by a PML component? Yes, I have the orte/mca/plm/plm.h include line.
Leonardo
On Mar 16, 2010, at 11:59 PM, Ralph Castain wrote:
It's just the orte_process_name_t jobid field. So if you have an
orte_process_name_t *pname, then it would just be
orte_plm.signal_job(pname->jobid, sig)
On Mar 16, 2010, at 3:23 PM, Leonardo Fialho wrote:
Hmm, and to signal a job the function is probably orte_plm.signal_job(jobid,
signal), right?
Now my naive question is how to obtain the jobid part from an orte_process_name_t
variable? Is there any magic function in names_fns.h?
Thanks,
Leonardo
On Mar 16, 2010, at 10:12 PM, Ralph Castai
Afraid not - you can signal a job, but not a specific process. We used to have
such an API, but nobody ever used it. Easy to restore if someone has a need.
On Mar 16, 2010, at 2:45 PM, Leonardo Fialho wrote:
> Hi,
>
> Is there any function in Open MPI's frameworks to send a signal to other ORTE
On 4/8/08 2:19 PM, "Ralph H Castain" wrote:
On 4/8/08 12:10 PM, "Pak Lui" wrote:
Richard Graham wrote:
What happens if I deliver SIGUSR2 to mpirun? What I observe (for both
ssh/rsh and torque) is that if I deliver a SIGUSR2 to mpirun, the signal does
get propagated to the MPI procs, which do invoke the signal handler I
registered, but the job is terminated right after that. H
Hmmm...well, I'll take a look. I haven't seen that behavior, but I haven't
checked it in some time.
On 4/8/08 11:54 AM, "Richard Graham" wrote:
What happens if I deliver SIGUSR2 to mpirun? What I observe (for both
ssh/rsh and torque) is that if I deliver a SIGUSR2 to mpirun, the signal does
get propagated to the MPI procs, which do invoke the signal handler I
registered, but the job is terminated right after that. However, if I
deliver the
I found what Pak said a little confusing as the wait_daemon function doesn't
actually receive a signal itself - it only detects that a proc has exited
and checks to see if that happened due to a signal. If so, it flags that
situation and will order the job aborted.
So if the proc continues alive,
First, can your user executable install a signal handler that catches
SIGUSR2 so it does not exit? By default on Solaris it is going to exit, unless
you catch the signal and have the process do nothing.
from signal(3HEAD)
Name Value Default Event
SIGUSR1 16 Ex