
Re: [OMPI devel] Signals

2010-03-17 Thread Ralph Castain

Very good - that is pretty much all that the signal_job API does. On Mar 17, 2010, at 4:11 PM, Leonardo Fialho wrote: > Anyway, to signal another job I have sent a RML message with the > ORTE_DAEMON_SIGNAL_LOCAL_PROCS command to the proc's HNP. > > Leonardo > > On Mar 17, 2010, at 9:59 PM, Ral

Re: [OMPI devel] Signals

2010-03-17 Thread Leonardo Fialho

Anyway, to signal another job I have sent a RML message with the ORTE_DAEMON_SIGNAL_LOCAL_PROCS command to the proc's HNP. Leonardo On Mar 17, 2010, at 9:59 PM, Ralph Castain wrote: > Sorry, I was out snowshoeing today - and about 3 miles out, I suddenly > realized the problem :-/ > > Terry i

Re: [OMPI devel] Signals

2010-03-17 Thread Ralph Castain

Sorry, I was out snowshoeing today - and about 3 miles out, I suddenly realized the problem :-/ Terry is correct - we don't initialize the plm framework in application processes. However, there is a default proxy module for that framework so that applications can call comm_spawn. Unfortunately,

Re: [OMPI devel] Signals

2010-03-17 Thread Leonardo Fialho

Yes, I know the difference :) I'm trying to call orte_plm.signal_job from a PML component. I thought the PLM stayed resident after launching, but it does so only for mpirun and orted; you're right. On Mar 17, 2010, at 3:15 PM, Terry Dontje wrote: > On 03/17/2010 10:10 AM, Leonardo Fialho wrote: >> >>

Re: [OMPI devel] Signals

2010-03-17 Thread Terry Dontje

On 03/17/2010 10:10 AM, Leonardo Fialho wrote: Wow... orte_plm.signal_job points to zero. Is it correct from the PML point of view? It might be because PLMs are really only used at launch time, not in MPI processes. Note plm != pml. --td Leonardo On Mar 17, 2010, at 2:52 PM, Leonardo Fialh

Re: [OMPI devel] Signals

2010-03-17 Thread Leonardo Fialho

Wow... orte_plm.signal_job points to zero. Is it correct from the PML point of view? Leonardo On Mar 17, 2010, at 2:52 PM, Leonardo Fialho wrote: > To clarify a little bit more: I'm calling orte_plm.signal_job from a PML > component, I know that ORTE is below OMPI, but I think that this funct

Re: [OMPI devel] Signals

2010-03-17 Thread Terry Dontje

Can you print out what the orte_plm.signal_job value is? I bet it is pointing to address 0. So the question is: is orte_plm actually initialized in an MPI process? My guess would be no, but I am sure Ralph will be able to answer more definitively. --td On 03/17/2010 09:52 AM, Leonardo Fialho wrote:
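The failure mode discussed here (a module function pointer that is still NULL because the framework was never initialized in that process) can be guarded against before the call. The following is only a sketch: `guard_plm_module_t`, `guard_safe_signal_job`, and the `GUARD_*` codes are hypothetical stand-ins invented for illustration, not the real ORTE types or error constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a framework module: a table of function pointers. */
typedef int (*guard_signal_job_fn_t)(unsigned job, int signo);

typedef struct {
    guard_signal_job_fn_t signal_job; /* NULL when the module was never set up */
} guard_plm_module_t;

/* Hypothetical return codes for this sketch only. */
#define GUARD_SUCCESS            0
#define GUARD_ERR_NOT_AVAILABLE (-1)

/* Trivial implementation used to exercise the guard. */
int guard_noop_signal_job(unsigned job, int signo)
{
    (void)job;
    (void)signo;
    return GUARD_SUCCESS;
}

/* Call signal_job only if the module actually provides it. */
int guard_safe_signal_job(const guard_plm_module_t *plm, unsigned job, int signo)
{
    assert(signo > 0); /* signal numbers are positive */
    if (plm == NULL || plm->signal_job == NULL) {
        /* Calling through a NULL function pointer would crash the process. */
        return GUARD_ERR_NOT_AVAILABLE;
    }
    return plm->signal_job(job, signo);
}
```

With a zero-initialized module, `guard_safe_signal_job` reports the function as unavailable instead of jumping to address 0.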

Re: [OMPI devel] Signals

2010-03-17 Thread Leonardo Fialho

To clarify a little bit more: I'm calling orte_plm.signal_job from a PML component. I know that ORTE is below OMPI, but I think that this function might not be available, or something like that. I also can't figure out where this snprintf is; in my code there is only opal_output(0, "receive

Re: [OMPI devel] Signals

2010-03-17 Thread Ralph Castain

Thanks for clarifying - guess I won't chew just yet. :-) I still don't see in your trace where it is failing in signal_job. I didn't see the message indicating it was sending the signal cmd out in your prior debug output, and there isn't a printf in that code loop other than the debug output. C

Re: [OMPI devel] Signals

2010-03-17 Thread Leonardo Fialho

Ralph, don't swallow your message yet... The two jobs are not running under the same mpirun. There are two instances of mpirun, one of which runs with "-report-uri ../contact.txt" while the other receives its contact info using "-ompi-server file:../contact.txt". And yes, both processes are running with

Re: [OMPI devel] Signals

2010-03-17 Thread Ralph Castain

I'm going to have to eat my last message. It slipped past me that your other job was started via comm_spawn. Since both "jobs" are running under the same mpirun, there shouldn't be a problem sending a signal between them. I don't know why this would be crashing. Are you sure it is crashing in

Re: [OMPI devel] Signals

2010-03-16 Thread Leonardo Fialho

Well, thank you anyway :) On Mar 17, 2010, at 1:54 AM, Ralph Castain wrote: > Yeah, that probably won't work. The current code isn't intended to cross jobs > like that - I'm sure nobody ever tested it for that idea, and I'm pretty sure > it won't support it. > > I don't currently know any way

Re: [OMPI devel] Signals

2010-03-16 Thread Ralph Castain

Yeah, that probably won't work. The current code isn't intended to cross jobs like that - I'm sure nobody ever tested it for that idea, and I'm pretty sure it won't support it. I don't currently know any way to do what you are trying to do. We could extend the signal code to handle it, I would

Re: [OMPI devel] Signals

2010-03-16 Thread Leonardo Fialho

Yes... but something is going wrong... maybe the problem is that the jobid is different from the process' jobid, I don't know. I'm trying to send a signal to another process running under another job. The other process jumps into an accept_connect to the MPI comm. So I wrote code like this (I

Re: [OMPI devel] Signals

2010-03-16 Thread Ralph Castain

Sure! So long as you add the include, you are okay as the ORTE layer is "below" the OMPI one. On Mar 16, 2010, at 6:29 PM, Leonardo Fialho wrote: > Thanks Ralph, the last question... is orte_plm.signal_job exposed/available > to be called by a PML component? Yes, I have the orte/mca/plm/plm.h i

Re: [OMPI devel] Signals

2010-03-16 Thread Leonardo Fialho

Thanks Ralph, the last question... is orte_plm.signal_job exposed/available to be called by a PML component? Yes, I have the orte/mca/plm/plm.h include line. Leonardo On Mar 16, 2010, at 11:59 PM, Ralph Castain wrote: > It's just the orte_process_name_t jobid field. So if you have an > orte_pr

Re: [OMPI devel] Signals

2010-03-16 Thread Ralph Castain

It's just the orte_process_name_t jobid field. So if you have an orte_process_name_t *pname, then it would just be orte_plm.signal_job(pname->jobid, sig) On Mar 16, 2010, at 3:23 PM, Leonardo Fialho wrote: > Hum and to signal a job probably the function is > orte_plm.signal_job(jobid, sig
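The call shape described here (read the jobid field from the process name, then hand it to signal_job) can be sketched without the ORTE headers. Everything prefixed `demo_` below is a hypothetical stand-in: the struct only mirrors the fact, stated in the message, that the process name carries a jobid field, and `record_signal_job` is a recording stub rather than the real orte_plm.signal_job.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real types live in the ORTE headers. */
typedef uint32_t demo_jobid_t;
typedef uint32_t demo_vpid_t;

typedef struct {
    demo_jobid_t jobid; /* job the process belongs to */
    demo_vpid_t  vpid;  /* rank of the process within that job (assumed field) */
} demo_process_name_t;

typedef int (*demo_signal_job_fn_t)(demo_jobid_t job, int signo);

/* Recording stub standing in for the framework's signal_job entry point. */
demo_jobid_t recorded_job = 0;
int recorded_signal = 0;

int record_signal_job(demo_jobid_t job, int signo)
{
    recorded_job = job;
    recorded_signal = signo;
    return 0;
}

/* Signal the whole job that the named process belongs to. */
int demo_signal_job_of(const demo_process_name_t *pname,
                       demo_signal_job_fn_t signal_job, int signo)
{
    assert(pname != NULL && signal_job != NULL);
    return signal_job(pname->jobid, signo);
}
```

Note that this signals every process in the job, not only the named one, which matches the earlier remark in the thread that a job can be signalled but a specific process cannot.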

Re: [OMPI devel] Signals

2010-03-16 Thread Leonardo Fialho

Hmm, and to signal a job the function is probably orte_plm.signal_job(jobid, signal); right? Now my naive question is how to obtain the jobid part from an orte_proc_name_t variable? Is there any magical function in the names_fns.h? Thanks, Leonardo On Mar 16, 2010, at 10:12 PM, Ralph Castai

Re: [OMPI devel] Signals

2010-03-16 Thread Ralph Castain

Afraid not - you can signal a job, but not a specific process. We used to have such an API, but nobody ever used it. Easy to restore if someone has a need. On Mar 16, 2010, at 2:45 PM, Leonardo Fialho wrote: > Hi, > > Is there any function in Open MPI's frameworks to send a signal to other ORTE

Re: [OMPI devel] Signals

2008-04-08 Thread Richard Graham

On 4/8/08 2:19 PM, "Ralph H Castain" wrote: > > > > On 4/8/08 12:10 PM, "Pak Lui" wrote: > >> Richard Graham wrote: >>> What happens if I deliver sigusr2 to mpirun? What I observe (for both >>> ssh/rsh and torque) is that if I deliver a sigusr2 to mpirun, the signal does >>> get propagated

Re: [OMPI devel] Signals

2008-04-08 Thread Ralph H Castain

On 4/8/08 12:10 PM, "Pak Lui" wrote: > Richard Graham wrote: >> What happens if I deliver sigusr2 to mpirun? What I observe (for both >> ssh/rsh and torque) is that if I deliver a sigusr2 to mpirun, the signal does >> get propagated to the mpi procs, which do invoke the signal handler I >> regi

Re: [OMPI devel] Signals

2008-04-08 Thread Pak Lui

Richard Graham wrote: What happens if I deliver sigusr2 to mpirun? What I observe (for both ssh/rsh and torque) is that if I deliver a sigusr2 to mpirun, the signal does get propagated to the mpi procs, which do invoke the signal handler I registered, but the job is terminated right after that. H

Re: [OMPI devel] Signals

2008-04-08 Thread Ralph H Castain

Hmmm...well, I'll take a look. I haven't seen that behavior, but I haven't checked it in some time. On 4/8/08 11:54 AM, "Richard Graham" wrote: > What happens if I deliver sigusr2 to mpirun? What I observe (for both > ssh/rsh and torque) is that if I deliver a sigusr2 to mpirun, the signal does

Re: [OMPI devel] Signals

2008-04-08 Thread Richard Graham

What happens if I deliver sigusr2 to mpirun? What I observe (for both ssh/rsh and torque) is that if I deliver a sigusr2 to mpirun, the signal does get propagated to the mpi procs, which do invoke the signal handler I registered, but the job is terminated right after that. However, if I deliver the

Re: [OMPI devel] Signals

2008-04-08 Thread Ralph H Castain

I found what Pak said a little confusing as the wait_daemon function doesn't actually receive a signal itself - it only detects that a proc has exited and checks to see if that happened due to a signal. If so, it flags that situation and will order the job aborted. So if the proc stays alive,

Re: [OMPI devel] Signals

2008-04-08 Thread Pak Lui

First, can your user executable create a signal handler that catches SIGUSR2 so that it does not exit? By default on Solaris it is going to exit, unless you catch the signal and have the process do nothing. From signal(3HEAD): Name Value Default Event SIGUSR1 16 Ex
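A handler of the kind suggested here can be installed with POSIX sigaction. This is a minimal sketch, not code from the thread: the function names (`install_sigusr2_handler`, `sigusr2_seen`, `self_signal_check`) are made up for illustration, and the handler only sets a flag so that the default action (process exit) no longer applies.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <string.h>

/* Flag set from the handler; sig_atomic_t is safe to write in a handler. */
static volatile sig_atomic_t got_sigusr2 = 0;

/* Handler: record the signal and return, so the process keeps running. */
static void on_sigusr2(int signo)
{
    (void)signo;
    got_sigusr2 = 1;
}

/* Replace the default SIGUSR2 action; returns 0 on success, -1 on error. */
int install_sigusr2_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr2;
    sigemptyset(&sa.sa_mask); /* block no additional signals in the handler */
    sa.sa_flags = 0;
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Report whether SIGUSR2 has been caught since startup. */
int sigusr2_seen(void)
{
    return got_sigusr2 != 0;
}

/* Send SIGUSR2 to ourselves and confirm the process survives it. */
void self_signal_check(void)
{
    int rc = install_sigusr2_handler();
    assert(rc == 0);
    raise(SIGUSR2);
    assert(sigusr2_seen());
}
```

An application would call `install_sigusr2_handler()` early in startup; without it, the default action for SIGUSR2 terminates the process, which the launcher then reports as a proc killed by a signal.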
