Hi Yann,
This is a somewhat unusual situation (though many will have come across one
variation or another). Looking harder I think you are doing the right thing,
except it would be better to wait/test for the workers to be up before starting
the client. I presume the 'sleep 5' commands are ther
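
A rough sketch of what "wait/test" could look like in place of a fixed sleep, assuming that being able to open a TCP connection to the head process is a good enough readiness signal. The helper name wait_for_port, the host and the port are hypothetical placeholders, not anything taken from your script or from ray itself:

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once a connection is accepted, or False if `timeout`
    seconds pass first. This only shows that something is listening on
    the port; it does not show that every worker has joined the cluster.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

You would call something like wait_for_port(head_host, head_port) on each worker before it tries to join, and only start the client once the check returns True. Both arguments are whatever your job script already uses for the head process.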
On 17/11/19 10:39 pm, Yann Bouteiller wrote:
I have thought about it too, but I think the --block option that I use
in ray start is supposed to sleep indefinitely for this not to happen.
However maybe this is not taken into account due to the fact that I use
'&' at the end of the ray start co
Yann Bouteiller writes:
> Hello Brian, thank you for your answer.
>
> Actually, you are not allowed to install things in your home on
> computecanada,
> this is why you need to install everything in a virtualenv with pip
> install. Also, you have to install each virtualenv in $SLURM_TMPDIR whi
Hi Gareth, thank you for your answer,
I have thought about it too, but I think the --block option that I use
in ray start is supposed to sleep indefinitely for this not to happen.
However maybe this is not taken into account due to the fact that I
use '&' at the end of the ray start comman
Hi Yann,
The remaining problem may be that the ray processes are not waited on. I'm not
sure, but hope this gets you looking in the right place. You may need to sleep
indefinitely in the scripts that run the worker ray processes then when the
master is finished making them work, cancel the work
Hello Brian, thank you for your answer.
Actually, you are not allowed to install things in your home on
computecanada, this is why you need to install everything in a
virtualenv with pip install. Also, you have to install each virtualenv
in $SLURM_TMPDIR which is the local drive of the node,
I suspect when you say "head node" you mean the primary node from the
nodes you were allocated.
Normally, when you use pip as a user, it installs in your home
directory. Are you certain all your nodes share the same homes?
If they are merely synched, that would not be the same. Not actually
s
Hello,
I am trying to do this on computecanada, which is managed by slurm:
https://ray.readthedocs.io/en/latest/deploying-on-slurm.html
However, on computecanada, you cannot install things on nodes before
the job has started, and you can only install things in a python
virtualenv once t