Ok Nico.. Thank you very much for your explanation.
Hmm.. I believe I'll start in production by just defining the GPUs as a
consumable resource, and then adapt and fix things when trouble
arises. :-) In fact I'm new to GPUs and don't know much about them, so
I'll rely on the users, who know much more about GPUs than I do, as
they also develop programs for GPUs. ;-)
Thank you and best regards.
Robi
Nicolás Serrano Martínez Santos wrote:
This sensor seems to add information for the scheduler to track, and you could
use that information in many ways. For instance, you could raise the reported
load on a host while a GPU is in use, so the host accepts no further jobs.
Depending on your requirements, you may need to install or write a
custom shell script like the one proposed in the link I attached previously.
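For reference, a custom load sensor is just a script that speaks the simple protocol sge_execd expects: it reads one line per polling interval ("quit" on shutdown) and prints a begin/end-delimited report of host:complex:value lines. A minimal sketch, assuming a complex named gpu_free has already been added via qconf -mc and nvidia-smi is on the PATH (the 5% utilization cutoff is an arbitrary placeholder):

```shell
#!/bin/sh
# Minimal GPU load sensor sketch for OGS/GE. Assumptions: a "gpu_free"
# complex exists in qconf -mc, and nvidia-smi is available on the host.
HOST=$(hostname)

gpu_report() {
    # Count devices with (almost) no compute load; 5% is a placeholder cutoff.
    FREE=$(nvidia-smi --query-gpu=utilization.gpu \
                      --format=csv,noheader,nounits 2>/dev/null |
           awk '$1 < 5' | wc -l | tr -d ' ')
    echo begin
    echo "$HOST:gpu_free:$FREE"
    echo end
}

sensor_loop() {
    # sge_execd writes a line per interval and "quit" when shutting down.
    while read line; do
        [ "$line" = quit ] && return 0
        gpu_report
    done
}

# Demo: simulate two polls from the execd, then a shutdown.
printf '\n\nquit\n' | sensor_loop
```

The script would then be wired up with the load_sensor parameter in the host or global configuration (qconf -mconf).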
What I have not been able to replicate is a variable similar to h_vmem, which
automatically kills the process once the limit is reached. In fact,
in GE the process is launched with only that much memory available (I don't
know how this is done). I think this feature is not implemented in the sensor
you attached.
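Lacking a built-in limit, one way to approximate h_vmem-style behaviour is a watchdog that polls per-process GPU memory and kills offenders. A rough sketch only, under the assumptions that the driver's nvidia-smi supports the --query-compute-apps query and that the limit reaches the job through an environment variable (GPU_MEM_LIMIT_MB is an invented name):

```shell
#!/bin/sh
# Watchdog sketch approximating an h_vmem-style limit for GPU memory.
# Assumptions: nvidia-smi supports --query-compute-apps, and the limit
# arrives via the (invented) GPU_MEM_LIMIT_MB environment variable.
LIMIT_MB=${GPU_MEM_LIMIT_MB:-4096}

check_once() {
    # Each output line is "pid, used_memory_in_MiB".
    nvidia-smi --query-compute-apps=pid,used_memory \
               --format=csv,noheader,nounits 2>/dev/null |
    while IFS=', ' read pid mem; do
        if [ -n "$mem" ] && [ "$mem" -gt "$LIMIT_MB" ]; then
            echo "killing pid $pid: ${mem} MiB > ${LIMIT_MB} MiB"
            kill "$pid"
        fi
    done
}

# In a real job you would poll alongside the GPU program, e.g.:
#   while kill -0 "$JOB_PID" 2>/dev/null; do check_once; sleep 10; done
check_once
```

Note this kills any over-limit process it sees, not just the submitting job's, so in practice you would also match the pid against the job's process tree.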
Regards,
NiCo
Excerpts from Roberto Nunnari's message of Tue May 28 10:54:53 +0200 2013:
Nicolás Serrano Martínez Santos wrote:
As far as I know, there is not much you can do besides defining a consumable
for each GPU:
http://serverfault.com/questions/322073/howto-set-up-sge-for-cuda-devices
At our university we also have a Tesla in our GE cluster. The Tesla lets you
define several virtual GPUs (e.g. 3 or 4 slots). You may find it useful to
define a gpu_memory consumable to limit the graphics memory per process,
because processes crash when the card's memory is exhausted. However, GE is
not able to (easily) monitor the memory actually used per process: you can
define the consumable in GE and reserve it at submission time, but in our
case each process monitors its own memory.
If you wish to limit GPU use to a particular queue, I think you can also
define gpu as a consumable on the queue.
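Concretely, the per-GPU consumable plus the optional per-queue restriction might look like the fragment below. The complex names gpu/gpu_mem, the host node01, and the queue gpu.q are placeholders, not anything OGS predefines:

```shell
# 1) Define the complexes (qconf -mc opens an editor; add lines like these):
#    name     shortcut  type    relop requestable consumable default urgency
#    gpu      gpu       INT     <=    YES         YES        0       0
#    gpu_mem  gmem      MEMORY  <=    YES         YES        0       0

# 2) Attach capacity to the GPU host (qconf -me node01):
#    complex_values   gpu=2,gpu_mem=12G

# 3) Optionally cap the consumable on a dedicated queue (qconf -mq gpu.q):
#    complex_values   gpu=2

# 4) Jobs then reserve a device and some graphics memory at submission:
qsub -l gpu=1,gpu_mem=2G job.sh
```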
Best Regards,
NiCo
Hi Nico,
Thank you for your answer.
What about this?
https://gridscheduler.svn.sourceforge.net/svnroot/gridscheduler/trunk/source/dist/gpu/gpu_sensor.c
Have you ever tried it? Do you think it could be useful? What are the
advantages and/or the scenarios for using that sensor?
Thank you and best regards.
Robi
Excerpts from Roberto Nunnari's message of Tue May 28 10:20:13 +0200 2013:
Anybody on this, please? I'm sorry to insist, but I posted on the 24th
and the 27th and have got no answer yet..
Best regards,
Robi
Roberto Nunnari wrote:
Hello.
Anybody on this, please? In the meantime, I went ahead a little and
implemented it like this:
[root@master ~]# qconf -sc | grep gpu
gpu gpu INT <= YES YES 0 0
[root@master ~]# qhost -F | grep gpu
hc:gpu=1.000000
Now users can access the GPU by specifying 'qsub -l gpu=1'.
I haven't defined a dedicated queue for the GPU, and I see that when a
job is running on the GPU, the scheduler also reserves a CPU slot on
that host.. that's good, because a GPU job will also consume CPU time..
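One gap worth noting with a plain consumable: it counts GPUs but does not tell the job which device it got, so on a multi-GPU host two jobs can collide on device 0. A common workaround is a lock directory per device in the job script. Everything below (the lock paths, the 0-3 device range, the CUDA_VISIBLE_DEVICES handling) is a sketch, not an official OGS mechanism:

```shell
#!/bin/sh
#$ -l gpu=1
# Sketch: claim a free device via per-device lock directories, since the
# consumable only counts GPUs and does not assign one. The paths and the
# device range 0-3 are placeholders for a 4-GPU host.
for dev in 0 1 2 3; do
    if mkdir "/tmp/gpu_lock_$dev" 2>/dev/null; then
        CUDA_VISIBLE_DEVICES=$dev
        export CUDA_VISIBLE_DEVICES
        # Release the lock when the job exits, however it ends.
        trap 'rmdir "/tmp/gpu_lock_$dev"' EXIT
        break
    fi
done
echo "job would run on GPU $CUDA_VISIBLE_DEVICES"
# ./my_cuda_program   # hypothetical binary goes here
```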
More hints, tips or advice, please?
Thank you and best regards,
Robi
Roberto Nunnari wrote:
Hi all.
I'm just doing my first tests with Open Grid Scheduler and GPGPU.
For testing I set up Open Grid Scheduler on two hosts: one is the
frontend, and one is the execution node. The execution node has 64 cores
and an NVIDIA Tesla M2090. (Most probably, the final solution will be
made up of one master, 20-30 execution nodes with 8-12 cores each, a
couple of file servers, and 4-8 GPUs attached to some of the execution
nodes.)
At present I have set up my testing environment's queues similar to the
existing production cluster, so the scheduler has three queues:
1hour, 1day, and unlimited.
I believe I was once told that the best way to add GPGPUs to a CPU
cluster is to add a queue dedicated to the GPUs together with consumable
resources, and maybe also to play with priorities for using the CPUs on
hosts with GPUs.. do you agree?
I was also told that Open Grid Scheduler has added support for GPUs..
Could anybody tell me more about that, please?
So.. I'm new to GPUs and would like some help/direction from the experts
on how to build a mixed CPU/GPU cluster using Open Grid Scheduler.
Here's my environment:
- OGS/GE 2011.11p1
- CentOS 6.4
[root@master ~]# qhost -q
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
node01                  linux-x64      64 14.33   94.4G   50.3G  186.3G     0.0
   1h.q                 BIP          0/0/20
   1d.q                 BP           0/0/20
   long.q               BP           0/0/20
Any help/tips/direction greatly appreciated! :-)
Robi
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users