Hi Mark,
I'll respond to you via separate e-mail because I do not want to misuse
this mailing list for too much commercial messaging. For the benefit of
other readers who may have similar requirements, however, I would like
to provide a brief overview of Univa's product line-up and how it
relates to your requirements:
* On top of Univa Grid Engine (which the former Sun Grid Engine team,
  now working for Univa, has evolved over the past 5 years) we offer
  Universal Resource Broker (URB) as an add-on. It allows you to run
  "frameworks" such as the Jenkins Mesos framework on top of Univa
  Grid Engine. This gives you the flexibility and dynamism of these
  frameworks while retaining full Univa Grid Engine policy control and
  the ability to mix and match diverse workloads (inside and outside
  such frameworks)
* You can also consider direct Grid Engine Jenkins integrations like
the one John McGhee has pointed out
* In May/June we are going to release Univa Grid Engine Container
Edition which will allow you to run Docker containers as a
first-class workload in a Grid Engine cluster
* We also provide an enhanced version of Kubernetes called Navops
(navops.io) which augments Google's Kubernetes with sophisticated
  policy management derived from our scheduling IP. Navops is targeted
  at micro-service architectures. Sharing resources between a Navops
  and a Univa Grid Engine environment will also be possible, allowing
  micro-service and more traditional workloads to be blended
* If you have a large number of tasks with very short run times, as is
  typical for certain test use cases, then Univa Short Jobs might be
  worth a look. It allows you to run extreme-throughput workloads with
  high efficiency on top of Univa Grid Engine. Tasks can have run times
  down to a few milliseconds, and you can run 20,000 or more tasks per
  second even in a relatively small cluster
* All products can run inside VMs or on cloud nodes, and we have a
  product called UniCloud which can flex cluster sizes dynamically or
  support automated cloud bursting. It seems you are already covering
  part of this with your use of Vagrant + Ansible, though
Hope this helps. If there are questions of general interest then we can
certainly discuss them here; I will be in touch with you directly for
anything else.
Cheers,
Fritz
Dr. Mark Asbach wrote:
Hi S(o)GE users,
I need some advice :-)
During my Ph.D., I discovered Sun Grid Engine and used it to run distributed
machine learning jobs on a (then) medium-sized cluster (96 CPUs). I liked it.
Now, a couple of years later, I am again looking for a scheduling and resource
allocation system like SGE for a similar purpose. Unfortunately, SGE seems to
be pretty dead. In addition, I have similar but not identical needs stemming
from continuous integration and from running (micro-)web services. Ideally, I
would like a simple, integrated solution and not a complex monster built from
many large parts.
Here's what I'm trying to accomplish:
- Run custom jobs for machine learning / data analysis. When I have an idea, I
write a job and run it. Usually, the same job is run only a few times. Jobs will
span multiple hosts and might require OpenMP + MPI. This is where SGE was really
good in the past (a rough submission sketch follows below the list). The crowd
seems to have shifted to running everything on Hadoop, although that setup would
be really inefficient for my purposes. I usually just need a couple of CPUs
(< 100).
- Run frequent identical jobs for continuous integration. We have a Jenkins
running, but it is lacking in some regards. Resource allocation and scheduling
are more or less non-existent. For example, I cannot define resources for things
like attached mobile devices that can be used by only one job on a multi-core
Mac at a time. These are things already solved with SGE (also sketched below),
but SGE itself does not cover the main aspects of CI, i.e. the collection and
analysis of the build data.
- Run (micro-)services. We have a couple of services that need to run
continuously. Some need to be scaled up and down in the number of parallel
instances. This is where people are now using Docker and (also quite complex)
resource allocation and scheduling systems like Kubernetes.
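To make the first point more concrete, here is roughly the kind of submission I
have in mind, sketched with the DRMAA Python bindings (this assumes the
drmaa-python package on top of the Grid Engine libdrmaa and a parallel
environment named "mpi"; the script, PE name, slot count and limits are just
placeholders for my setup):

    import drmaa

    # Sketch: submit an MPI training job that spans multiple hosts through a
    # parallel environment; PE name, slot count and memory limit depend on
    # how the cluster is configured.
    s = drmaa.Session()
    s.initialize()
    try:
        jt = s.createJobTemplate()
        jt.remoteCommand = './train_model.sh'   # wrapper that ends up calling mpirun
        jt.args = ['--dataset', '/data/experiment1']
        jt.nativeSpecification = '-pe mpi 16 -l h_vmem=4G'
        job_id = s.runJob(jt)
        print('submitted job', job_id)
        s.deleteJobTemplate(jt)
    finally:
        s.exit()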
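For the second point, the SGE feature I am thinking of is consumable complexes:
a complex such as "ios_device" defined once (e.g. via qconf -mc) and set to 1
in the complex_values of the Mac that has the device attached, so the scheduler
lets only one job hold it at a time. A CI build would then request it at
submission, again sketched via DRMAA (complex name and script are placeholders):

    import drmaa

    # Sketch: a CI build that needs exclusive access to an attached device.
    # Assumes a consumable complex "ios_device" exists and is set to 1 on the
    # host with the device, so the scheduler serializes jobs requesting it.
    s = drmaa.Session()
    s.initialize()
    try:
        jt = s.createJobTemplate()
        jt.remoteCommand = './run_device_tests.sh'
        jt.nativeSpecification = '-l ios_device=1'
        job_id = s.runJob(jt)
        info = s.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
        print('tests finished with exit status', info.exitStatus)
        s.deleteJobTemplate(jt)
    finally:
        s.exit()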
All three sorts of tasks compete for the same resources and suffer the same
problem of provisioning/configuring the workers to fulfill a job's
requirements. We're using Vagrant + Ansible to provision VMs for our machine
learning tasks, and I would like to extend this to the other problems as well.
The resource allocation is still somewhat manual in our case. I would really
like to cut down the complexity of our setup.
It would be great if you could point me to any helpful information, ideas, or
projects that could help me solve this.
Best,
Mark
--
Fritz Ferstl | CTO and Business Development, EMEA
Univa Corporation <http://www.univa.com/> | The Data Center Optimization
Company
E-Mail: [email protected] | Mobile: +49.170.819.7390
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users