I don't have a great answer for you. For us, we found a common divisor (not necessarily a whole gigabyte) of the available memory across the different hardware and used that as the amount of memory per worker, scaling the number of cores accordingly so that every core in the system gets the same amount of memory. The quotient of a machine's available memory and that divisor, rounded down and ideally a whole number to reduce waste, was the number of workers we spun up on that machine. So if your machines have 64G, 30G, and 15G of available memory, the divisor could be 15G and you'd run 4, 2, and 1 workers respectively. Every worker on every machine would get the same number of cores, set to whatever you think is a good value.
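If it helps, here is a rough sketch of that arithmetic in Scala. The host names and memory figures are just the example numbers above, the divisor is something you'd choose yourself, and the SPARK_WORKER_* variables mentioned in the comments are the standalone-mode settings you'd put in conf/spark-env.sh on each host:

  // Sketch of the worker-sizing arithmetic described above.
  // Assumption: you pick the per-worker memory divisor by hand (15 GB here).
  object WorkerSizing {
    // Available memory per machine in GB, from the example above.
    val availableGb = Map("node-a" -> 64, "node-b" -> 30, "node-c" -> 15)

    def main(args: Array[String]): Unit = {
      val perWorkerGb = 15  // chosen common divisor

      availableGb.foreach { case (host, memGb) =>
        val workers = memGb / perWorkerGb           // integer division: workers that fit
        val wasted  = memGb - workers * perWorkerGb // memory left unused on this host
        println(s"$host: $workers worker(s) of ${perWorkerGb}g, ${wasted}g unused")
        // On each host you'd then set, e.g. in conf/spark-env.sh:
        //   SPARK_WORKER_INSTANCES=<workers>
        //   SPARK_WORKER_MEMORY=<perWorkerGb>g
        //   SPARK_WORKER_CORES=<same value on every host>
      }
    }
  }

With the example numbers this prints 4, 2, and 1 workers, with 4G left unused on the 64G machine, which is the trade-off of picking 15G as the divisor.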
Hope that helps.

On Wed, Dec 3, 2014 at 7:44 AM, <kartheek.m...@gmail.com> wrote:

> Hi Victor,
>
> I want to set up a heterogeneous stand-alone Spark cluster. I have hardware
> with different memory sizes and a varied number of cores per node. I could
> only get all the nodes active in the cluster when the memory per executor
> was set to the smallest available memory of all the nodes, and likewise for
> the number of cores per executor. As of now, I configure one executor per node.
>
> Can you please suggest a way to set up a stand-alone heterogeneous cluster
> so that I can use the available hardware efficiently?
>
> Thank you