Hi Alex, take a look here: https://blogs.aws.amazon.com/bigdata/post/Tx3RD6EISZGHQ1C/The-Impact-of-Using-Latest-Generation-Instances-for-Your-Amazon-EMR-Job
Basically, it depends on your type of workload. Will you need caching?

Jorge Machado
www.jmachado.me

> On 23/02/2016, at 15:49, Alex Dzhagriev <dzh...@gmail.com> wrote:
>
> Hello all,
>
> Can someone please advise me on the pros and cons of how to allocate the
> resources: many small-heap machines with 1 core, or a few machines with big
> heaps and many cores? I'm sure that depends on the data flow and there is no
> best-practice solution. E.g. with a bigger heap I can perform a map-side join
> with a bigger table. What other considerations should I keep in mind in order
> to choose the right configuration?
>
> Thanks,
> Alex
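A rough way to see the heap-versus-cores trade-off from the question is plain arithmetic. This is a minimal sketch, assuming one node with a hypothetical 16 cores and 64 GB of heap split evenly across identical JVM workers, and assuming that a map-side join keeps a full copy of the smaller table in every worker's heap. The function names, node size and the 50% "usable heap" fraction are illustrative assumptions, not values from the thread or from any EMR instance type.

```python
# Hypothetical node size: illustrative assumptions only.
NODE_CORES = 16
NODE_HEAP_GB = 64.0


def layout(cores_per_worker: int,
           node_cores: int = NODE_CORES,
           node_heap_gb: float = NODE_HEAP_GB) -> tuple[int, float]:
    """Split one node into equal workers; return (workers, heap per worker in GB)."""
    workers = node_cores // cores_per_worker
    return workers, node_heap_gb / workers


def fits_map_side_join(small_table_gb: float, heap_gb: float,
                       usable_fraction: float = 0.5) -> bool:
    """True if the small table fits in the share of one worker's heap we assume
    is available for it (usable_fraction is a hypothetical tuning knob)."""
    return small_table_gb <= heap_gb * usable_fraction


def replicated_copies_gb(small_table_gb: float, workers: int) -> float:
    """Each worker holds its own copy, so total memory spent grows with worker count."""
    return small_table_gb * workers


if __name__ == "__main__":
    table_gb = 3.0  # hypothetical size of the smaller join table
    for cores in (1, 4, 16):
        workers, heap = layout(cores)
        print(f"{cores} core(s)/worker: {workers} workers x {heap:.1f} GB heap, "
              f"table fits: {fits_map_side_join(table_gb, heap)}, "
              f"memory spent on copies: {replicated_copies_gb(table_gb, workers):.1f} GB")
```

Under these assumptions, 1-core workers get too little heap for a 3 GB map-side join table, while fewer, larger workers can hold it and also spend less total memory on replicated copies. The flip side, not modelled here, is that very large heaps tend to mean longer garbage-collection pauses, so the right split still depends on the workload.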