I am trying to benchmark Spark on a Hadoop cluster. I need to design sample Spark jobs to test CPU utilization, RAM usage, input throughput, output throughput, and duration of execution in the cluster.
I need to test the state of the cluster for:

- a Spark job which uses high CPU
- a Spark job which uses high RAM
- a Spark job which has high input throughput
- a Spark job which has high output throughput
- a Spark job which takes a long time to run

These have to be tested individually, and combinations of these scenarios will also be used. Please help me understand which factors of a Spark job contribute to CPU utilization, RAM usage, input throughput, output throughput, and duration of execution in the cluster, so that I can design a job for each test. To make this concrete, a minimal sketch of the kind of skeleton I have in mind is shown below.
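For example, for the high-CPU case I was thinking of something along these lines (a Scala sketch only; the object name `CpuBurn`, the partition count, and the iteration counts are placeholders I made up, not a tuned workload):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical high-CPU benchmark skeleton: lots of arithmetic per
// record, almost no I/O. All constants here are placeholders.
object CpuBurn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CpuBurn").getOrCreate()
    val sc = spark.sparkContext

    // numSlices is set higher than the total executor core count
    // so that every core stays busy for the whole run.
    val sum = sc.parallelize(1 to 10000, numSlices = 200)
      .map { i =>
        // Tight compute loop per record; the iteration count is the
        // knob that trades off CPU load against job duration.
        var acc = 0.0
        var j = 0
        while (j < 5000000) { acc += math.sqrt((i + j).toDouble); j += 1 }
        acc
      }
      .reduce(_ + _) // action forces the whole computation to run

    println(s"checksum = $sum")
    spark.stop()
  }
}
```

My working assumption is that the main lever is what each task does per record: heavy per-record arithmetic for CPU, caching large datasets or wide shuffles (e.g. groupByKey) for RAM, reading/writing large HDFS files for input/output throughput, and data volume or iteration counts for duration. Is that the right way to think about it?

Thanks, Shalish.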