Re: Performance issue when running Spark-1.6.1 in yarn-client mode with Hadoop 2.6.0

2017-06-08 Thread Satish John Bosco
I have tried the configuration calculator sheet provided by Cloudera as well, but saw no improvement. Setting aside the 17 million record operation for now, let us consider a simple sort on YARN and on Spark, which shows a tremendous difference. The operation is simple: a selected numeric column is to be sorted

Re: Performance issue when running Spark-1.6.1 in yarn-client mode with Hadoop 2.6.0

2017-06-06 Thread Jörn Franke
What does your Spark job do? Have you tried the standard configuration and then changed it gradually? Have you checked the log files/UI to see which tasks take long? 17 million records does not sound like much, but it depends on what you do with them. I do not think that for such a small "cluster" it makes sense to

Performance issue when running Spark-1.6.1 in yarn-client mode with Hadoop 2.6.0

2017-06-06 Thread satishjohn
Performance issue: the time taken to complete a Spark job on YARN is 4x slower than in Spark standalone mode. However, in Spark standalone mode, jobs often fail with an executor-lost issue. Hardware configuration: 32 GB RAM, 8 cores (16), and 1 TB HDD; 3 nodes (1 master and 2 workers). Spark
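
For readers comparing against the hardware listed above, here is a rough sketch of the usual executor-sizing arithmetic for YARN. It is not the poster's configuration: the function name, the reserved cores and memory per node, and the 5-cores-per-executor rule of thumb are all assumptions made for illustration. The overhead rule (the larger of 384 MB or 10% of executor memory) follows the Spark 1.6 default for spark.yarn.executor.memoryOverhead.

```python
def size_executors(worker_nodes, cores_per_node, ram_gb_per_node,
                   reserved_cores=1, reserved_ram_gb=4, cores_per_executor=5):
    """Hypothetical helper: estimate executor settings for a small YARN cluster.

    reserved_cores / reserved_ram_gb are assumed allowances for the OS and
    Hadoop daemons on each worker; they are not taken from the thread.
    """
    usable_cores = cores_per_node - reserved_cores
    usable_ram_mb = (ram_gb_per_node - reserved_ram_gb) * 1024

    # At least one executor per node, each with up to cores_per_executor cores.
    executors_per_node = max(1, usable_cores // cores_per_executor)
    executor_cores = min(cores_per_executor, usable_cores)

    # Each YARN container must hold the executor heap plus the memory overhead.
    container_mb = usable_ram_mb // executors_per_node

    # overhead = max(384 MB, 10% of heap), so heap is roughly container / 1.1.
    heap_mb = container_mb * 10 // 11
    if heap_mb // 10 < 384:
        # The 384 MB floor applies for small containers.
        heap_mb = container_mb - 384
    overhead_mb = max(384, heap_mb // 10)

    return {
        "num_executors": executors_per_node * worker_nodes,
        "executor_cores": executor_cores,
        "executor_memory_mb": heap_mb,
        "memory_overhead_mb": overhead_mb,
    }


if __name__ == "__main__":
    # 2 workers with 8 cores and 32 GB RAM each, as described in the post.
    for key, value in size_executors(2, 8, 32).items():
        print(key, value)
```

The resulting values would map onto the spark-submit options --num-executors, --executor-cores, and --executor-memory. The sketch ignores the YARN application master container and any YARN scheduler minimum or maximum allocation limits, so treat its output as a starting point to adjust gradually rather than a recommendation.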