Well, there are a number of performance tuning guidelines in dedicated sections of the Spark documentation. Have you read and applied them?
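As a rough illustration of the kind of starting point those sections lead to, here is a minimal sketch that derives a few settings for the 6-node, 4-CPU, 16 GB cluster you describe. The helper `suggest_spark_conf` and its reservation defaults (1 core and 4 GB per node held back for the OS and Hadoop daemons) are my own assumptions, not values from the Spark or CDH documentation; only the `spark.*` keys themselves are real configuration properties. Treat the output as something to measure against, not as a recommendation.

```python
def suggest_spark_conf(nodes, cores_per_node, mem_gb_per_node,
                       reserved_cores=1, reserved_mem_gb=4, tasks_per_core=3):
    """Derive a starting-point Spark configuration from the cluster shape.

    The reserved_* defaults are illustrative assumptions (hypothetical),
    not official values. tasks_per_core follows the tuning guide's general
    advice of 2-3 tasks per CPU core.
    """
    exec_cores = cores_per_node - reserved_cores
    exec_mem_gb = mem_gb_per_node - reserved_mem_gb
    if exec_cores < 1 or exec_mem_gb < 1:
        raise ValueError("node too small for the requested reservations")

    # Assumption: one executor per node, so total cores = nodes * exec_cores.
    total_cores = nodes * exec_cores
    return {
        "spark.executor.instances": str(nodes),
        "spark.executor.cores": str(exec_cores),
        "spark.executor.memory": "%dg" % exec_mem_gb,
        "spark.default.parallelism": str(total_cores * tasks_per_core),
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    }


# Cluster shape taken from the original question: 6 nodes, 4 CPUs, 16 GB each.
conf = suggest_spark_conf(nodes=6, cores_per_node=4, mem_gb_per_node=16)
for key in sorted(conf):
    print(key, conf[key])
```

The resulting pairs can be passed as `--conf key=value` to `spark-submit` or placed in `spark-defaults.conf`. Whether they help depends on the workload, which is where the second point comes in.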
Secondly, any performance problem in a distributed cluster environment has two aspects:

1. Infrastructure
2. App algorithms

You seem to be focusing only on 1, but what you said about the performance difference between a single laptop and the cluster points to potential algorithmic inefficiency in your app, e.g. in how it distributes data and performs parallel processing. On a single laptop, data moves almost instantly between workers because all worker instances run in the memory of a single machine; on a cluster, the same data has to cross the network.

Regards,
Evo Eftimov

From: Manish Gupta 8 [mailto:mgupt...@sapient.com]
Sent: Thursday, April 16, 2015 6:03 PM
To: user@spark.apache.org
Subject: General configurations on CDH5 to achieve maximum Spark Performance

Hi,

Is there a document or link that describes the general configuration settings for achieving maximum Spark performance on CDH5?

In our environment, we have made a lot of changes (and are still making them) to get decent performance; otherwise our 6-node dev cluster, with default configurations, lags behind a single laptop running Spark.

Having a standard checklist (taking a base node size of 4 CPUs and 16 GB RAM) would be really great. Any pointers in this regard would be really helpful.

We are running Spark 1.2.0 on CDH 5.3.0.

Thanks,
Manish Gupta
Specialist | Sapient Global Markets