I have also tried the configuration calculator sheet provided by Cloudera, but
it brought no improvement. For now, let us set aside the 17 million record job.

Let us instead consider a simple sort, which shows a tremendous difference
between YARN and Spark standalone.

The operation is simple: a numeric column is selected and sorted in ascending
order. The results are below.

> 136 seconds - Yarn-client mode
> 40 seconds  - Spark Standalone mode
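
For reference, the sort I timed is essentially the following (Spark 1.6.1,
Scala). This is only a sketch: the input path, format and column name are
placeholders, not the actual dataset, and the real job writes its output
rather than just counting.

    // Placeholder path and column name; the real job reads our own dataset.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val conf = new SparkConf().setAppName("SimpleSortTest")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Load the data and sort ascending on a single numeric column.
    val df = sqlContext.read.parquet("hdfs:///data/sort_test_input")
    val sorted = df.sort(df("numeric_col").asc)

    // Force execution so the measured time covers the full sort.
    sorted.count()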

Can you guide me towards a bare-minimum yarn-site.xml configuration for the
hardware below, so that I can check whether I am missing or have overlooked any
key settings? Likewise, for Spark standalone mode, how should spark-env.sh and
spark-defaults.conf be configured, i.e. how many executor instances to choose,
and with how much memory and how many cores? A sketch of my current starting
point follows the hardware description.

3 nodes (1 master and 2 workers), each with 32 GB RAM, 8 cores (16 logical) and a 1 TB HDD
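
For context, this is the bare-minimum yarn-site.xml I am currently planning to
start from. It is only a sketch for a 32 GB / 16-core worker, not a tested
configuration: the memory and vcore numbers are my own assumptions, and the
hostname is simply my resource manager host. Please correct anything that looks
wrong.

    <configuration>
      <property><name>yarn.resourcemanager.hostname</name><value>satish-NS1</value></property>
      <property><name>yarn.nodemanager.aux-services</name><value>spark_shuffle,mapreduce_shuffle</value></property>
      <property><name>yarn.nodemanager.aux-services.spark_shuffle.class</name><value>org.apache.spark.network.yarn.YarnShuffleService</value></property>
      <property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
      <!-- Assumed sizing: reserve roughly 4 GB RAM and 2 cores per node for OS and daemons. -->
      <property><name>yarn.nodemanager.resource.memory-mb</name><value>28672</value></property>
      <property><name>yarn.nodemanager.resource.cpu-vcores</name><value>14</value></property>
      <!-- Let a single container grow large enough for a 7g executor plus overhead. -->
      <property><name>yarn.scheduler.minimum-allocation-mb</name><value>1024</value></property>
      <property><name>yarn.scheduler.maximum-allocation-mb</name><value>28672</value></property>
      <property><name>yarn.scheduler.maximum-allocation-vcores</name><value>14</value></property>
    </configuration>

And this is the spark-defaults.conf / spark-env.sh sizing I am considering for
the same two workers; again, the numbers are assumptions meant only as a
starting point, not tuned values.

    # spark-defaults.conf (yarn-client runs)
    spark.executor.instances              4      # 2 executors per worker node
    spark.executor.memory                 7g
    spark.executor.cores                  5
    spark.driver.memory                   5g
    spark.driver.maxResultSize            2g
    spark.sql.autoBroadcastJoinThreshold  -1

    # spark-env.sh (standalone runs)
    SPARK_WORKER_CORES=14
    SPARK_WORKER_MEMORY=28g
    SPARK_WORKER_INSTANCES=1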

Finally, it is still mystifying to me why this key makes such a performance
difference in Spark 1.6.1: spark.sql.autoBroadcastJoinThreshold = -1.
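
For reference, this is how I apply the key. As far as I understand, -1 simply
disables automatic broadcast joins, so Spark falls back to shuffle-based joins
instead of broadcasting the smaller table, but that alone does not explain the
gap to me.

    // Set programmatically on the SQLContext ...
    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "-1")
    // ... or passed on submit:
    //   spark-submit --conf spark.sql.autoBroadcastJoinThreshold=-1 ...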

On Wed, Jun 7, 2017 at 11:16 AM, Jörn Franke <jornfra...@gmail.com> wrote:

> What does your Spark job do? Have you tried standard configurations and
> changing them gradually?
>
> Have you checked in the log files / UI which tasks take long?
>
> 17 million records does not sound like much, but it depends on what you do with it.
>
> I do not think that for such a small "cluster" it makes sense to have a
> special scheduling configuration.
>
> > On 6. Jun 2017, at 18:02, satishjohn <satish.johnbo...@gmail.com> wrote:
> >
> > Performance issue: the Spark job takes about 4x longer to complete on YARN
> > than in Spark standalone mode. However, in standalone mode jobs often fail
> > with executor-lost errors.
> >
> > Hardware configuration
> >
> >
> > 3 nodes (1 master and 2 workers), each with 32 GB RAM, 8 cores (16 logical) and a 1 TB HDD
> >
> > Spark configuration:
> >
> >
> > spark.executor.memory                7g
> > spark.cores.max                      96
> > spark.driver.memory                  5g
> > spark.driver.maxResultSize           2g
> > spark.sql.autoBroadcastJoinThreshold -1   (without this key the job fails,
> >                                            or takes 50x longer)
> > Number of executor instances: 4 per machine.
> >
> > With the above Spark configuration, the job for the business flow of 17
> > million records completes in 8 minutes.
> >
> > Problem Area:
> >
> >
> > When run in yarn-client mode with the configuration below, the same flow
> > takes 33 to 42 minutes. Here is the yarn-site.xml configuration:
> >
> > <configuration>
> >   <property><name>yarn.label.enabled</name><value>true</value></property>
> >   <property><name>yarn.log-aggregation.enable-local-cleanup</name><value>false</value></property>
> >   <property><name>yarn.resourcemanager.scheduler.client.thread-count</name><value>64</value></property>
> >   <property><name>yarn.resourcemanager.resource-tracker.address</name><value>satish-NS1:8031</value></property>
> >   <property><name>yarn.resourcemanager.scheduler.address</name><value>satish-NS1:8030</value></property>
> >   <property><name>yarn.dispatcher.exit-on-error</name><value>true</value></property>
> >   <property><name>yarn.nodemanager.container-manager.thread-count</name><value>64</value></property>
> >   <property><name>yarn.nodemanager.local-dirs</name><value>/home/satish/yarn</value></property>
> >   <property><name>yarn.nodemanager.localizer.fetch.thread-count</name><value>20</value></property>
> >   <property><name>yarn.resourcemanager.address</name><value>satish-NS1:8032</value></property>
> >   <property><name>yarn.scheduler.increment-allocation-mb</name><value>512</value></property>
> >   <property><name>yarn.log.server.url</name><value>http://satish-NS1:19888/jobhistory/logs</value></property>
> >   <property><name>yarn.nodemanager.resource.memory-mb</name><value>28000</value></property>
> >   <property><name>yarn.nodemanager.labels</name><value>MASTER</value></property>
> >   <property><name>yarn.nodemanager.resource.cpu-vcores</name><value>48</value></property>
> >   <property><name>yarn.scheduler.minimum-allocation-mb</name><value>1024</value></property>
> >   <property><name>yarn.log-aggregation-enable</name><value>true</value></property>
> >   <property><name>yarn.nodemanager.localizer.client.thread-count</name><value>20</value></property>
> >   <property><name>yarn.app.mapreduce.am.labels</name><value>CORE</value></property>
> >   <property><name>yarn.log-aggregation.retain-seconds</name><value>172800</value></property>
> >   <property><name>yarn.nodemanager.address</name><value>${yarn.nodemanager.hostname}:8041</value></property>
> >   <property><name>yarn.resourcemanager.hostname</name><value>satish-NS1</value></property>
> >   <property><name>yarn.scheduler.maximum-allocation-mb</name><value>8192</value></property>
> >   <property><name>yarn.nodemanager.remote-app-log-dir</name><value>/home/satish/satish/hadoop-yarn/apps</value></property>
> >   <property><name>yarn.resourcemanager.resource-tracker.client.thread-count</name><value>64</value></property>
> >   <property><name>yarn.scheduler.maximum-allocation-vcores</name><value>1</value></property>
> >   <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle,</value></property>
> >   <property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
> >   <property><name>yarn.resourcemanager.client.thread-count</name><value>64</value></property>
> >   <property><name>yarn.nodemanager.container-metrics.enable</name><value>true</value></property>
> >   <property><name>yarn.nodemanager.log-dirs</name><value>/home/satish/hadoop-yarn/containers</value></property>
> >   <property><name>yarn.nodemanager.aux-services</name><value>spark_shuffle,mapreduce_shuffle</value></property>
> >   <property><name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
> >   <property><name>yarn.nodemanager.aux-services.spark_shuffle.class</name><value>org.apache.spark.network.yarn.YarnShuffleService</value></property>
> >   <property><name>yarn.scheduler.minimum-allocation-vcores</name><value>1</value></property>
> >   <property><name>yarn.scheduler.increment-allocation-vcores</name><value>1</value></property>
> >   <property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value></property>
> >   <property><name>yarn.scheduler.fair.preemption</name><value>true</value></property>
> > </configuration>
> >
> > I am also using the dominant resource calculator with the capacity
> > scheduler; I have tried the fair and default options as well.
> >
> > To make the test simple, I ran a sort on the same cluster in yarn-client
> > mode and in Spark standalone mode. I can share the data for your own
> > comparative analysis as well.
> >
> > 136 seconds - Yarn-client mode
> > 40 seconds  - Spark Standalone mode
> >
> > To conclude, I am looking for the reason behind the yarn-client mode
> > performance issue, and for the best possible configuration to get good
> > performance out of YARN.
> >
> > When I set spark.sql.autoBroadcastJoinThreshold to -1, jobs that otherwise
> > take long complete in time and fail less often; I have a history of issues
> > when running jobs without this option.
> >
> > Let me know how to achieve comparable performance in yarn-client or Spark
> > standalone mode.
> >
> >
> >
> >
>
