Re: Spark work distribution among execs

2016-03-15 Thread bkapukaranov
that the cluster is made of identical nodes in terms of HW, so it's not like one of the nodes just "works" quicker. Thanks, Borislav -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tp26502p26508.html Sent from the Apache

Re: Spark work distribution among execs

2016-03-15 Thread manasdebashiskar
to be less. To prove the same, use an even bigger input for your job. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tp26502p26506.html

Re: Spark work distribution among execs

2016-03-15 Thread bkapukaranov
ng> -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tp26502p26504.html

Re: Spark work distribution among execs

2016-03-15 Thread Chitturi Padma

Spark work distribution among execs

2016-03-15 Thread bkapukaranov
the cause for this behaviour? 2. Any ideas how to achieve a more balanced performance? Thanks, Borislav -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tp26502.html

Spark work distribution among execs

2016-03-15 Thread Borislav Kapukaranov
Hi, I'm running Spark 1.6.0 on YARN on a Hadoop 2.6.0 cluster, and I observe a very strange issue. I run a simple job that reads about 1 TB of JSON logs from a remote HDFS cluster, converts them to Parquet, and saves them to the local HDFS of the Hadoop cluster. I run it with 25 executors with