@Davies...good question.

> Just be curious how the difference would be if you use 20 executors
> and 20G memory for each executor.
So I tried the following combinations:

  (GB x # executors)   query response time (secs)
  20 x 20              415
  10 x 40              230
  5  x 80              141
  4  x 100             128
  2  x 200             104

CPU utilization is high, so spreading more JVMs onto more vCores helps in this case. For other workloads where memory utilization outweighs CPU, I can see larger JVM sizes being more beneficial. It's for sure case-by-case. The codegen and scheduler overheads seem negligible.

From: Davies Liu <dav...@databricks.com>
To: Jesse F Chen/San Francisco/IBM@IBMUS
Cc: "Cheng, Hao" <hao.ch...@intel.com>, Todd <bit1...@163.com>, Michael Armbrust <mich...@databricks.com>, "user@spark.apache.org" <user@spark.apache.org>
Date: 09/11/2015 10:41 AM
Subject: Re: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

On Fri, Sep 11, 2015 at 10:31 AM, Jesse F Chen <jfc...@us.ibm.com> wrote:
>
> Thanks Hao!
>
> I tried your suggestion of setting spark.shuffle.reduceLocality.enabled=false, and my initial tests showed queries are on par between 1.5 and 1.4.1.
>
> Results:
>
> tpcds-query39b-141.out:query time: 129.106478631 sec
> tpcds-query39b-150-reduceLocality-false.out:query time: 128.854284296 sec
> tpcds-query39b-150.out:query time: 572.443151734 sec
>
> With the default spark.shuffle.reduceLocality.enabled=true, I am seeing an across-the-board slowdown for the majority of the TPCDS queries.
>
> My test is on a bare-metal 20-node cluster.
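A quick sanity check of the executor-size sweep above (a standalone sketch; the tuples are just the figures reported in the table, and every configuration uses the same 400 GB of total executor memory):

```python
# (mem_gb_per_executor, num_executors, query_time_s) from the sweep above.
runs = [(20, 20, 415), (10, 40, 230), (5, 80, 141), (4, 100, 128), (2, 200, 104)]

base_time = runs[0][2]  # the 20x20 configuration is the baseline
for mem, n, t in runs:
    total_mem = mem * n        # constant 400 GB in every run
    speedup = base_time / t    # relative to the 20x20 baseline
    print(f"{mem:>2}G x {n:<3} total={total_mem}G speedup={speedup:.2f}x")
```

With total memory held constant, the smallest JVMs spread over the most vCores come out roughly 4x faster than the 20x20 baseline, consistent with the CPU-bound explanation above.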
> I ran my test as follows:
>
>   /TestAutomation/spark-1.5/bin/spark-submit --master yarn-client \
>     --packages com.databricks:spark-csv_2.10:1.1.0 \
>     --name TPCDSSparkSQLHC \
>     --conf spark.shuffle.reduceLocality.enabled=false \
>     --executor-memory 4096m --num-executors 100 \
>     --class org.apache.spark.examples.sql.hive.TPCDSSparkSQLHC \
>     /TestAutomation/databricks/spark-sql-perf-master/target/scala-2.10/tpcdssparksql_2.10-0.9.jar \
>     hdfs://rhel2.cisco.com:8020/user/bigsql/hadoopds100g \
>     /TestAutomation/databricks/spark-sql-perf-master/src/main/queries/jesse/query39b.sql

I'm just curious what the difference would be if you used 20 executors with 20G of memory each. Sharing the same JVM across tasks could reduce the overhead of codegen and JIT; it may also reduce the overhead of `reduceLocality` (it can be easier to schedule the tasks).

> From: "Cheng, Hao" <hao.ch...@intel.com>
> To: Todd <bit1...@163.com>
> Cc: Jesse F Chen/San Francisco/IBM@IBMUS, Michael Armbrust <mich...@databricks.com>, "user@spark.apache.org" <user@spark.apache.org>
> Date: 09/11/2015 01:00 AM
> Subject: RE: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL
>
> Can you confirm whether the query really ran in cluster mode, not local mode? Can you print the call stack of the executor while the query is running?
>
> BTW: spark.shuffle.reduceLocality.enabled is a configuration of Spark core, not Spark SQL.
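Jesse's three query-39b timings (129.1 s on 1.4.1, 572.4 s on 1.5 at defaults, 128.9 s on 1.5 with the flag off) work out to roughly a 4.4x regression that the flag fully recovers. A small sketch of that arithmetic:

```python
# Timings for TPC-DS query 39b as reported earlier in the thread.
t_141 = 129.106478631          # Spark 1.4.1
t_150_fix = 128.854284296      # Spark 1.5.0, reduceLocality.enabled=false
t_150_default = 572.443151734  # Spark 1.5.0, default (true)

slowdown = t_150_default / t_141  # regression at 1.5 defaults
recovered = t_150_fix / t_141     # ratio once the flag is flipped
print(f"default slowdown: {slowdown:.2f}x, with flag off: {recovered:.2f}x")
```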
> From: Todd [mailto:bit1...@163.com]
> Sent: Friday, September 11, 2015 3:39 PM
> To: Todd
> Cc: Cheng, Hao; Jesse F Chen; Michael Armbrust; user@spark.apache.org
> Subject: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL
>
> I added the following two options:
>
>   spark.sql.planner.sortMergeJoin=false
>   spark.shuffle.reduceLocality.enabled=false
>
> But it still performs the same as when neither is set.
>
> One other thing: in the Spark UI, when I click the SQL tab, it shows an empty page with only the header title 'SQL'; there is no table showing the queries and execution-plan information.
>
> At 2015-09-11 14:39:06, "Todd" <bit1...@163.com> wrote:
>
> Thanks Hao.
> Yes, it is still slow with SMJ. Let me try the option you suggested.
>
> At 2015-09-11 14:34:46, "Cheng, Hao" <hao.ch...@intel.com> wrote:
>
> You mean the performance is still as slow as with SMJ in Spark 1.5?
>
> Can you set spark.shuffle.reduceLocality.enabled=false when you start spark-shell/spark-sql? It's a new feature in Spark 1.5, and it's true by default, but we found it can degrade performance dramatically.
>
> From: Todd [mailto:bit1...@163.com]
> Sent: Friday, September 11, 2015 2:17 PM
> To: Cheng, Hao
> Cc: Jesse F Chen; Michael Armbrust; user@spark.apache.org
> Subject: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL
>
> Thanks Hao for the reply.
> I turned sort-merge join off; the physical plan is below, but the performance is roughly the same as with it on...
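For reference, one way to apply the two settings Todd tried is via spark-defaults.conf rather than per-session flags (a sketch; they are equivalent to passing each with `--conf` on the spark-submit/spark-shell command line):

```properties
# spark-defaults.conf -- the two workarounds discussed in this thread
spark.sql.planner.sortMergeJoin       false
spark.shuffle.reduceLocality.enabled  false
```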
> == Physical Plan ==
> TungstenProject [ss_quantity#10,ss_list_price#12,ss_coupon_amt#19,ss_cdemo_sk#4,ss_item_sk#2,ss_promo_sk#8,ss_sold_date_sk#0]
>  ShuffledHashJoin [ss_item_sk#2], [ss_item_sk#25], BuildRight
>   TungstenExchange hashpartitioning(ss_item_sk#2)
>    ConvertToUnsafe
>     Scan ParquetRelation[hdfs://ns1/tmp/spark_perf/scaleFactor=30/useDecimal=true/store_sales][ss_promo_sk#8,ss_quantity#10,ss_cdemo_sk#4,ss_list_price#12,ss_coupon_amt#19,ss_item_sk#2,ss_sold_date_sk#0]
>   TungstenExchange hashpartitioning(ss_item_sk#25)
>    ConvertToUnsafe
>     Scan ParquetRelation[hdfs://ns1/tmp/spark_perf/scaleFactor=30/useDecimal=true/store_sales][ss_item_sk#25]
>
> Code Generation: true
>
> At 2015-09-11 13:48:23, "Cheng, Hao" <hao.ch...@intel.com> wrote:
>
> It is not a big surprise that SMJ is slower than HashJoin, as we do not fully utilize the sorting yet; more details can be found at https://issues.apache.org/jira/browse/SPARK-2926 .
>
> Anyway, can you disable sort-merge join with "spark.sql.planner.sortMergeJoin=false" in Spark 1.5 and run the query again? In our previous testing, sort-merge join was about 20% slower. I am not sure if anything else is slowing down performance.
>
> Hao
>
> From: Jesse F Chen [mailto:jfc...@us.ibm.com]
> Sent: Friday, September 11, 2015 1:18 PM
> To: Michael Armbrust
> Cc: Todd; user@spark.apache.org
> Subject: Re: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL
>
> Could this be a build issue (i.e., sbt package)?
>
> If I run the same jar built for 1.4.1 on 1.5, I am seeing a large regression too in queries (all other things identical)...
>
> I am curious: to build 1.5 (when it isn't released yet), what do I need to do with the build.sbt file?
>
> Are there any special parameters I should be using to make sure I load the latest Hive dependencies?
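The ShuffledHashJoin ... BuildRight operator in the plan above builds a hash table on the right side's join key and streams the left side through it. A minimal plain-Python sketch of that build/probe structure (toy rows, not real store_sales data):

```python
# BuildRight hash join, sketched: hash the right side, probe with the left.
right = [{"ss_item_sk": 1}, {"ss_item_sk": 2}]
left = [
    {"ss_item_sk": 1, "ss_quantity": 10},
    {"ss_item_sk": 3, "ss_quantity": 5},
]

# Build phase: hash the right side by join key.
table = {}
for row in right:
    table.setdefault(row["ss_item_sk"], []).append(row)

# Probe phase: stream the left side, emitting one pair per matching build row.
joined = [(l, r) for l in left for r in table.get(l["ss_item_sk"], [])]
print(joined)  # only ss_item_sk=1 has a match on both sides
```

Sort-merge join instead sorts both sides by the join key and merges them; as Hao notes above, Spark 1.5 did not yet fully exploit that sortedness, which is why SMJ trailed the hash join here.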
> From: Michael Armbrust <mich...@databricks.com>
> To: Todd <bit1...@163.com>
> Cc: "user@spark.apache.org" <user@spark.apache.org>
> Date: 09/10/2015 11:07 AM
> Subject: Re: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL
>
> I've been running TPC-DS SF=1500 daily on Spark 1.4.1 and Spark 1.5 on S3, so this is surprising. In my experiments Spark 1.5 is either the same as or faster than 1.4, with only small exceptions. A few thoughts:
>
> - 600 partitions is probably way too many for 6G of data.
> - Providing the output of explain for both runs would be helpful whenever reporting performance changes.
>
> On Thu, Sep 10, 2015 at 1:24 AM, Todd <bit1...@163.com> wrote:
>
> Hi,
>
> I am using data generated with spark-sql-perf (https://github.com/databricks/spark-sql-perf) to test Spark SQL performance (Spark on YARN, with 10 nodes) with the following code. (The table store_sales is about 90 million records, 6G in size.)
>
> val outputDir="hdfs://tmp/spark_perf/scaleFactor=30/useDecimal=true/store_sales"
> val name="store_sales"
> sqlContext.sql(
>   s"""
>     |CREATE TEMPORARY TABLE ${name}
>     |USING org.apache.spark.sql.parquet
>     |OPTIONS (
>     |  path '${outputDir}'
>     |)
>   """.stripMargin)
>
> val sql="""
>   |select
>   |  t1.ss_quantity,
>   |  t1.ss_list_price,
>   |  t1.ss_coupon_amt,
>   |  t1.ss_cdemo_sk,
>   |  t1.ss_item_sk,
>   |  t1.ss_promo_sk,
>   |  t1.ss_sold_date_sk
>   |from store_sales t1 join store_sales t2 on t1.ss_item_sk = t2.ss_item_sk
>   |where
>   |  t1.ss_sold_date_sk between 2450815 and 2451179
>   """.stripMargin
>
> val df = sqlContext.sql(sql)
> df.rdd.foreach(row=>Unit)
>
> With 1.4.1, I can finish the query in 6 minutes, but I need 10+ minutes with 1.5.
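As a plain-Python illustration of what the benchmark query computes (toy rows, not real TPC-DS data): it self-joins store_sales on ss_item_sk with the date-range filter applied to the t1 side only, so each filtered t1 row is emitted once per matching t2 row:

```python
# Toy sketch of the query's semantics; the three rows are made up.
store_sales = [
    {"ss_item_sk": 1, "ss_sold_date_sk": 2450900, "ss_quantity": 3},
    {"ss_item_sk": 1, "ss_sold_date_sk": 2452000, "ss_quantity": 7},
    {"ss_item_sk": 2, "ss_sold_date_sk": 2450900, "ss_quantity": 1},
]

result = [
    (t1, t2)
    for t1 in store_sales
    if 2450815 <= t1["ss_sold_date_sk"] <= 2451179   # filter on t1 only
    for t2 in store_sales
    if t1["ss_item_sk"] == t2["ss_item_sk"]          # join condition
]
# Two t1 rows pass the filter; item 1 matches two t2 rows, item 2 matches one.
print(len(result))  # 3
```

Because both join inputs are the same 90-million-row table, the join fan-out is what makes this query sensitive to the join strategy and shuffle behavior being discussed.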
> The configurations are basically the same, since I copied the configuration from 1.4.1 to 1.5:
>
>   Setting                                        1.4.1   1.5.0
>   scaleFactor                                    30      30
>   spark.sql.shuffle.partitions                   600     600
>   spark.sql.sources.partitionDiscovery.enabled   true    true
>   spark.default.parallelism                      200     200
>   spark.driver.memory                            4G      4G
>   spark.executor.memory                          4G      4G
>   spark.executor.instances                       10      10
>   spark.shuffle.consolidateFiles                 true    true
>   spark.storage.memoryFraction                   0.4     0.4
>   spark.executor.cores                           3       3
>
> I am not sure where it is going wrong. Any ideas?

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
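A back-of-envelope check of Michael's point that 600 shuffle partitions is too many for this data set, using the figures from Todd's configuration above:

```python
# Numbers from the thread: 6G table, 600 shuffle partitions, 10 executors x 3 cores.
data_gb = 6
shuffle_partitions = 600        # spark.sql.shuffle.partitions
executors, cores = 10, 3        # spark.executor.instances / spark.executor.cores

mb_per_partition = data_gb * 1024 / shuffle_partitions      # ~10 MB each
task_waves = shuffle_partitions / (executors * cores)       # 20 waves of tasks
print(f"~{mb_per_partition:.0f} MB per partition, {task_waves:.0f} task waves")
```

At roughly 10 MB per partition, per-task scheduling overhead can dominate actual work, which is one reason a regression in task scheduling (such as the reduceLocality behavior discussed above) would be amplified on this workload.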