[ https://issues.apache.org/jira/browse/SPARK-21650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116187#comment-16116187 ]
Madhavi Vaddepalli commented on SPARK-21650:
--------------------------------------------

Thank you, Sean Owen.
-Madhavi.

> Insert into Hive partitioned table from spark-sql taking hours to complete
> --------------------------------------------------------------------------
>
>                 Key: SPARK-21650
>                 URL: https://issues.apache.org/jira/browse/SPARK-21650
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>        Environment: Linux machines
> Spark version - 1.6.0
> Hive version - 1.1
> 200 - number of executors
> 3 - number of executor cores
> 10g - executor and driver memory
> dynamic allocation enabled
>            Reporter: Madhavi Vaddepalli
>
> We are trying to execute some logic using Spark SQL:
> Input to the program: 7 billion records (60 GB, gzip-compressed, text format).
> Output: 7 billion records (260 GB, gzip-compressed, partitioned on a few columns).
> The output has 10000 partitions (10000 distinct combinations of the partition columns).
> We are trying to insert this output into a Hive table (text format, gzip-compressed).
> All spawned tasks finished in 33 minutes and all executors were decommissioned; only the driver remained active. *It stayed in this state, showing no active stage or task in the Spark UI, for about 2.5 hours*, and then completed successfully.
> Please let us know what can be done to improve performance here. (Is it fixed in later versions?)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
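
The workload described above (writing 10000 partition combinations in one statement) is the dynamic-partition insert pattern. The original report does not include the actual query, so the following is only a hedged sketch of what such an insert typically looks like in Spark SQL / Hive; the table and column names are hypothetical, not taken from the report:

```sql
-- Illustrative sketch only: table and column names are invented.
-- Dynamic partitioning must be enabled before a multi-partition insert.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Partition columns (here part_col1, part_col2) must come last in the
-- SELECT list; each distinct value combination produces one partition
-- directory, so 10000 combinations yield 10000 directories to commit.
INSERT OVERWRITE TABLE output_table
PARTITION (part_col1, part_col2)
SELECT col_a, col_b, part_col1, part_col2
FROM staging_table;
```

With this pattern, after the executor tasks finish writing, the driver still has to commit and register every partition, which is one plausible reason a long driver-only phase can follow task completion for a very large partition count.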