Can't you just reduce the amount of data you insert by applying a filter so that only a small set of idPartitions is selected? You could then run multiple such inserts to cover all the idPartitions. A rough sketch of the idea is below. Does that help?
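For illustration, here is a minimal sketch of that approach, assuming a HiveContext-backed sqlContext and reusing the table and column names from your query (recordsTemp, users, idPartition); the batch size of 100 is just an example:

// Sketch only: collect the distinct idPartition values, then insert them
// in batches instead of all ~14,000 partitions in one statement.
val idPartitions = sqlContext
  .sql("SELECT DISTINCT idPartition FROM recordsTemp")
  .collect()
  .map(_.getString(0))

// One insert per batch of idPartition values (100 is an arbitrary batch size).
idPartitions.grouped(100).foreach { batch =>
  val inList = batch.map(p => s"'$p'").mkString(", ")
  sqlContext.sql(
    s"""FROM recordsTemp ps
       |INSERT OVERWRITE TABLE users PARTITION (datePartition, idPartition)
       |SELECT ps.id, ps.record, ps.datePartition, ps.idPartition
       |WHERE ps.idPartition IN ($inList)""".stripMargin)
}

As far as I know, INSERT OVERWRITE with dynamic partitions only replaces the partitions that actually receive rows, so each batch should leave the previously written partitions intact.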
Regards
Sab

On 22 May 2016 1:11 pm, "swetha kasireddy" <swethakasire...@gmail.com> wrote:

> I am looking at ORC. I insert the data using the following query.
>
> sqlContext.sql(" CREATE EXTERNAL TABLE IF NOT EXISTS records (id STRING,
> record STRING) PARTITIONED BY (datePartition STRING, idPartition STRING)
> stored as ORC LOCATION '/user/users' ")
> sqlContext.sql(" orc.compress= SNAPPY")
> sqlContext.sql(
>   """ from recordsTemp ps insert overwrite table users
>   partition(datePartition , idPartition ) select ps.id, ps.record ,
>   ps.datePartition, ps.idPartition """.stripMargin)
>
> On Sun, May 22, 2016 at 12:37 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Where is your base table, and what format is it (Parquet, ORC, etc.)?
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> http://talebzadehmich.wordpress.com
>>
>> On 22 May 2016 at 08:34, SRK <swethakasire...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> In my Spark SQL query to insert data, I have around 14,000 partitions of
>>> data, which seems to be causing memory issues. How can I insert the data
>>> for 100 partitions at a time to avoid any memory issues?