Can't you just reduce the amount of data you insert by applying a filter so that only a small set of idPartitions is selected? You could then run multiple such inserts to cover all the idPartitions. A rough sketch of the idea is below. Does that help?
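For illustration, here is a minimal sketch of that approach, assuming a HiveContext-backed sqlContext and reusing the table and column names from your query (recordsTemp, users, idPartition); the batch size of 100 is just an example:

// Sketch only: collect the distinct idPartition values, then insert them
// in batches instead of all ~14,000 partitions in one statement.
val idPartitions = sqlContext
  .sql("SELECT DISTINCT idPartition FROM recordsTemp")
  .collect()
  .map(_.getString(0))

// One insert per batch of idPartition values (100 is an arbitrary batch size).
idPartitions.grouped(100).foreach { batch =>
  val inList = batch.map(p => s"'$p'").mkString(", ")
  sqlContext.sql(
    s"""FROM recordsTemp ps
       |INSERT OVERWRITE TABLE users PARTITION (datePartition, idPartition)
       |SELECT ps.id, ps.record, ps.datePartition, ps.idPartition
       |WHERE ps.idPartition IN ($inList)""".stripMargin)
}

As far as I know, INSERT OVERWRITE with dynamic partitions only replaces the partitions that actually receive rows, so each batch should leave the previously written partitions intact.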
Regards
Sab

On 22 May 2016 1:11 pm, "swetha kasireddy" <swethakasire...@gmail.com> wrote:

> I am looking at ORC. I insert the data using the following query.
>
> sqlContext.sql(" CREATE EXTERNAL TABLE IF NOT EXISTS records (id STRING,
> record STRING) PARTITIONED BY (datePartition STRING, idPartition STRING)
> stored as ORC LOCATION '/user/users' ")
> sqlContext.sql(" orc.compress= SNAPPY")
> sqlContext.sql(
>   """ from recordsTemp ps insert overwrite table users
>   partition(datePartition , idPartition ) select ps.id, ps.record ,
>   ps.datePartition, ps.idPartition """.stripMargin)
>
> On Sun, May 22, 2016 at 12:37 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Where is your base table, and what format is it (Parquet, ORC, etc.)?
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> http://talebzadehmich.wordpress.com
>>
>> On 22 May 2016 at 08:34, SRK <swethakasire...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> In my Spark SQL query to insert data, I have around 14,000 partitions of
>>> data, which seems to be causing memory issues. How can I insert the data
>>> for 100 partitions at a time to avoid any memory issues?