Re: Extremely Slow Data Loading with 40k+ Partitions

Daniel Haviv Thu, 16 Apr 2015 11:58:49 -0700

Is this a test environment?
If so, can you try disabling concurrency?


Daniel

> On 16 Apr 2015, at 19:44, Tianqi Tong <[email protected]> wrote:
> 
> Hi Daniel,
> Actually, the MapReduce job was just fine, but the process got stuck on the 
> data loading step after that.
> The output stopped at:
> Loading data to table default.parquet_table_with_40k_partitions partition 
> (yearmonth=null, prefix=null)
>  
> When I look at the size of the table's HDFS files, I can see it is 
> growing, but slowly.
> For the MapReduce job, I had 400+ mappers and 100+ reducers.
>  
> Thanks
> Tianqi
>  
> From: Daniel Haviv [mailto:[email protected]] 
> Sent: Wednesday, April 15, 2015 9:23 PM
> To: [email protected]
> Subject: Re: Extremely Slow Data Loading with 40k+ Partitions
>  
> How many reducers are you using?
> 
> Daniel
> 
> On 16 Apr 2015, at 00:55, Tianqi Tong <[email protected]> wrote:
> 
> Hi,
> I'm loading data into a Parquet table with dynamic partitions. I have 40k+ 
> partitions, and I have skipped the partition stats computation step.
> Somehow, loading data into the partitions is still extremely slow (800 MB/h).
> Do you have any hints on the possible reason and solution?
>  
> Thank you
> Tianqi Tong
>
