Is this a test environment? If so, can you try disabling concurrency?
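
In case it helps, here is a minimal sketch of what I mean. It assumes the load runs from a Hive session (CLI or Beeline) and that "concurrency" refers to Hive's lock manager setting, hive.support.concurrency. The setting is applied only for the current session, so it should not affect other users:

```sql
-- Assumption: "disable concurrency" means turning off Hive's lock manager
-- for this session only; do this on a test environment, not production.
SET hive.support.concurrency=false;

-- Print the current value to confirm the change took effect.
SET hive.support.concurrency;
```

Then rerun the same insert in that session and see whether the "Loading data to table" step moves any faster.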
Daniel

> On 16 Apr 2015, at 19:44, Tianqi Tong <[email protected]> wrote:
>
> Hi Daniel,
> Actually the mapreduce job was just fine, but the process stuck on the data
> loading after that.
> The output stopped at:
> Loading data to table default.parquet_table_with_40k_partitions partition
> (yearmonth=null, prefix=null)
>
> When I look at the size of hdfs files of table, I can see the size is
> growing, but it's kind of slow.
> For mapreduce job, I had 400+ mappers and 100+ reducers.
>
> Thanks
> Tianqi
>
> From: Daniel Haviv [mailto:[email protected]]
> Sent: Wednesday, April 15, 2015 9:23 PM
> To: [email protected]
> Subject: Re: Extremely Slow Data Loading with 40k+ Partitions
>
> How many reducers are you using?
>
> Daniel
>
> On 16 Apr 2015, at 00:55, Tianqi Tong <[email protected]> wrote:
>
> Hi,
> I'm loading data to a Parquet table with dynamic partitions. I have 40k+
> partitions, and I have skipped the partition stats computation step.
> Somehow it's still extremely slow loading data into partitions (800MB/h).
> Do you have any hints on the possible reason and solution?
>
> Thank you
> Tianqi Tong
