Re: HBase mapreduce job crawls on final 25% of maps

2016-04-13 Thread Colin Kincaid Williams
It appears that my issue was caused by the missing sections I mentioned in the second post. I ran a job with these settings, and it finished in under 6 hours. Thanks for your suggestions; they gave me further ideas for handling issues going forward.
scan.setCaching(500); // 1 is the default
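As a rough illustration of why the scanner caching value matters so much for a full-table MapReduce scan, the sketch below estimates how many scanner round trips a mapper makes for a given number of rows. It is a simplified model, not HBase code: it assumes each scanner fetch returns at most `caching` rows and ignores result-size limits and region boundaries. The class name, the method name, and the row count of 1,000,000 are hypothetical and chosen only for the example.

```java
public class ScannerCachingEstimate {

    /**
     * Estimates scanner round trips under a simplified model in which each
     * fetch returns at most `caching` rows. Hypothetical helper, not part of
     * the HBase API.
     */
    static long rpcRoundTrips(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching must be > 0");
        }
        // Ceiling division: a partially filled final batch still costs one trip.
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // hypothetical number of rows read by one mapper

        // Compare the default mentioned in the thread (1) with the value used (500).
        for (int caching : new int[] {1, 500}) {
            System.out.println("caching=" + caching
                    + " -> about " + rpcRoundTrips(rows, caching) + " round trips");
        }
    }
}
```

Under this model, moving from a caching value of 1 to 500 cuts the number of round trips by a factor of 500, which is consistent in direction with the speedup reported above, though the thread does not establish that it accounts for all of it.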

Re: HBase mapreduce job crawls on final 25% of maps

2016-04-13 Thread Colin Kincaid Williams
Hi Chien,
4. From 50-150k per *second* to 100-150k per *minute*, as stated above, so reads went *DOWN* significantly. I think you must have misread.
I will take into account some of your other suggestions.
Thanks,
Colin
On Tue, Apr 12, 2016 at 8:19 PM, Chien Le wrote:
> Some things I wou

Re: HBase mapreduce job crawls on final 25% of maps

2016-04-12 Thread Chien Le
Some things I would look at:
1. Node statistics, both the mapper and regionserver nodes. Make sure they're on fully healthy nodes (no disk issues, no half duplex, etc) and that they're not already saturated from other jobs.
2. Is there a common regionserver behind the remaining mappers/regions? If

Re: HBase mapreduce job crawls on final 25% of maps

2016-04-12 Thread Colin Kincaid Williams
I've noticed that I've omitted
scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don't set to true for MR jobs
which appear to be suggestions from examples. Still I am not sure if this explains the significant request sl

Re: HBase mapreduce job crawls on final 25% of maps

2016-04-12 Thread Colin Kincaid Williams
Excuse my double post. I thought I deleted my draft, and then constructed a cleaner, more detailed, more readable mail.
On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams wrote:
> After trying to get help with distcp on hadoop-user and cdh-user
> mailing lists, I've given up on trying to us