Yes, it's one region = one reducer = one HFile generated.

--
Laurent HATIER - Big Data & Business Intelligence Consultant at CapGemini
fr.linkedin.com/pub/laurent-hatier/25/36b/a86/ <http://fr.linkedin.com/pub/laurent-h/25/36b/a86/>
2015-07-30 17:07 GMT+02:00 Krishna <[email protected]>:

> There are 10 region servers, and I can schedule compaction during the weekend,
> when the write load is negligible.
>
> After reading the documentation, it's not clear how many HFiles are created
> once the bulk load finishes. Is it one HFile per reducer? My question is: is
> it recommended to run a major compaction after a bulk load if the number of
> regions on each region server is not too high?
>
>
> On Thursday, July 30, 2015, Ted Yu <[email protected]> wrote:
>
> > How many region servers do you have in the cluster?
> >
> > Would there be concurrent write load on the cluster if you choose to run a
> > major compaction? I ask because concurrent writes would be slowed down by
> > the major compaction, and compacting 10 TB of data would take some time.
> >
> > Cheers
> >
> > On Wed, Jul 29, 2015 at 4:23 PM, Krishna <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am planning to bulk-load about 10 TB of data into a table pre-split
> > > into 30 regions, with the max region file size configured to 10 GB.
> > >
> > > Is it recommended that I run a major compaction when the bulk load
> > > finishes? How many HFiles does the reducer create?
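The "one region = one reducer" answer rests on how the bulk-load job partitions its output. Each reducer receives the contiguous row-key range of one region, found by comparing the key against the sorted region start keys. Below is a minimal Python sketch of that mapping. It is not HBase's API. `reducer_for_key` and `data_per_region_gb` are hypothetical helpers, and the four split points are made up. It also works out the average data per region for the numbers in this thread, assuming keys are spread evenly and 1 TB = 1024 GB.

```python
from bisect import bisect_right
from typing import Sequence


def reducer_for_key(row_key: bytes, region_start_keys: Sequence[bytes]) -> int:
    """Return the index of the region (and so the reducer) owning row_key.

    Hypothetical helper: region_start_keys must be sorted, and the first
    region's start key is b"" (empty), as for the first region of a table.
    """
    # bisect_right gives the first start key strictly greater than row_key;
    # the region containing row_key is the one just before it, so a key equal
    # to a start key belongs to that region (start keys are inclusive).
    return bisect_right(region_start_keys, row_key) - 1


def data_per_region_gb(total_tb: float, num_regions: int) -> float:
    """Average data per region, assuming an even key spread and 1 TB = 1024 GB."""
    return total_tb * 1024 / num_regions


if __name__ == "__main__":
    # Hypothetical split points for a table pre-split into 4 regions.
    starts = [b"", b"g", b"n", b"t"]
    for key in [b"apple", b"grape", b"melon", b"zebra"]:
        print(key, "-> reducer", reducer_for_key(key, starts))

    # Numbers from the thread: 10 TB bulk-loaded into 30 pre-split regions.
    per_region = data_per_region_gb(10, 30)
    print(f"~{per_region:.0f} GB per region vs. a 10 GB max region file size")
```

Under the even-spread assumption, each of the 30 regions would receive roughly 341 GB. That is far above the configured 10 GB maximum, which is worth keeping in mind when deciding whether to pre-split more finely or compact afterwards.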
