I don't think the number of regions on each region server matters much, as long as your row key is well designed. It may be worth checking the documentation on the number of HFiles in each HRegion (there is some material on HFiles and minor compaction), since that number can affect your read/write speed.
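For a rough sense of scale, here is a small back-of-the-envelope sketch using the figures quoted further down in this thread (about 10 TB of data, 30 pre-split regions, 10 GB max region file size, 10 region servers). It assumes 1 TB = 1024 GB and that the data spreads evenly across regions, which a real key distribution may not do; the function names are just illustrative.

```python
# Back-of-the-envelope sizing with the figures from this thread.
# Assumptions (not from the thread): 1 TB = 1024 GB and the data is
# spread evenly over the regions.

TOTAL_DATA_GB = 10 * 1024   # about 10 TB to bulk-load
PRESPLIT_REGIONS = 30       # regions the table is pre-split into
MAX_REGION_SIZE_GB = 10     # configured max region file size
REGION_SERVERS = 10         # region servers in the cluster


def data_per_region_gb(total_gb, regions):
    """Average amount of data landing in each pre-split region."""
    return total_gb / regions


def regions_needed(total_gb, max_region_gb):
    """Minimum number of regions so that none exceeds the max size."""
    return -(-total_gb // max_region_gb)  # ceiling division on integers


per_region = data_per_region_gb(TOTAL_DATA_GB, PRESPLIT_REGIONS)
needed = regions_needed(TOTAL_DATA_GB, MAX_REGION_SIZE_GB)

print(f"Data per pre-split region: {per_region:.1f} GB")
print(f"Regions needed at {MAX_REGION_SIZE_GB} GB each: {needed}")
print(f"Regions per server after splitting: {needed / REGION_SERVERS:.1f}")
```

Under those assumptions each pre-split region would receive far more than the 10 GB limit, so the regions would likely have to split after the load. That is worth keeping in mind when deciding on the pre-split count and on when to compact.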
--
Laurent HATIER - Consultant Big Data & Business Intelligence at CapGemini
fr.linkedin.com/pub/laurent-hatier/25/36b/a86/
<http://fr.linkedin.com/pub/laurent-h/25/36b/a86/>

2015-07-30 17:14 GMT+02:00 Laurent H <[email protected]>:

> Yes, it's one region = one reducer = one HFile generated.
>
> --
> Laurent HATIER - Consultant Big Data & Business Intelligence at CapGemini
> fr.linkedin.com/pub/laurent-hatier/25/36b/a86/
> <http://fr.linkedin.com/pub/laurent-h/25/36b/a86/>
>
> 2015-07-30 17:07 GMT+02:00 Krishna <[email protected]>:
>
>> There are 10 region servers & I can schedule compaction during the
>> weekend, when the write load is negligible.
>>
>> After reading the documentation, it's not clear how many HFiles are
>> created once bulk-load finishes - is it one HFile per reducer? My
>> question is, is it recommended to run major compaction after bulk-load
>> if the # of regions on each region server is not too high?
>>
>>
>> On Thursday, July 30, 2015, Ted Yu <[email protected]> wrote:
>>
>> > How many region servers do you have in the cluster?
>> >
>> > Would there be concurrent write load on the cluster if you choose to
>> > run major compaction? I ask this because the concurrent write would
>> > be slowed down by the major compaction, and compacting 10 TB of data
>> > would take some time.
>> >
>> > Cheers
>> >
>> > On Wed, Jul 29, 2015 at 4:23 PM, Krishna <[email protected]> wrote:
>> >
>> > > Hi,
>> > >
>> > > I am planning to bulk-load about 10 TB of data to a table pre-split
>> > > with 30 regions, with max region file size configured to 10 GB.
>> > >
>> > > Is it recommended that I run a major compaction when bulk-loading
>> > > finishes? How many HFiles does the reducer create?
