I have seen in the code that, while creating the input splits, the region info is also sent along with each split. Is there any reason for that, given that not all of the hfiles are going to be on that server?

On Fri, May 26, 2017 at 7:06 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Consider running major compaction which restores data locality.
>
> Thanks
>
> > On May 26, 2017, at 6:08 AM, Rajeshkumar J <rajeshkumarit8...@gmail.com> wrote:
> >
> > Thanks Ted. If data blocks of the hfile may not be on the same node as the
> > region server then how data locality is achieved when mapreduce is run over
> > hbase tables
> >
> >> On Fri, May 26, 2017 at 6:15 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>
> >> The hfiles of a region are stored on hdfs. By default, hdfs has replication
> >> factor of 3.
> >> If you're not using read replica feature, any single region is served by
> >> one region server (however the data blocks of the hfile may not be on the
> >> same node as the region server).
> >>
> >> Cheers
> >>
> >> On Thu, May 25, 2017 at 11:45 PM, Rajeshkumar J <rajeshkumarit8...@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> we have region max file size as 10 GB. Whether the hfiles of a region
> >>> exists in same region server or will it be distributed?
> >>>
> >>> Thanks
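
For illustration only, here is a toy model of the locality point discussed in the thread. It is not HBase or HDFS code: the `Block` class, the host names, and both functions are hypothetical. It assumes that a split's location hint is the region server's host, that locality can be measured as the fraction of hfile blocks with a replica on that host, and that a major compaction rewrites the blocks from the region server so the first replica of each new block lands on it. Keeping the block count unchanged across the rewrite is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One HDFS block of an hfile (hypothetical model, not an HDFS class)."""
    replicas: frozenset  # host names that hold a copy of this block


def locality(blocks, host):
    """Fraction of blocks that have at least one replica on `host`."""
    if not blocks:
        return 0.0
    local = sum(1 for b in blocks if host in b.replicas)
    return local / len(blocks)


def major_compact(blocks, region_server, other_hosts):
    """Model a rewrite done by the region server.

    Assumption: the writer's own node receives the first replica of every
    new block, and the remaining two replicas go to other hosts.
    """
    new_replicas = frozenset([region_server, *other_hosts[:2]])
    return [Block(new_replicas) for _ in blocks]


# The split's location hint in this model is simply the region server host.
region_server = "rs1"

# After region moves or restarts, only some blocks still have a local copy.
hfile_blocks = [
    Block(frozenset({"rs1", "dn2", "dn3"})),
    Block(frozenset({"dn2", "dn3", "dn4"})),
    Block(frozenset({"dn3", "dn4", "dn5"})),
    Block(frozenset({"rs1", "dn4", "dn5"})),
]

before = locality(hfile_blocks, region_server)
compacted = major_compact(hfile_blocks, region_server, ["dn2", "dn3"])
after = locality(compacted, region_server)

print(f"locality before compaction: {before:.2f}")
print(f"locality after compaction:  {after:.2f}")
```

In this model a map task scheduled on the hinted host reads only half of the blocks locally before the compaction and all of them afterwards, which is one way to read the advice about major compaction above.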